* [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
@ 2013-12-24 1:41 Viresh Kumar
2013-12-24 1:41 ` [PATCH 2/2] cpufreq: preserve user_policy across suspend/resume Viresh Kumar
2013-12-26 1:05 ` [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Rafael J. Wysocki
0 siblings, 2 replies; 21+ messages in thread
From: Viresh Kumar @ 2013-12-24 1:41 UTC (permalink / raw)
To: rjw
Cc: linaro-kernel, patches, cpufreq, linux-pm, linux-kernel, bjorn,
Viresh Kumar
__cpufreq_add_dev() can sometimes fail while the system is resuming.
Currently we clear all sysfs nodes for cpufreq's failed policy, since leaving
them behind could make userspace unstable. But if we suspend/resume again, we
should at least try to bring those policies back.
This patch fixes the issue by clearing the fallback data on failure and trying
to allocate a new struct cpufreq_policy on the next resume.
Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
These patches are being sent again (they were earlier sent as replies to other
emails) so that people can give input if they have any.
Tested on my ThinkPad T420.
drivers/cpufreq/cpufreq.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 16d7b4a..0a48e71 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1016,16 +1016,24 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
read_unlock_irqrestore(&cpufreq_driver_lock, flags);
#endif
- if (frozen)
+ if (frozen) {
/* Restore the saved policy when doing light-weight init */
policy = cpufreq_policy_restore(cpu);
- else
+
+ /*
+ * As we failed to resume the cpufreq core last time, let's try to
+ * create a new policy.
+ */
+ if (!policy)
+ frozen = false;
+ }
+
+ if (!frozen)
policy = cpufreq_policy_alloc();
if (!policy)
goto nomem_out;
-
/*
* In the resume path, since we restore a saved policy, the assignment
* to policy->cpu is like an update of the existing policy, rather than
@@ -1118,8 +1126,14 @@ err_get_freq:
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
err_set_policy_cpu:
- if (frozen)
+ if (frozen) {
+ /*
+ * Clear fallback data as we should try to make things work on
+ * next suspend/resume
+ */
+ per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL;
cpufreq_policy_put_kobj(policy);
+ }
cpufreq_policy_free(policy);
nomem_out:
--
1.7.12.rc2.18.g61b472e
^ permalink raw reply related [flat|nested] 21+ messages in thread
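The control flow the patch establishes (restore the saved policy on a light-weight init, and fall back to a fresh allocation when the restore fails) can be sketched in plain userspace C. All of the names below are hypothetical stand-ins for the kernel's per-CPU fallback data and helpers, not actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct policy { int cpu; };

/* Simulated per-CPU fallback slots (stand-in for cpufreq_cpu_data_fallback). */
static struct policy *saved[4];

static struct policy *policy_restore(int cpu)
{
	struct policy *p = saved[cpu];

	saved[cpu] = NULL;
	return p;		/* NULL if nothing was saved for this CPU */
}

static struct policy *policy_alloc(int cpu)
{
	struct policy *p = malloc(sizeof(*p));

	if (p)
		p->cpu = cpu;
	return p;
}

/*
 * The pattern from the patch: on light-weight (re)init, try to restore
 * the saved policy; if that fails, fall back to a full allocation
 * instead of giving up.
 */
static struct policy *add_dev(int cpu, bool frozen)
{
	struct policy *p = NULL;

	if (frozen)
		p = policy_restore(cpu);
	if (!p)			/* restore failed, or full init requested */
		p = policy_alloc(cpu);
	return p;
}
```

The key point is that a failed restore degrades into a full init rather than into an error return, which is exactly why a second resume can succeed where the first one failed.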
* [PATCH 2/2] cpufreq: preserve user_policy across suspend/resume
2013-12-24 1:41 [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Viresh Kumar
@ 2013-12-24 1:41 ` Viresh Kumar
2013-12-26 1:05 ` [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Rafael J. Wysocki
1 sibling, 0 replies; 21+ messages in thread
From: Viresh Kumar @ 2013-12-24 1:41 UTC (permalink / raw)
To: rjw
Cc: linaro-kernel, patches, cpufreq, linux-pm, linux-kernel, bjorn,
Viresh Kumar
In __cpufreq_add_dev() we reinitialize user_policy with default values and
hence lose the user-set values of the user_policy.{min|max|policy|governor}
fields. Preserve them by not overriding them during suspend/resume.
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
drivers/cpufreq/cpufreq.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 0a48e71..59e918e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -839,9 +839,6 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy)
/* set default policy */
ret = cpufreq_set_policy(policy, &new_policy);
- policy->user_policy.policy = policy->policy;
- policy->user_policy.governor = policy->governor;
-
if (ret) {
pr_debug("setting policy failed\n");
if (cpufreq_driver->exit)
@@ -1077,8 +1074,10 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
*/
cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
- policy->user_policy.min = policy->min;
- policy->user_policy.max = policy->max;
+ if (!frozen) {
+ policy->user_policy.min = policy->min;
+ policy->user_policy.max = policy->max;
+ }
blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
CPUFREQ_START, policy);
@@ -1109,6 +1108,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
cpufreq_init_policy(policy);
+ if (!frozen) {
+ policy->user_policy.policy = policy->policy;
+ policy->user_policy.governor = policy->governor;
+ }
+
kobject_uevent(&policy->kobj, KOBJ_ADD);
up_read(&cpufreq_rwsem);
--
1.7.12.rc2.18.g61b472e
^ permalink raw reply related [flat|nested] 21+ messages in thread
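The idea of patch 2/2 is to snapshot the user-visible limits only on a full (cold) init, so that a resume-time re-init does not clobber what the user wrote via sysfs. A small userspace sketch of that guard (all names hypothetical, not the kernel's struct layout):

```c
#include <assert.h>
#include <stdbool.h>

struct policy {
	int min, max;		/* current limits from the driver */
	int user_min, user_max;	/* limits requested via sysfs */
};

/*
 * Mirror of the patch's idea: only snapshot the user-visible limits on a
 * full (cold) init; a light-weight resume re-init must not overwrite what
 * the user previously wrote to scaling_{min,max}_freq.
 */
static void init_policy(struct policy *p, int hw_min, int hw_max, bool frozen)
{
	p->min = hw_min;
	p->max = hw_max;
	if (!frozen) {
		p->user_min = p->min;
		p->user_max = p->max;
	}
}
```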
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2013-12-24 1:41 [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Viresh Kumar
2013-12-24 1:41 ` [PATCH 2/2] cpufreq: preserve user_policy across suspend/resume Viresh Kumar
@ 2013-12-26 1:05 ` Rafael J. Wysocki
2013-12-26 2:47 ` Viresh Kumar
2014-01-02 12:15 ` Bjørn Mork
1 sibling, 2 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2013-12-26 1:05 UTC (permalink / raw)
To: Viresh Kumar, bjorn
Cc: linaro-kernel, patches, cpufreq, linux-pm, linux-kernel
On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
> __cpufreq_add_dev() can sometimes fail while the system is resuming.
> Currently we clear all sysfs nodes for cpufreq's failed policy, since leaving
> them behind could make userspace unstable. But if we suspend/resume again, we
> should at least try to bring those policies back.
>
> This patch fixes the issue by clearing the fallback data on failure and trying
> to allocate a new struct cpufreq_policy on the next resume.
>
> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Well, while I appreciate the work done here, I don't like the changelog,
I don't really like the way the code is written and I don't like the comments.
Sorry about that.
Bjorn, can you please test the patch below instead along with the [2/2]
from this series (on top of linux-pm.git/pm-cpufreq)?
Rafael
---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: cpufreq: Clean up after a failing light-weight initialization
If cpufreq_policy_restore() returns NULL during system resume,
__cpufreq_add_dev() should just fall back to the full initialization
instead of returning an error, because that may actually make things
work. Moreover, it should not leave stale fallback data behind after
it has failed to restore a previously existing policy.
This change is based on Viresh Kumar's work.
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpufreq/cpufreq.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -1016,15 +1016,17 @@ static int __cpufreq_add_dev(struct devi
read_unlock_irqrestore(&cpufreq_driver_lock, flags);
#endif
- if (frozen)
- /* Restore the saved policy when doing light-weight init */
- policy = cpufreq_policy_restore(cpu);
- else
+ /*
+ * Restore the saved policy when doing light-weight init and fall back
+ * to the full init if that fails.
+ */
+ policy = frozen ? cpufreq_policy_restore(cpu) : NULL;
+ if (!policy) {
+ frozen = false;
policy = cpufreq_policy_alloc();
-
- if (!policy)
- goto nomem_out;
-
+ if (!policy)
+ goto nomem_out;
+ }
/*
* In the resume path, since we restore a saved policy, the assignment
@@ -1118,8 +1120,11 @@ err_get_freq:
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
err_set_policy_cpu:
- if (frozen)
+ if (frozen) {
+ /* Do not leave stale fallback data behind. */
+ per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL;
cpufreq_policy_put_kobj(policy);
+ }
cpufreq_policy_free(policy);
nomem_out:
^ permalink raw reply [flat|nested] 21+ messages in thread
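The second hunk of Rafael's patch, clearing the stale per-CPU fallback pointer in the error path, reduces to a very small pattern. A userspace sketch (the array and function names are hypothetical stand-ins, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

struct policy { int cpu; };

/* Stand-in for the kernel's per-CPU cpufreq_cpu_data_fallback pointers. */
static struct policy *fallback[4];

/*
 * Miniature of the error path in the patch above: if a restored ("frozen")
 * init fails later on, drop the stale fallback pointer so that the next
 * resume starts from a clean full initialization instead of tripping
 * over leftover data.
 */
static void add_dev_error_path(int cpu, int frozen)
{
	if (frozen)
		fallback[cpu] = NULL;	/* do not leave stale data behind */
}
```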
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2013-12-26 1:05 ` [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Rafael J. Wysocki
@ 2013-12-26 2:47 ` Viresh Kumar
2013-12-27 9:57 ` Viresh Kumar
2014-01-02 12:15 ` Bjørn Mork
1 sibling, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2013-12-26 2:47 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Bjørn Mork, Lists linaro-kernel, Patch Tracking,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List
On 26 December 2013 06:35, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: cpufreq: Clean up after a failing light-weight initialization
>
> If cpufreq_policy_restore() returns NULL during system resume,
> __cpufreq_add_dev() should just fall back to the full initialization
> instead of returning an error, because that may actually make things
> work. Moreover, it should not leave stale fallback data behind after
> it has failed to restore a previously existing policy.
>
> This change is based on Viresh Kumar's work.
>
> Reported-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2013-12-26 2:47 ` Viresh Kumar
@ 2013-12-27 9:57 ` Viresh Kumar
2013-12-27 9:58 ` Viresh Kumar
0 siblings, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2013-12-27 9:57 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Bjørn Mork, Lists linaro-kernel, Patch Tracking,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List
On 26 December 2013 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 26 December 2013 06:35, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Subject: cpufreq: Clean up after a failing light-weight initialization
>>
>> If cpufreq_policy_restore() returns NULL during system resume,
>> __cpufreq_add_dev() should just fall back to the full initialization
>> instead of returning an error, because that may actually make things
>> work. Moreover, it should not leave stale fallback data behind after
>> it has failed to restore a previously existing policy.
>>
>> This change is based on Viresh Kumar's work.
>>
>> Reported-by: Bjørn Mork <bjorn@mork.no>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
I don't think there is much in this patch that differs from what Bjorn
tested. So you can probably push it now and let him test linux-next later
once he is back?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2013-12-27 9:57 ` Viresh Kumar
@ 2013-12-27 9:58 ` Viresh Kumar
2013-12-30 16:40 ` Bjørn Mork
0 siblings, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2013-12-27 9:58 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Bjørn Mork, Lists linaro-kernel, Patch Tracking,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List
On 27 December 2013 15:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> I think there is nothing much different in this patch compared to what Bjorn
> tested. So you can probably push that now and let him test linux-next later
> once he is back?
Just saw that you already pushed it. :)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2013-12-27 9:58 ` Viresh Kumar
@ 2013-12-30 16:40 ` Bjørn Mork
0 siblings, 0 replies; 21+ messages in thread
From: Bjørn Mork @ 2013-12-30 16:40 UTC (permalink / raw)
To: Viresh Kumar, Rafael J. Wysocki
Cc: Lists linaro-kernel, Patch Tracking, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
Yes, that's good. As you might have guessed by the lack of any response, I won't be able to test this in 2013.
Thanks for all the good work!
Bjørn
Viresh Kumar <viresh.kumar@linaro.org> wrote:
>On 27 December 2013 15:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> I think there is nothing much different in this patch compared to what
>> Bjorn tested. So you can probably push that now and let him test
>> linux-next later once he is back?
>
>Just saw that you already pushed it. :)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2013-12-26 1:05 ` [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Rafael J. Wysocki
2013-12-26 2:47 ` Viresh Kumar
@ 2014-01-02 12:15 ` Bjørn Mork
2014-01-03 0:40 ` Rafael J. Wysocki
1 sibling, 1 reply; 21+ messages in thread
From: Bjørn Mork @ 2014-01-02 12:15 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, linaro-kernel, patches, cpufreq, linux-pm,
linux-kernel
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
>> __cpufreq_add_dev() can sometimes fail while the system is resuming.
>> Currently we clear all sysfs nodes for cpufreq's failed policy, since leaving
>> them behind could make userspace unstable. But if we suspend/resume again, we
>> should at least try to bring those policies back.
>>
>> This patch fixes the issue by clearing the fallback data on failure and trying
>> to allocate a new struct cpufreq_policy on the next resume.
>>
>> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> Well, while I appreciate the work done here, I don't like the changelog,
> I don't really like the way the code is written and I don't like the comments.
> Sorry about that.
>
> Bjorn, can you please test the patch below instead along with the [2/2]
> from this series (on top of linux-pm.git/pm-cpufreq)?
I tested this series with your modified 1/2 today, but on top of current
mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq. That
shouldn't make much difference, since Linus has already pulled your
'pm+acpi-3.13-rc6' tag, which included the pm-cpufreq branch.
This series fixes the reported bug for me.
But I observed this, most likely unrelated, splat during the test:
ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140)
======================================================
[ INFO: possible circular locking dependency detected ]
3.13.0-rc6+ #181 Not tainted
-------------------------------------------------------
s2disk/5651 is trying to acquire lock:
(s_active#170){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
but task is already holding lock:
(cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
[<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
[<ffffffff812a6974>] store+0x20/0xad
[<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
[<ffffffff8112a428>] vfs_write+0x9c/0x102
[<ffffffff8112a716>] SyS_write+0x50/0x85
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
-> #0 (s_active#170){++++.+}:
[<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[<ffffffff811d5d11>] kobject_del+0x18/0x42
[<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[<ffffffff811d5ede>] kobject_put+0x45/0x49
[<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
[<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
[<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[<ffffffff81039ea1>] cpu_notify+0xe/0x10
[<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpu_hotplug.lock);
lock(s_active#170);
lock(cpu_hotplug.lock);
lock(s_active#170);
*** DEADLOCK ***
7 locks held by s2disk/5651:
#0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
#1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff81283730>] lock_device_hotplug+0x12/0x14
#2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c657>] acpi_scan_lock_acquire+0x12/0x14
#3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
#4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
#5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
#6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a84cc>] __cpufreq_add_dev.isra.18+0x7f/0x78c
stack backtrace:
CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549
ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928
ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006
Call Trace:
[<ffffffff81399cac>] dump_stack+0x4e/0x68
[<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
[<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
[<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[<ffffffff811d5d11>] kobject_del+0x18/0x42
[<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[<ffffffff811d5ede>] kobject_put+0x45/0x49
[<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
[<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
[<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
[<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[<ffffffff81039ea1>] cpu_notify+0xe/0x10
[<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
[<ffffffff81141a67>] ? fget_light+0x33/0x9a
[<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
CPU1 is up
ACPI: Waking up from system sleep state S4
PM: noirq thaw of devices complete after 2.727 msecs
PM: early thaw of devices complete after 1.137 msecs
This warning appeared when I tried cancelling hibernation, which is
known to fail when using the acpi-cpufreq driver. The point of the test
was to verify that such failures still produce the expected result:
- all stale sysfs files are cleaned up and removed
- later suspend/resume actions will restore (or actually reinitiate)
the cpufreq driver for the non-boot CPUs
Which was successful, except that it produced the above lockdep warning.
I have not verified that this is a new warning (which means that it most
likely is not). And I expect the whole acpi-cpufreq problem, including
that warning, to go away when you eventually push
http://www.spinics.net/lists/cpufreq/msg08714.html with your improved
changelog (thanks for doing that BTW). So I don't worry too much about
it. Just wanted to let you know.
Bjørn
^ permalink raw reply [flat|nested] 21+ messages in thread
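The splat above is lockdep proving that two code paths take cpu_hotplug.lock and the sysfs s_active node in opposite orders. A miniature, single-threaded caricature of that kind of checking (purely illustrative, nothing like the kernel's real lockdep implementation) records which locks were held when each lock was taken and flags a reverse edge as a potential ABBA deadlock:

```c
#include <assert.h>

/*
 * Tiny sketch of lockdep-style dependency tracking.
 * before[a][b] records "lock a was held when lock b was taken".
 * Taking a lock while the reverse edge already exists means two paths
 * disagree on ordering, i.e. a potential ABBA deadlock like the one in
 * the splat above.
 */
#define NLOCKS 2
static int before[NLOCKS][NLOCKS];
static int held_stack[NLOCKS];
static int nheld;

static int lock_take(int id)
{
	int i, bad = 0;

	for (i = 0; i < nheld; i++) {
		int h = held_stack[i];

		if (before[id][h])	/* reverse edge seen earlier */
			bad = 1;
		before[h][id] = 1;	/* record h -> id ordering */
	}
	held_stack[nheld++] = id;
	return bad;
}

static void lock_release(void)
{
	nheld--;		/* LIFO release is enough for the sketch */
}
```

The real lockdep does this over lock *classes* with full dependency chains, which is why it can report the inversion the first time the second ordering is attempted, before any actual deadlock occurs.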
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-02 12:15 ` Bjørn Mork
@ 2014-01-03 0:40 ` Rafael J. Wysocki
2014-01-03 9:24 ` Bjørn Mork
0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2014-01-03 0:40 UTC (permalink / raw)
To: Bjørn Mork, cpufreq; +Cc: Viresh Kumar, linux-pm, linux-kernel
On Thursday, January 02, 2014 01:15:10 PM Bjørn Mork wrote:
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
>
> > On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
> >> __cpufreq_add_dev() can sometimes fail while the system is resuming.
> >> Currently we clear all sysfs nodes for cpufreq's failed policy, since leaving
> >> them behind could make userspace unstable. But if we suspend/resume again, we
> >> should at least try to bring those policies back.
> >>
> >> This patch fixes the issue by clearing the fallback data on failure and trying
> >> to allocate a new struct cpufreq_policy on the next resume.
> >>
> >> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
> >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> >
> > Well, while I appreciate the work done here, I don't like the changelog,
> > I don't really like the way the code is written and I don't like the comments.
> > Sorry about that.
> >
> > Bjorn, can you please test the patch below instead along with the [2/2]
> > from this series (on top of linux-pm.git/pm-cpufreq)?
>
> I tested this series with your modified 1/2 today, but on top of current
> mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq. That
> shouldn't make much difference, since Linus has already pulled your
> 'pm+acpi-3.13-rc6' tag, which included the pm-cpufreq branch.
>
> This series fixes the reported bug for me.
>
>
> But I observed this, most likely unrelated, splat during the test:
>
> ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282)
> ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536)
> ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536)
> ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536)
> ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140)
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.13.0-rc6+ #181 Not tainted
> -------------------------------------------------------
> s2disk/5651 is trying to acquire lock:
> (s_active#170){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
>
> but task is already holding lock:
> (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
> [<ffffffff81075027>] lock_acquire+0xfb/0x144
> [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
> [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
> [<ffffffff812a6974>] store+0x20/0xad
> [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
> [<ffffffff8112a428>] vfs_write+0x9c/0x102
> [<ffffffff8112a716>] SyS_write+0x50/0x85
> [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
>
> -> #0 (s_active#170){++++.+}:
> [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
> [<ffffffff81075027>] lock_acquire+0xfb/0x144
> [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
> [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
> [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
> [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
> [<ffffffff811d5d11>] kobject_del+0x18/0x42
> [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
> [<ffffffff811d5ede>] kobject_put+0x45/0x49
> [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
> [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
> [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
> [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
> [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
> [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
> [<ffffffff81039ea1>] cpu_notify+0xe/0x10
> [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
> [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
> [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
> [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
> [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
> [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
> [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(cpu_hotplug.lock);
> lock(s_active#170);
> lock(cpu_hotplug.lock);
> lock(s_active#170);
>
> *** DEADLOCK ***
>
> 7 locks held by s2disk/5651:
> #0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
> #1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff81283730>] lock_device_hotplug+0x12/0x14
> #2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c657>] acpi_scan_lock_acquire+0x12/0x14
> #3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
> #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
> #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
> #6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a84cc>] __cpufreq_add_dev.isra.18+0x7f/0x78c
>
> stack backtrace:
> CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181
> Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
> ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549
> ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928
> ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006
> Call Trace:
> [<ffffffff81399cac>] dump_stack+0x4e/0x68
> [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
> [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
> [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
> [<ffffffff81075027>] lock_acquire+0xfb/0x144
> [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
> [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
> [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
> [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
> [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
> [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
> [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
> [<ffffffff811d5d11>] kobject_del+0x18/0x42
> [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
> [<ffffffff811d5ede>] kobject_put+0x45/0x49
> [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
> [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
> [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
> [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
> [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
> [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
> [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
> [<ffffffff81039ea1>] cpu_notify+0xe/0x10
> [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
> [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
> [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
> [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
> [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
> [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
> [<ffffffff81141a67>] ? fget_light+0x33/0x9a
> [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
> [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
> CPU1 is up
> ACPI: Waking up from system sleep state S4
> PM: noirq thaw of devices complete after 2.727 msecs
> PM: early thaw of devices complete after 1.137 msecs
>
>
>
> This warning appeared when I tried cancelling hibernation, which is
> known to fail when using the acpi-cpufreq driver. The point of the test
> was to verify that such failures still produce the expected result:
>
> - all stale sysfs files are cleaned up and removed
> - later suspend/resume actions will restore (or actually reinitiate)
> the cpufreq driver for the non-boot CPUs
>
> Which was successful, except that it produced the above lockdep warning.
>
> I have not verified that this is a new warning (which means that it most
> likely is not). And I expect the whole acpi-cpufreq problem, including
> that warning, to go away when you eventually push
> http://www.spinics.net/lists/cpufreq/msg08714.html with your improved
> changelog (thanks for doing that BTW). So I don't worry too much about
> it. Just wanted to let you know.
OK, thanks! Well, this is somewhat worrisome.
Could you also check the linux-pm.git/fixes branch that contains all patches
I'm planning to push for 3.13-rc7 shortly?
Rafael
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-03 0:40 ` Rafael J. Wysocki
@ 2014-01-03 9:24 ` Bjørn Mork
2014-01-03 9:53 ` Bjørn Mork
0 siblings, 1 reply; 21+ messages in thread
From: Bjørn Mork @ 2014-01-03 9:24 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: cpufreq, Viresh Kumar, linux-pm, linux-kernel
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> OK, thanks! Well, this is somewhat worrisome.
>
> Could you also check the linux-pm.git/fixes branch that contains all patches
> I'm planning to push for 3.13-rc7 shortly?
It's definitely still there. But I'm less sure about the exact trigger.
I have now booted and suspended that branch a few times, trying to figure
out which actions it depends on. It seems to depend on a modified
scaling_max_freq (or any other attribute, I guess - I haven't tested
anything else) in combination with suspend. But the order doesn't
necessarily matter. I can sometimes set off the warning by writing to
sysfs after resuming, and sometimes at suspend time if the value is
already modified. The results are not consistent though. Sometimes
there is no warning at all.
But what's 100% sure is that I still can trigger this with the
linux-pm.git/fixes branch.
For example, triggered by
echo 800000 >/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
after resume (I believe the taint is due to a _request_firmware failure
for one of the bluetooth devices):
======================================================
[ INFO: possible circular locking dependency detected ]
3.13.0-rc6+ #183 Tainted: G W
-------------------------------------------------------
bash/4897 is trying to acquire lock:
(cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
but task is already holding lock:
(s_active#273){++++.+}, at: [<ffffffff8118a93f>] sysfs_write_file+0xd6/0x18b
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (s_active#273){++++.+}:
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[<ffffffff811d5d11>] kobject_del+0x18/0x42
[<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[<ffffffff811d5ede>] kobject_put+0x45/0x49
[<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
[<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
[<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[<ffffffff81039ea1>] cpu_notify+0xe/0x10
[<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
-> #0 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
[<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
[<ffffffff812a69c4>] store+0x20/0xad
[<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
[<ffffffff8112a428>] vfs_write+0x9c/0x102
[<ffffffff8112a716>] SyS_write+0x50/0x85
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(s_active#273);
lock(cpu_hotplug.lock);
lock(s_active#273);
lock(cpu_hotplug.lock);
*** DEADLOCK ***
3 locks held by bash/4897:
#0: (sb_writers#5){.+.+.+}, at: [<ffffffff811297c7>] file_start_write+0x27/0x29
#1: (&of->mutex){+.+.+.}, at: [<ffffffff8118a937>] sysfs_write_file+0xce/0x18b
#2: (s_active#273){++++.+}, at: [<ffffffff8118a93f>] sysfs_write_file+0xd6/0x18b
stack backtrace:
CPU: 1 PID: 4897 Comm: bash Tainted: G W 3.13.0-rc6+ #183
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
ffffffff81ca83b0 ffff88021fa65c08 ffffffff81399cac 0000000000000006
ffffffff81ca83b0 ffff88021fa65c58 ffffffff81397dc5 0000000000000002
ffff8800b1af2810 0000000000000003 ffff8800b1af2ef0 0000000000000003
Call Trace:
[<ffffffff81399cac>] dump_stack+0x4e/0x68
[<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
[<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50
[<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
[<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50
[<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
[<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50
[<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
[<ffffffff812a69c4>] store+0x20/0xad
[<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
[<ffffffff8112a428>] vfs_write+0x9c/0x102
[<ffffffff8112a716>] SyS_write+0x50/0x85
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
Or by s2disk (not the same boot - this time without the taint):
======================================================
[ INFO: possible circular locking dependency detected ]
3.13.0-rc6+ #183 Not tainted
-------------------------------------------------------
s2disk/5123 is trying to acquire lock:
(s_active#171){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
but task is already holding lock:
(cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
[<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
[<ffffffff812a69c4>] store+0x20/0xad
[<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
[<ffffffff8112a428>] vfs_write+0x9c/0x102
[<ffffffff8112a716>] SyS_write+0x50/0x85
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
-> #0 (s_active#171){++++.+}:
[<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[<ffffffff811d5d11>] kobject_del+0x18/0x42
[<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[<ffffffff811d5ede>] kobject_put+0x45/0x49
[<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
[<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
[<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[<ffffffff81039ea1>] cpu_notify+0xe/0x10
[<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpu_hotplug.lock);
lock(s_active#171);
lock(cpu_hotplug.lock);
lock(s_active#171);
*** DEADLOCK ***
7 locks held by s2disk/5123:
#0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
#1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff812837ac>] lock_device_hotplug+0x12/0x14
#2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c6d3>] acpi_scan_lock_acquire+0x12/0x14
#3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
#4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
#5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
#6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a851c>] __cpufreq_add_dev.isra.18+0x7f/0x78c
stack backtrace:
CPU: 0 PID: 5123 Comm: s2disk Not tainted 3.13.0-rc6+ #183
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
ffffffff81d3de70 ffff8800b10718f8 ffffffff81399cac 0000000000003a93
ffffffff81d3de70 ffff8800b1071948 ffffffff81397dc5 ffff8800b1071928
ffff8800b10c8bd0 0000000000000006 ffff8800b10c9358 0000000000000006
Call Trace:
[<ffffffff81399cac>] dump_stack+0x4e/0x68
[<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
[<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
[<ffffffff81075027>] lock_acquire+0xfb/0x144
[<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
[<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[<ffffffff811d5d11>] kobject_del+0x18/0x42
[<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[<ffffffff811d5ede>] kobject_put+0x45/0x49
[<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
[<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
[<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
[<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[<ffffffff81039ea1>] cpu_notify+0xe/0x10
[<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
[<ffffffff81141a67>] ? fget_light+0x33/0x9a
[<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
CPU1 is up
ACPI: Waking up from system sleep state S4
PM: noirq thaw of devices complete after 1.722 msecs
PM: early thaw of devices complete after 1.180 msecs
Bjørn
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-03 9:24 ` Bjørn Mork
@ 2014-01-03 9:53 ` Bjørn Mork
2014-01-03 11:19 ` Viresh Kumar
0 siblings, 1 reply; 21+ messages in thread
From: Bjørn Mork @ 2014-01-03 9:53 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: cpufreq, Viresh Kumar, linux-pm, linux-kernel
Bjørn Mork <bjorn@mork.no> writes:
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
>
>> OK, thanks! Well, this is somewhat worrisome.
>>
>> Could you also check the linux-pm.git/fixes branch that contains all patches
>> I'm planning to push for 3.13-rc7 shortly?
>
> It's definitely still there. But I'm less sure about the exact trigger.
I did a couple more tests after clean reboots, and 100% sure ways to
trigger the circular locking on my laptop are
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
s2disk
or
echo 1401000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
s2disk
or
echo 1401000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
s2disk
Note that "ondemand" and "1401000" are the default values, so I don't
actually change anything here. The write is causing the problem, not
the value. As expected, I guess.
Also note that boot vs non-boot cpu doesn't seem to matter. Nor does
cancelling the hibernation. The warning appears on hibernate - not on
resume.
Hope this helps.
Bjørn
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-03 9:53 ` Bjørn Mork
@ 2014-01-03 11:19 ` Viresh Kumar
2014-01-03 11:55 ` Bjørn Mork
0 siblings, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2014-01-03 11:19 UTC (permalink / raw)
To: Bjørn Mork
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
On 3 January 2014 15:23, Bjørn Mork <bjorn@mork.no> wrote:
> Note that "ondemand" and "1401000" are the default values, so I don't
> actually change anything here. The write is causing the problem, not
> the value. As expected, I guess.
>
> Also note that boot vs non-boot cpu doesn't seem to matter. Nor does
> cancelling the hibernation. The warning appears on hibernate - not on
> resume.
Hmm... I spent quite some time trying to understand what's going on and really
couldn't figure out anything as of now. I haven't tried reproducing it though.
A few things that I can make out of this mail chain so far:
- Apart from the log, everything is working fine, i.e. the system comes back
in working condition.
- It only happens when cpufreq_add_dev() fails during hibernation, while
we enable non-boot CPUs again to save the image to disk. So it isn't a problem
for a system which doesn't have any issues with add_dev() failing on
hibernation.
- There is lock contention in the order the locks are taken, and the contention
looks to be between the hotplug lock taken by get_online_cpus() and the s_active
lock for sysfs files. I don't know what role the earlier write to the
sysfs files plays,
as that write should finish before hibernation starts, so all locks should be
released by then.
--
viresh
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-03 11:19 ` Viresh Kumar
@ 2014-01-03 11:55 ` Bjørn Mork
2014-01-06 6:27 ` Viresh Kumar
0 siblings, 1 reply; 21+ messages in thread
From: Bjørn Mork @ 2014-01-03 11:55 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
Viresh Kumar <viresh.kumar@linaro.org> writes:
> On 3 January 2014 15:23, Bjørn Mork <bjorn@mork.no> wrote:
>> Note that "ondemand" and "1401000" are the default values, so I don't
>> actually change anything here. The write is causing the problem, not
>> the value. As expected, I guess.
>>
>> Also note that boot vs non-boot cpu doesn't seem to matter. Nor does
>> cancelling the hibernation. The warning appears on hibernate - not on
>> resume.
>
> Hmm... I spent quite some time trying to understand what's going on and really
> couldn't figure out anything as of now. I haven't tried reproducing it though.
>
> Few things that I can make out of this mail chain so far:
> - Apart from the log, everything is working fine. i.e. system is back in
> working condition.
Correct. And users not running a lock debugging kernel will of course
not even see the warning.
> - It only happens when cpufreq_add_dev() fails during hibernation while
> we enable non-boot CPUs again to save image to disk. So, isn't a problem
> for a system which doesn't have any issues with add_dev() failing on
> hibernation
Wrong. This was my initial assumption but I later found out that the
issue is unrelated to hibernation failures. Sorry about the confusion.
> - There is a contention of locks in the order they are taken. And the contention
> looks to be between the hotplug lock taken by get_online_cpus() and the s_active
> lock for sysfs files. Don't know what's the role of previous write to
> sysfs files.
> As that should finish before hibernation starts and so all locks should be back
> in place.
Yes, that seems logical. But I guess this is where it fails?
Bjørn
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-03 11:55 ` Bjørn Mork
@ 2014-01-06 6:27 ` Viresh Kumar
2014-01-06 9:01 ` Bjørn Mork
0 siblings, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2014-01-06 6:27 UTC (permalink / raw)
To: Bjørn Mork
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
On 3 January 2014 17:25, Bjørn Mork <bjorn@mork.no> wrote:
> Correct. And users not running a lock debugging kernel will of course
> not even see the warning.
Okay..
>> - It only happens when cpufreq_add_dev() fails during hibernation while
>> we enable non-boot CPUs again to save image to disk. So, isn't a problem
>> for a system which doesn't have any issues with add_dev() failing on
>> hibernation
>
> Wrong. This was my initial assumption but I later found out that the
> issue is unrelated to hibernation failures. Sorry about the confusion.
Hmm... Can you share the latest warning logs you have? The earlier ones were
related to hibernation.
>> - There is a contention of locks in the order they are taken. And the contention
>> looks to be between the hotplug lock taken by get_online_cpus() and the s_active
>> lock for sysfs files. Don't know what's the role of previous write to
>> sysfs files.
>> As that should finish before hibernation starts and so all locks should be back
>> in place.
>
> Yes, that seems logical. But I guess this is where it fails?
It looked like that.. Though your new logs might indicate something else.
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-06 6:27 ` Viresh Kumar
@ 2014-01-06 9:01 ` Bjørn Mork
2014-01-06 9:57 ` Viresh Kumar
0 siblings, 1 reply; 21+ messages in thread
From: Bjørn Mork @ 2014-01-06 9:01 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
Viresh Kumar <viresh.kumar@linaro.org> writes:
> On 3 January 2014 17:25, Bjørn Mork <bjorn@mork.no> wrote:
>> Correct. And users not running a lock debugging kernel will of course
>> not even see the warning.
>
> Okay..
>
>>> - It only happens when cpufreq_add_dev() fails during hibernation while
>>> we enable non-boot CPUs again to save image to disk. So, isn't a problem
>>> for a system which doesn't have any issues with add_dev() failing on
>>> hibernation
>>
>> Wrong. This was my initial assumption but I later found out that the
>> issue is unrelated to hibernation failures. Sorry about the confusion.
>
> Hmm.. Can we have the latest warning logs you have? Earlier ones were
> related to hibernation..
That's correct. I have not observed this on suspend to RAM. But then
again I haven't rigged any way to log that, so I don't think it's
conclusive..
The point I tried to make is that it isn't related to any hibernation
*failures*. The warning appears even if the add_dev() is successful,
and it also appears if I touch only the *boot* cpu cpufreq attributes.
I.e., this seems to be unrelated to the hotplugging code.
Here's another copy of the warning, captured by cancelling hibernation
as I don't have any other way to log it at the moment. But I do see the
warning appear on the console *before* cancelling. And I also see this
warning appear when trying the in-kernel hibernation instead of
userspace.
[ 267.893084] ======================================================
[ 267.893085] [ INFO: possible circular locking dependency detected ]
[ 267.893087] 3.13.0-rc6+ #183 Tainted: G W
[ 267.893089] -------------------------------------------------------
[ 267.893090] s2disk/5450 is trying to acquire lock:
[ 267.893101] (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[ 267.893102]
[ 267.893102] but task is already holding lock:
[ 267.893111] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
[ 267.893112]
[ 267.893112] which lock already depends on the new lock.
[ 267.893112]
[ 267.893113]
[ 267.893113] the existing dependency chain (in reverse order) is:
[ 267.893117]
[ 267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}:
[ 267.893123] [<ffffffff81075027>] lock_acquire+0xfb/0x144
[ 267.893128] [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
[ 267.893132] [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
[ 267.893137] [<ffffffff812a69c4>] store+0x20/0xad
[ 267.893142] [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
[ 267.893147] [<ffffffff8112a428>] vfs_write+0x9c/0x102
[ 267.893151] [<ffffffff8112a716>] SyS_write+0x50/0x85
[ 267.893155] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
[ 267.893160]
[ 267.893160] -> #0 (s_active#164){++++.+}:
[ 267.893164] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[ 267.893168] [<ffffffff81075027>] lock_acquire+0xfb/0x144
[ 267.893172] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[ 267.893175] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[ 267.893178] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[ 267.893182] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[ 267.893186] [<ffffffff811d5d11>] kobject_del+0x18/0x42
[ 267.893190] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[ 267.893193] [<ffffffff811d5ede>] kobject_put+0x45/0x49
[ 267.893197] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
[ 267.893201] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[ 267.893204] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
[ 267.893208] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[ 267.893213] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[ 267.893217] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[ 267.893221] [<ffffffff81039ea1>] cpu_notify+0xe/0x10
[ 267.893225] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[ 267.893230] [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[ 267.893234] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[ 267.893238] [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[ 267.893242] [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[ 267.893245] [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[ 267.893249] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
[ 267.893250]
[ 267.893250] other info that might help us debug this:
[ 267.893250]
[ 267.893251] Possible unsafe locking scenario:
[ 267.893251]
[ 267.893252] CPU0 CPU1
[ 267.893253] ---- ----
[ 267.893256] lock(cpu_hotplug.lock);
[ 267.893259] lock(s_active#164);
[ 267.893261] lock(cpu_hotplug.lock);
[ 267.893265] lock(s_active#164);
[ 267.893266]
[ 267.893266] *** DEADLOCK ***
[ 267.893266]
[ 267.893268] 7 locks held by s2disk/5450:
[ 267.893275] #0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
[ 267.893283] #1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff812837ac>] lock_device_hotplug+0x12/0x14
[ 267.893290] #2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c6d3>] acpi_scan_lock_acquire+0x12/0x14
[ 267.893297] #3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
[ 267.893305] #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
[ 267.893311] #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
[ 267.893318] #6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a851c>] __cpufreq_add_dev.isra.18+0x7f/0x78c
[ 267.893319]
[ 267.893319] stack backtrace:
[ 267.893322] CPU: 0 PID: 5450 Comm: s2disk Tainted: G W 3.13.0-rc6+ #183
[ 267.893324] Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
[ 267.893329] ffffffff81d3cd00 ffff8800b7acd8f8 ffffffff81399cac 00000000000068f7
[ 267.893334] ffffffff81d3cd00 ffff8800b7acd948 ffffffff81397dc5 ffff8800b7acd928
[ 267.893338] ffff8800b7b24990 0000000000000006 ffff8800b7b25118 0000000000000006
[ 267.893339] Call Trace:
[ 267.893344] [<ffffffff81399cac>] dump_stack+0x4e/0x68
[ 267.893349] [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
[ 267.893353] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[ 267.893357] [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
[ 267.893361] [<ffffffff81075027>] lock_acquire+0xfb/0x144
[ 267.893365] [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[ 267.893369] [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
[ 267.893372] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[ 267.893376] [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[ 267.893380] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[ 267.893383] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[ 267.893386] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[ 267.893390] [<ffffffff811d5d11>] kobject_del+0x18/0x42
[ 267.893393] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[ 267.893396] [<ffffffff811d5ede>] kobject_put+0x45/0x49
[ 267.893400] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
[ 267.893404] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[ 267.893407] [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
[ 267.893413] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
[ 267.893416] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[ 267.893421] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[ 267.893425] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[ 267.893428] [<ffffffff81039ea1>] cpu_notify+0xe/0x10
[ 267.893432] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[ 267.893436] [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[ 267.893439] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[ 267.893443] [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[ 267.893447] [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[ 267.893451] [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
[ 267.893454] [<ffffffff81141a67>] ? fget_light+0x33/0x9a
[ 267.893457] [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[ 267.893462] [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 267.893466] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
I don't think I do anything extraordinary to trigger this, so I would
be surprised if you can't reproduce it by doing
export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
s2disk
or similar on a kernel with CONFIG_PROVE_LOCKING=y
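A consolidated sketch of that recipe as a script (hypothetical; the sysfs path
and the s2disk step are taken from this thread, while the dry-run wrapper is
added so the script can be inspected before running it as root on a
CONFIG_PROVE_LOCKING=y kernel):

```shell
#!/bin/sh
# Sketch of the reproducer from this thread. Pass "run" to actually
# execute the steps (needs root and uswsusp); any other invocation only
# prints what would be done.
GOV=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

repro() {
	if [ "$1" = run ]; then
		# Rewriting the current value is enough; the write itself,
		# not the value, sets up the lock dependency.
		x=$(cat "$GOV")
		echo "$x" > "$GOV"
		s2disk
	else
		echo "would rewrite $GOV with its current value"
		echo "would run s2disk"
	fi
}

repro "${1:-dry}"
```

In dry mode the script only prints the two steps; with "run" it performs the
governor rewrite followed by s2disk, after which the lockdep splat should
appear on the console during hibernation.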
Bjørn
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-06 9:01 ` Bjørn Mork
@ 2014-01-06 9:57 ` Viresh Kumar
2014-01-06 10:49 ` Bjørn Mork
0 siblings, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2014-01-06 9:57 UTC (permalink / raw)
To: Bjørn Mork
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote:
> That's correct. I have not observed this on suspend to RAM. But then
> again I haven't rigged any way to log that, so I don't think it's
> conclusive..
>
> The point I tried to make is that it isn't related to any hibernation
> *failures*. The warning appears even if the add_dev() is successful,
> and it also appears if I touch only the *boot* cpu cpufreq attributes.
Okay... But your log below is for the add_dev() failure case.
> I.e., this seems to be unrelated to the hotplugging code.
>
> Here's another copy of the warning, captured by cancelling hibernation
> as I don't have any other way to log it at the moment. But I do see the
> warning appear on the console *before* cancelling. And I also see this
> warning appear when trying the in-kernel hibernation instead of
> userspace.
How do you cancel hibernation here? Sorry, what is in-kernel hibernation?
> [ 267.893084] ======================================================
> [ 267.893085] [ INFO: possible circular locking dependency detected ]
> [ 267.893087] 3.13.0-rc6+ #183 Tainted: G W
> [ 267.893089] -------------------------------------------------------
> [ 267.893090] s2disk/5450 is trying to acquire lock:
> [ 267.893101] (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
> [ 267.893102]
> [ 267.893102] but task is already holding lock:
> [ 267.893111] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
> [ 267.893112]
> [ 267.893112] which lock already depends on the new lock.
> [ 267.893112]
> [ 267.893113]
> [ 267.893113] the existing dependency chain (in reverse order) is:
> [ 267.893117]
> [ 267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}:
> [ 267.893123] [<ffffffff81075027>] lock_acquire+0xfb/0x144
> [ 267.893128] [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
> [ 267.893132] [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
> [ 267.893137] [<ffffffff812a69c4>] store+0x20/0xad
> [ 267.893142] [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
> [ 267.893147] [<ffffffff8112a428>] vfs_write+0x9c/0x102
> [ 267.893151] [<ffffffff8112a716>] SyS_write+0x50/0x85
> [ 267.893155] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
> [ 267.893160]
> [ 267.893160] -> #0 (s_active#164){++++.+}:
> [ 267.893164] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
> [ 267.893168] [<ffffffff81075027>] lock_acquire+0xfb/0x144
> [ 267.893172] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
> [ 267.893175] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
> [ 267.893178] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
> [ 267.893182] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
> [ 267.893186] [<ffffffff811d5d11>] kobject_del+0x18/0x42
> [ 267.893190] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
> [ 267.893193] [<ffffffff811d5ede>] kobject_put+0x45/0x49
> [ 267.893197] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
All of the above is for the case where add_dev() was required to clean up an
earlier allocated policy because something failed during add_dev().
> [ 267.893201] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
> [ 267.893204] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
> [ 267.893208] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
> [ 267.893213] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
> [ 267.893217] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
> [ 267.893221] [<ffffffff81039ea1>] cpu_notify+0xe/0x10
> [ 267.893225] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
> [ 267.893230] [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
> [ 267.893234] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
And this is again the same log as what you sent last time. It happened
while enabling the non-boot CPUs again to save the image to disk. And it
happened in your case because your driver->init() was failing, so it may not
be reproducible at my end. I haven't tried it though.
> I don't think I do anything extra-ordinary to trigger this, so I would
> be surprised if you can't reproduce it by doing
>
> export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
> echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> s2disk
I don't see you cancelling hibernation here. Sorry, I got confused about how
exactly you reproduced it.
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-06 9:57 ` Viresh Kumar
@ 2014-01-06 10:49 ` Bjørn Mork
2014-01-06 10:54 ` Viresh Kumar
2014-01-06 11:14 ` Rafael J. Wysocki
0 siblings, 2 replies; 21+ messages in thread
From: Bjørn Mork @ 2014-01-06 10:49 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
Viresh Kumar <viresh.kumar@linaro.org> writes:
> On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote:
>> That's correct. I have not observed this on suspend to RAM. But then
>> again I haven't rigged any way to log that, so I don't think it's
>> conclusive..
>>
>> The point I tried to make is that it isn't related to any hibernation
>> *failures*. The warning appears even if the add_dev() is successful,
>> and it also appears if I touch only the *boot* cpu cpufreq attributes.
>
> Okay.. But your log below is for add_dev() failure case.
Any suggestion on how to capture warnings issued between freeze and thaw
is appreciated :-)
Will a netconsole work? I can try that later.
>> I.e., this seems to be unrelated to the hotplugging code.
>>
>> Here's another copy of the warning, captured by cancelling hibernation
>> as I don't have any other way to log it at the moment. But I do see the
>> warning appear on the console *before* cancelling. And I also see this
>> warning appear when trying the in-kernel hibernation instead of
>> userspace.
>
> How do you cancel hibernation here? Sorry, what is in-kernel hibernation?
I don't know the proper terms here. I normally use s2disk from uswsusp
to hibernate. By "in-kernel hibernation" I meant the process initiated
by doing
echo disk >/sys/power/state
>> [ 267.893084] ======================================================
>> [ 267.893085] [ INFO: possible circular locking dependency detected ]
>> [ 267.893087] 3.13.0-rc6+ #183 Tainted: G W
>> [ 267.893089] -------------------------------------------------------
>> [ 267.893090] s2disk/5450 is trying to acquire lock:
>> [ 267.893101] (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
>> [ 267.893102]
>> [ 267.893102] but task is already holding lock:
>> [ 267.893111] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
>> [ 267.893112]
>> [ 267.893112] which lock already depends on the new lock.
>> [ 267.893112]
>> [ 267.893113]
>> [ 267.893113] the existing dependency chain (in reverse order) is:
>> [ 267.893117]
>> [ 267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}:
>> [ 267.893123] [<ffffffff81075027>] lock_acquire+0xfb/0x144
>> [ 267.893128] [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
>> [ 267.893132] [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
>> [ 267.893137] [<ffffffff812a69c4>] store+0x20/0xad
>> [ 267.893142] [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
>> [ 267.893147] [<ffffffff8112a428>] vfs_write+0x9c/0x102
>> [ 267.893151] [<ffffffff8112a716>] SyS_write+0x50/0x85
>> [ 267.893155] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
>> [ 267.893160]
>> [ 267.893160] -> #0 (s_active#164){++++.+}:
>> [ 267.893164] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
>> [ 267.893168] [<ffffffff81075027>] lock_acquire+0xfb/0x144
>> [ 267.893172] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
>> [ 267.893175] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
>> [ 267.893178] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
>> [ 267.893182] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
>> [ 267.893186] [<ffffffff811d5d11>] kobject_del+0x18/0x42
>> [ 267.893190] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
>> [ 267.893193] [<ffffffff811d5ede>] kobject_put+0x45/0x49
>> [ 267.893197] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
>
> All above is for case where add dev was required to clear up earlier
> allocated policy as something failed during add_dev()
I see. I'll try capturing at least part of the warning with a camera,
when it's not failing.
>> [ 267.893201] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
>> [ 267.893204] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
>> [ 267.893208] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
>> [ 267.893213] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
>> [ 267.893217] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
>> [ 267.893221] [<ffffffff81039ea1>] cpu_notify+0xe/0x10
>> [ 267.893225] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
>> [ 267.893230] [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
>> [ 267.893234] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
>
> And this is again the same log you sent last time. It happened while saving
> the image, after disabling all non-boot CPUs. It happened in your case because
> your driver->init() was failing, so it may not be reproducible at my end. I
> haven't tried it, though.
>
>> I don't think I do anything extra-ordinary to trigger this, so I would
>> be surprised if you can't reproduce it by doing
>>
>> export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
>> echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>> s2disk
>
> I don't see you cancelling hibernation here. Sorry, I got confused about how
> exactly you reproduced it.
I cancelled by pressing backspace while the image was being written.
Bjørn
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-06 10:49 ` Bjørn Mork
@ 2014-01-06 10:54 ` Viresh Kumar
2014-01-06 11:33 ` Rafael J. Wysocki
[not found] ` <8738l15pht.fsf@nemi.mork.no>
2014-01-06 11:14 ` Rafael J. Wysocki
1 sibling, 2 replies; 21+ messages in thread
From: Viresh Kumar @ 2014-01-06 10:54 UTC (permalink / raw)
To: Bjørn Mork
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
> I cancelled by pressing backspace while the image was being written.
I will answer other queries on next mail, but a quick question:
Do you get these warnings even if you don't cancel hibernation?
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-06 10:49 ` Bjørn Mork
2014-01-06 10:54 ` Viresh Kumar
@ 2014-01-06 11:14 ` Rafael J. Wysocki
1 sibling, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2014-01-06 11:14 UTC (permalink / raw)
To: Bjørn Mork
Cc: Viresh Kumar, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List
On Monday, January 06, 2014 11:49:00 AM Bjørn Mork wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>
> > On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote:
> >> That's correct. I have not observed this on suspend to RAM. But then
> >> again I haven't rigged any way to log that, so I don't think it's
> >> conclusive..
> >>
> >> The point I tried to make is that it isn't related to any hibernation
> >> *failures*. The warning appears even if the add_dev() is successful,
> >> and it also appears if I touch only the *boot* cpu cpufreq attributes.
> >
> > Okay.. But your log below is for add_dev() failure case.
>
> Any suggestion on how to capture warnings issued between freeze and thaw
> is appreciated :-)
You can try to add no_console_suspend to the kernel's command line.
Thanks!
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
2014-01-06 10:54 ` Viresh Kumar
@ 2014-01-06 11:33 ` Rafael J. Wysocki
[not found] ` <8738l15pht.fsf@nemi.mork.no>
1 sibling, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2014-01-06 11:33 UTC (permalink / raw)
To: Viresh Kumar
Cc: Bjørn Mork, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
On Monday, January 06, 2014 04:24:09 PM Viresh Kumar wrote:
> On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
> > I cancelled by pressing backspace while the image was being written.
Basically, stuff is still frozen at this point, but devices are fully
functional. It's kind of like returning an error code from swsusp_write()
in hibernate() (kernel/power/hibernate.c).
> I will answer other queries on next mail, but a quick question:
> Do you get these warnings even if you don't cancel hibernation?
Good question. :-)
Thanks!
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
[not found] ` <8738l15pht.fsf@nemi.mork.no>
@ 2014-01-08 5:51 ` Viresh Kumar
0 siblings, 0 replies; 21+ messages in thread
From: Viresh Kumar @ 2014-01-08 5:51 UTC (permalink / raw)
To: Bjørn Mork
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
On 6 January 2014 16:47, Bjørn Mork <bjorn@mork.no> wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>
>> On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
>>> I cancelled by pressing backspace while the image was being written.
>>
>> I will answer other queries on next mail, but a quick question:
>> Do you get these warnings even if you don't cancel hibernation?
>
> Yes.
Hmm, I see. Now I really find it hard to believe that you would get these
warnings only on hibernation and not on suspend/resume.
> I believe the photo shows that you are right: This warning happens when
> the failure path is entered, even if both the hibernation and following
> resume is successful. We see __cpufreq_add_dev calling
> cpufreq_policy_put_kobj here as well.
@Rafael: I need your expert opinion here, I am not so good with these
concepts :)
Here is my theory about these warnings:
- Bjorn gets these only when CONFIG_PROVE_LOCKING=y.
- This feature enables the kernel to report locking-related deadlocks
before they actually occur (see lib/Kconfig.debug).
- So, no locking issue has actually happened here; it's just a warning
that you might run into such an issue later.
- Contention is between sysfs and cpu-hotplug lock.
- While changing stuff from sysfs we do this:
sysfs-lock (by kernel)
cpu-hotplug lock (by cpufreq)
- And while doing hibernation, if add_dev() fails, we do this:
cpu-hotplug lock (by kernel while adding CPUs)
sysfs-lock (by cpufreq while removing sysfs directories)
And the kernel records all these sequences all the time. Because we
first tried changing values from sysfs, the kernel recorded that
sequence, and as soon as it saw the reverse one it WARNed us?
On a suspend/resume, we would hit the add-dev failure only during
resume, and probably by that time these recordings have been refreshed?
And that's why Bjorn isn't facing any issue there? Just a wild guess.
And if all of the above "virtual" theory is correct, then I don't know
whether we can or want to fix that.
> Thinking about this: This failure will always happen, regardless of
> cancelling or not, won't it?
Yeah.
> But it happens *after* the freeze so it
> doesn't matter if we resume from the frozen image.
yeah..
So, even tests like this should give you this warning:
Documentation/power/basic-pm-debugging.txt
core
- test the freezing of processes, suspending of devices, platform global
control methods(*), the disabling of nonboot CPUs and suspending of
platform/system devices
> So you are probably right as usual:
Really? Ask Rafael, he doesn't agree :)
> This warning is related to the
> acpi-cpufreq driver failing, and you cannot reproduce it without
> simulating such a failure (or using that driver yourself). Which means
> that the warning disappears with my patch in
> http://www.spinics.net/lists/cpufreq/msg08714.html
> which explains why I don't normally see it. I depend on being able to
> cancel hibernation, so I use that patch most of the time. Except when
> I'm testing one of your branches.
Correct.
Thread overview: 21+ messages
2013-12-24 1:41 [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Viresh Kumar
2013-12-24 1:41 ` [PATCH 2/2] cpufreq: preserve user_policy across suspend/resume Viresh Kumar
2013-12-26 1:05 ` [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Rafael J. Wysocki
2013-12-26 2:47 ` Viresh Kumar
2013-12-27 9:57 ` Viresh Kumar
2013-12-27 9:58 ` Viresh Kumar
2013-12-30 16:40 ` Bjørn Mork
2014-01-02 12:15 ` Bjørn Mork
2014-01-03 0:40 ` Rafael J. Wysocki
2014-01-03 9:24 ` Bjørn Mork
2014-01-03 9:53 ` Bjørn Mork
2014-01-03 11:19 ` Viresh Kumar
2014-01-03 11:55 ` Bjørn Mork
2014-01-06 6:27 ` Viresh Kumar
2014-01-06 9:01 ` Bjørn Mork
2014-01-06 9:57 ` Viresh Kumar
2014-01-06 10:49 ` Bjørn Mork
2014-01-06 10:54 ` Viresh Kumar
2014-01-06 11:33 ` Rafael J. Wysocki
[not found] ` <8738l15pht.fsf@nemi.mork.no>
2014-01-08 5:51 ` Viresh Kumar
2014-01-06 11:14 ` Rafael J. Wysocki