public inbox for linux-kernel@vger.kernel.org
From: "Bjørn Mork" <bjorn@mork.no>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
	linaro-kernel@lists.linaro.org, patches@linaro.org,
	cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
Date: Thu, 02 Jan 2014 13:15:10 +0100	[thread overview]
Message-ID: <87zjne7f75.fsf@nemi.mork.no> (raw)
In-Reply-To: <5562479.pVWRuDL0y6@vostro.rjw.lan> (Rafael J. Wysocki's message of "Thu, 26 Dec 2013 02:05:50 +0100")

"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:

> On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
>> __cpufreq_add_dev() can fail sometimes while we are resuming our system.
>> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that
>> could make userspace unstable. But if we suspend/resume again, we should at least
>> try to bring back those policies.
>> 
>> This patch fixes this issue by clearing fallback data on failure and trying to
>> allocate a new struct cpufreq_policy on second resume.
>> 
>> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> Well, while I appreciate the work done here, I don't like the changelog,
> I don't really like the way the code is written and I don't like the comments.
> Sorry about that.
>
> Bjorn, can you please test the patch below instead along with the [2/2]
> from this series (on top of linux-pm.git/pm-cpufreq)?

I tested this series with your modified 1/2 today, but on top of current
mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq.  That
shouldn't make much difference, since Linus has already pulled your
'pm+acpi-3.13-rc6' tag, which included the pm-cpufreq branch.

This series fixes the reported bug for me.


But I observed this, most likely unrelated, splat during the test:

ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140)

======================================================
[ INFO: possible circular locking dependency detected ]
3.13.0-rc6+ #181 Not tainted
-------------------------------------------------------
s2disk/5651 is trying to acquire lock:
 (s_active#170){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46

but task is already holding lock:
 (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (cpu_hotplug.lock){+.+.+.}:
       [<ffffffff81075027>] lock_acquire+0xfb/0x144
       [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
       [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
       [<ffffffff812a6974>] store+0x20/0xad
       [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
       [<ffffffff8112a428>] vfs_write+0x9c/0x102
       [<ffffffff8112a716>] SyS_write+0x50/0x85
       [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b

-> #0 (s_active#170){++++.+}:
       [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
       [<ffffffff81075027>] lock_acquire+0xfb/0x144
       [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
       [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
       [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
       [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
       [<ffffffff811d5d11>] kobject_del+0x18/0x42
       [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
       [<ffffffff811d5ede>] kobject_put+0x45/0x49
       [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
       [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
       [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
       [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
       [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
       [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
       [<ffffffff81039ea1>] cpu_notify+0xe/0x10
       [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
       [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
       [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
       [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
       [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
       [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
       [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(cpu_hotplug.lock);
                               lock(s_active#170);
                               lock(cpu_hotplug.lock);
  lock(s_active#170);

 *** DEADLOCK ***

7 locks held by s2disk/5651:
 #0:  (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
 #1:  (device_hotplug_lock){+.+.+.}, at: [<ffffffff81283730>] lock_device_hotplug+0x12/0x14
 #2:  (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c657>] acpi_scan_lock_acquire+0x12/0x14
 #3:  (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
 #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
 #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
 #6:  (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a84cc>] __cpufreq_add_dev.isra.18+0x7f/0x78c

stack backtrace:
CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
 ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549
 ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928
 ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006
Call Trace:
 [<ffffffff81399cac>] dump_stack+0x4e/0x68
 [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
 [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
 [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
 [<ffffffff81075027>] lock_acquire+0xfb/0x144
 [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
 [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
 [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
 [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
 [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
 [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
 [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
 [<ffffffff811d5d11>] kobject_del+0x18/0x42
 [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
 [<ffffffff811d5ede>] kobject_put+0x45/0x49
 [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
 [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
 [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
 [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
 [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
 [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
 [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
 [<ffffffff81039ea1>] cpu_notify+0xe/0x10
 [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
 [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
 [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
 [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
 [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
 [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
 [<ffffffff81141a67>] ? fget_light+0x33/0x9a
 [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
 [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
CPU1 is up
ACPI: Waking up from system sleep state S4
PM: noirq thaw of devices complete after 2.727 msecs
PM: early thaw of devices complete after 1.137 msecs



This warning appeared when I tried cancelling hibernation, which is
known to fail when using the acpi-cpufreq driver.  The point of the test
was to verify that such failures still produce the expected results:

 - all stale sysfs files are cleaned up and removed
 - later suspend/resume actions restore (or actually reinitialize)
   the cpufreq driver for the non-boot CPUs

Both checks passed, except that the test produced the above lockdep warning.

I have not verified that this is a new warning (which means that it most
likely is not).  And I expect the whole acpi-cpufreq problem, including
that warning, to go away when you eventually push
http://www.spinics.net/lists/cpufreq/msg08714.html with your improved
changelog (thanks for doing that BTW).  So I don't worry too much about
it.  Just wanted to let you know.



Bjørn


Thread overview: 21+ messages
2013-12-24  1:41 [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Viresh Kumar
2013-12-24  1:41 ` [PATCH 2/2] cpufreq: preserve user_policy across suspend/resume Viresh Kumar
2013-12-26  1:05 ` [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Rafael J. Wysocki
2013-12-26  2:47   ` Viresh Kumar
2013-12-27  9:57     ` Viresh Kumar
2013-12-27  9:58       ` Viresh Kumar
2013-12-30 16:40         ` Bjørn Mork
2014-01-02 12:15   ` Bjørn Mork [this message]
2014-01-03  0:40     ` Rafael J. Wysocki
2014-01-03  9:24       ` Bjørn Mork
2014-01-03  9:53         ` Bjørn Mork
2014-01-03 11:19           ` Viresh Kumar
2014-01-03 11:55             ` Bjørn Mork
2014-01-06  6:27               ` Viresh Kumar
2014-01-06  9:01                 ` Bjørn Mork
2014-01-06  9:57                   ` Viresh Kumar
2014-01-06 10:49                     ` Bjørn Mork
2014-01-06 10:54                       ` Viresh Kumar
2014-01-06 11:33                         ` Rafael J. Wysocki
     [not found]                         ` <8738l15pht.fsf@nemi.mork.no>
2014-01-08  5:51                           ` Viresh Kumar
2014-01-06 11:14                       ` Rafael J. Wysocki
