From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Rafael J. Wysocki"
Subject: Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume
Date: Fri, 03 Jan 2014 01:40:42 +0100
Message-ID: <2302938.b8tymqrMEz@vostro.rjw.lan>
References: <5562479.pVWRuDL0y6@vostro.rjw.lan> <87zjne7f75.fsf@nemi.mork.no>
Mime-Version: 1.0
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <87zjne7f75.fsf@nemi.mork.no>
Sender: linux-pm-owner@vger.kernel.org
List-ID: Content-Type: text/plain; charset="utf-8"
To: Bjørn Mork, cpufreq@vger.kernel.org
Cc: Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org

On Thursday, January 02, 2014 01:15:10 PM Bjørn Mork wrote:
> "Rafael J. Wysocki" writes:
>
> > On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
> >> __cpufreq_add_dev() can fail sometimes while we are resuming our system.
> >> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that
> >> could make userspace unstable. But if we suspend/resume again, we should at least
> >> try to bring back those policies.
> >>
> >> This patch fixes this issue by clearing fallback data on failure and trying to
> >> allocate a new struct cpufreq_policy on second resume.
> >>
> >> Reported-and-tested-by: Bjørn Mork
> >> Signed-off-by: Viresh Kumar
> >
> > Well, while I appreciate the work done here, I don't like the changelog,
> > I don't really like the way the code is written and I don't like the comments.
> > Sorry about that.
> >
> > Bjorn, can you please test the patch below instead along with the [2/2]
> > from this series (on top of linux-pm.git/pm-cpufreq)?
>
> I tested this series with your modified 1/2 today, but on top of a
> current mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq.
> Shouldn't make much difference since Linus has already pulled your
> 'pm+acpi-3.13-rc6' tag, which included that pm-cpufreq branch.
>
> This series fixes the reported bug for me.
>
>
> But I observed this, most likely unrelated, splat during the test:
>
> ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282)
> ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536)
> ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536)
> ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536)
> ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140)
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.13.0-rc6+ #181 Not tainted
> -------------------------------------------------------
> s2disk/5651 is trying to acquire lock:
>  (s_active#170){++++.+}, at: [] sysfs_addrm_finish+0x28/0x46
>
> but task is already holding lock:
>  (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x28/0x50
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
>        [] lock_acquire+0xfb/0x144
>        [] mutex_lock_nested+0x6c/0x397
>        [] get_online_cpus+0x3c/0x50
>        [] store+0x20/0xad
>        [] sysfs_write_file+0x138/0x18b
>        [] vfs_write+0x9c/0x102
>        [] SyS_write+0x50/0x85
>        [] system_call_fastpath+0x16/0x1b
>
> -> #0 (s_active#170){++++.+}:
>        [] __lock_acquire+0xae3/0xe68
>        [] lock_acquire+0xfb/0x144
>        [] sysfs_deactivate+0xa5/0x108
>        [] sysfs_addrm_finish+0x28/0x46
>        [] sysfs_remove+0x2a/0x31
>        [] sysfs_remove_dir+0x66/0x6b
>        [] kobject_del+0x18/0x42
>        [] kobject_cleanup+0xe1/0x14f
>        [] kobject_put+0x45/0x49
>        [] cpufreq_policy_put_kobj+0x37/0x83
>        [] __cpufreq_add_dev.isra.18+0x75e/0x78c
>        [] cpufreq_cpu_callback+0x53/0x88
>        [] notifier_call_chain+0x67/0x92
>        [] __raw_notifier_call_chain+0x9/0xb
>        [] __cpu_notify+0x1b/0x32
>        [] cpu_notify+0xe/0x10
>        [] _cpu_up+0xf1/0x124
>        [] enable_nonboot_cpus+0x52/0xbf
>        [] hibernation_snapshot+0x1be/0x2ed
>        [] snapshot_ioctl+0x1e5/0x431
>        [] do_vfs_ioctl+0x45e/0x4a8
>        [] SyS_ioctl+0x52/0x82
>        [] system_call_fastpath+0x16/0x1b
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(cpu_hotplug.lock);
>                                lock(s_active#170);
>                                lock(cpu_hotplug.lock);
>   lock(s_active#170);
>
>  *** DEADLOCK ***
>
> 7 locks held by s2disk/5651:
>  #0:  (pm_mutex){+.+.+.}, at: [] snapshot_ioctl+0x4b/0x431
>  #1:  (device_hotplug_lock){+.+.+.}, at: [] lock_device_hotplug+0x12/0x14
>  #2:  (acpi_scan_lock){+.+.+.}, at: [] acpi_scan_lock_acquire+0x12/0x14
>  #3:  (console_lock){+.+.+.}, at: [] suspend_console+0x20/0x38
>  #4:  (cpu_add_remove_lock){+.+.+.}, at: [] cpu_maps_update_begin+0x12/0x14
>  #5:  (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x28/0x50
>  #6:  (cpufreq_rwsem){.+.+.+}, at: [] __cpufreq_add_dev.isra.18+0x7f/0x78c
>
> stack backtrace:
> CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181
> Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
>  ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549
>  ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928
>  ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006
> Call Trace:
>  [] dump_stack+0x4e/0x68
>  [] print_circular_bug+0x1f8/0x209
>  [] __lock_acquire+0xae3/0xe68
>  [] ? debug_check_no_locks_freed+0x12c/0x143
>  [] lock_acquire+0xfb/0x144
>  [] ? sysfs_addrm_finish+0x28/0x46
>  [] ? lockdep_init_map+0x14e/0x160
>  [] sysfs_deactivate+0xa5/0x108
>  [] ? sysfs_addrm_finish+0x28/0x46
>  [] sysfs_addrm_finish+0x28/0x46
>  [] sysfs_remove+0x2a/0x31
>  [] sysfs_remove_dir+0x66/0x6b
>  [] kobject_del+0x18/0x42
>  [] kobject_cleanup+0xe1/0x14f
>  [] kobject_put+0x45/0x49
>  [] cpufreq_policy_put_kobj+0x37/0x83
>  [] __cpufreq_add_dev.isra.18+0x75e/0x78c
>  [] ? __lock_is_held+0x32/0x54
>  [] cpufreq_cpu_callback+0x53/0x88
>  [] notifier_call_chain+0x67/0x92
>  [] __raw_notifier_call_chain+0x9/0xb
>  [] __cpu_notify+0x1b/0x32
>  [] cpu_notify+0xe/0x10
>  [] _cpu_up+0xf1/0x124
>  [] enable_nonboot_cpus+0x52/0xbf
>  [] hibernation_snapshot+0x1be/0x2ed
>  [] snapshot_ioctl+0x1e5/0x431
>  [] do_vfs_ioctl+0x45e/0x4a8
>  [] ? fcheck_files+0xa1/0xe4
>  [] ? fget_light+0x33/0x9a
>  [] SyS_ioctl+0x52/0x82
>  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [] system_call_fastpath+0x16/0x1b
> CPU1 is up
> ACPI: Waking up from system sleep state S4
> PM: noirq thaw of devices complete after 2.727 msecs
> PM: early thaw of devices complete after 1.137 msecs
>
>
>
> This warning appeared when I tried cancelling hibernation, which is
> known to fail when using the acpi-cpufreq driver.
> The point of the test
> was to verify that such failures still produce the expected result:
>
>  - all stale sysfs files are cleaned up and removed
>  - later suspend/resume actions will restore (or actually reinitiate)
>    the cpufreq driver for the non-boot CPUs
>
> Which was successful, except that it produced the above lockdep warning.
>
> I have not verified that this is a new warning (which means that it most
> likely is not).  And I expect the whole acpi-cpufreq problem, including
> that warning, to go away when you eventually push
> http://www.spinics.net/lists/cpufreq/msg08714.html with your improved
> changelog (thanks for doing that BTW).  So I don't worry too much about
> it.  Just wanted to let you know.

OK, thanks!

Well, this is somewhat worrisome.

Could you also check the linux-pm.git/fixes branch that contains all patches
I'm planning to push for 3.13-rc7 shortly?

Rafael