* cpufreq_stats NULL deref on second system suspend @ 2013-09-09 19:22 Stephen Warren 2013-09-09 20:01 ` Rafael J. Wysocki 0 siblings, 1 reply; 38+ messages in thread From: Stephen Warren @ 2013-09-09 19:22 UTC (permalink / raw) To: Viresh Kumar, linux-pm@vger.kernel.org; +Cc: Rafael J . Wysocki Viresh, I'm seeing the crash below when suspending my system for the second time. I can avoid this with the following patch, which adds a check which already exists in all-but-one other places that the same lookup is made: > diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c > index 4cf0d28..d54f467 100644 > --- a/drivers/cpufreq/cpufreq_stats.c > +++ b/drivers/cpufreq/cpufreq_stats.c > @@ -266,6 +266,9 @@ static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy) > struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, > policy->last_cpu); > > + if (!stat) > + return; > + > pr_debug("Updating stats_table for new_cpu %u from last_cpu %u\n", > policy->cpu, policy->last_cpu); > per_cpu(cpufreq_stats_table, policy->cpu) = per_cpu(cpufreq_stats_table, Is that a legitimate fix, or is there something more wrong here? > [ 76.065009] PM: suspend of devices complete after 452.313 msecs > [ 76.073117] PM: late suspend of devices complete after 1.748 msecs > [ 76.081832] PM: noirq suspend of devices complete after 2.091 msecs > [ 76.088533] Disabling non-boot CPUs... > [ 76.092779] Unable to handle kernel NULL pointer dereference at virtual address 00000000 > [ 76.101396] pgd = ece14000 > [ 76.104283] [00000000] *pgd=2d01e831, *pte=00000000, *ppte=00000000 > [ 76.111086] Internal error: Oops: 817 [#1] PREEMPT SMP ARM > [ 76.116894] Modules linked in: brcmutil [last unloaded: brcmfmac] > [ 76.123468] CPU: 0 PID: 1100 Comm: bash Not tainted 3.11.0-next-20130903-00021-g4af676e-dirty #40 > [ 76.132847] task: eda3c080 ti: ecd32000 task.ti: ecd32000 > [ 76.138590] PC is at cpufreq_stat_notifier_policy+0x248/0x2e4 > [ 76.144704] LR is at notifier_call_chain+0x44/0x84 > [ 76.149796] pc : [<c038eae0>] lr : [<c004493c>] psr: 60000113 > [ 76.149796] sp : ecd33d98 ip : 008c6000 fp : 00000002 > [ 76.161908] r10: c07c6018 r9 : c07d0f08 r8 : ee338bc0 > [ 76.167440] r7 : 00000004 r6 : 00000000 r5 : ee338bc0 r4 : 00000000 > [ 76.174342] r3 : 00000000 r2 : 008c6000 r1 : 00000000 r0 : c081db18 > [ 76.181249] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user > [ 76.188798] Control: 10c5387d Table: 2ce1404a DAC: 00000015 > [ 76.194880] Process bash (pid: 1100, stack limit = 0xecd32240) > [ 76.201057] Stack: (0xecd33d98 to 0xecd34000) > [ 76.205679] 3d80: ee331e10 00000000 > [ 76.214336] 3da0: c07d0c74 ffffffff 00000000 00000000 00000004 ee338bc0 00000000 c07d0f08 > [ 76.234928] 3dc0: 00000002 c004493c c07fb60c ffffffff ee338bc0 00000004 c07c5ff0 c0044b10 > [ 76.255517] 3de0: 00000000 00200200 eda3c080 ee338bc0 00000001 c086e41c 00000001 c0044b40 > [ 76.276076] 3e00: 00000000 c038c43c ee338bc0 c038cb4c c00ab890 c1093d4c c1093d4c 00000001 > [ 76.296554] 3e20: 00000051 00000015 00000001 c0871064 00000015 00000001 ee089f00 c07d17cc > [ 76.317063] 3e40: 000f03b8 c038e468 c038e400 ffffffdc ecd33e88 c004493c 00000001 c07d03d4 > [ 76.337592] 3e60: c07d0c74 00000010 00000001 c0025e10 ecd33e88 c07d03d4 c07d0c74 c0567cf0 > [ 76.358112] 3e80: 000f03b8 c056a994 00000023 ecd33ea4 00000010 00000001 00000001 c07d03d4 > [ 76.378755] 3ea0: c07d0c74 00000000 c0825300 c0026308 c0828a2c 00000000 00000003 c0828a2c > [ 76.399498] 3ec0: c00407e0 c005f058 000f03b8 c056a994 c06d91b8 ecd33eec 00000003 ecd33eec > [ 76.420368] 3ee0: 00000000 00000003 c08158c8 00000003 ed80f000 ee089f00 00000004 c005f39c > [ 76.441354] 3f00: 00000003 c06abb5c c0575d2c c005e1e4 00000004 00000004 ecd3f7c0 ecd3f7d8 > [ 76.462347] 3f20: ecd33f80 c059128c ee0f2868 c01f3140 00000004 c0123630 ed829080 00000004 > [ 76.483335] 3f40: 000ac408 ecd33f80 00000000 ecd32000 00000004 c00cfc10 edb71900 00000001 > [ 76.504286] 3f60: 0000000a 00000000 00000000 ed829080 000ac408 00000000 00000004 c00cffe4 > [ 76.525208] 3f80: 00000000 00000000 edab98c0 b6f3aa78 00000004 000ac408 00000004 c000f1e4 > [ 76.546215] 3fa0: 00000000 c000f060 b6f3aa78 00000004 00000001 000ac408 00000004 00000000 > [ 76.567344] 3fc0: b6f3aa78 00000004 000ac408 00000004 bed9995c 000a6094 00000000 000f03b8 > [ 76.588609] 3fe0: 00000000 bed998dc b6eaab77 b6ee125c 40070010 00000001 656d5f70 7063006d > [ 76.610137] [<c038eae0>] (cpufreq_stat_notifier_policy+0x248/0x2e4) from [<c004493c>] (notifier_call_chain+0x44/0x84) > [ 76.634459] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60) > [ 76.658877] [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60) from [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20) > [ 76.684322] [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20) from [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8) > [ 76.709933] [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8) from [<c038e468>] (cpufreq_cpu_callback+0x68/0x70) > [ 76.734970] [<c038e468>] (cpufreq_cpu_callback+0x68/0x70) from [<c004493c>] (notifier_call_chain+0x44/0x84) > [ 76.759131] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0025e10>] (__cpu_notify+0x28/0x44) > [ 76.782720] [<c0025e10>] (__cpu_notify+0x28/0x44) from [<c0567cf0>] (_cpu_down+0x80/0x238) > [ 76.805598] [<c0567cf0>] (_cpu_down+0x80/0x238) from [<c0026308>] (disable_nonboot_cpus+0x68/0xe8) > [ 76.829346] [<c0026308>] (disable_nonboot_cpus+0x68/0xe8) from [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8) > [ 76.854747] [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8) from [<c005f39c>] (pm_suspend+0x1ac/0x260) > [ 76.879489] [<c005f39c>] (pm_suspend+0x1ac/0x260) from [<c005e1e4>] (state_store+0x6c/0xbc) > [ 76.902912] [<c005e1e4>] (state_store+0x6c/0xbc) from [<c01f3140>] (kobj_attr_store+0x14/0x20) > [ 76.926666] [<c01f3140>] (kobj_attr_store+0x14/0x20) from [<c0123630>] (sysfs_write_file+0x168/0x198) > [ 76.951174] [<c0123630>] (sysfs_write_file+0x168/0x198) from [<c00cfc10>] (vfs_write+0xb0/0x194) > [ 76.975189] [<c00cfc10>] (vfs_write+0xb0/0x194) from [<c00cffe4>] (SyS_write+0x3c/0x70) > [ 76.998360] [<c00cffe4>] (SyS_write+0x3c/0x70) from [<c000f060>] (ret_fast_syscall+0x0/0x30) > [ 77.022001] Code: e5952010 e7992102 e78a3002 e595300c (e5863000) > [ 77.044143] ---[ end trace faeaf1120c9722e9 ]--- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-09 19:22 cpufreq_stats NULL deref on second system suspend Stephen Warren @ 2013-09-09 20:01 ` Rafael J. Wysocki 2013-09-09 20:01 ` Stephen Warren 0 siblings, 1 reply; 38+ messages in thread From: Rafael J. Wysocki @ 2013-09-09 20:01 UTC (permalink / raw) To: Stephen Warren; +Cc: Viresh Kumar, linux-pm@vger.kernel.org On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: > Viresh, > > I'm seeing the crash below when suspending my system for the second time. > > I can avoid this with the following patch, which adds a check which > already exists in all-but-one other places that the same lookup is made: Which kernel did you test? > > diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c > > index 4cf0d28..d54f467 100644 > > --- a/drivers/cpufreq/cpufreq_stats.c > > +++ b/drivers/cpufreq/cpufreq_stats.c > > @@ -266,6 +266,9 @@ static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy) > > struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, > > policy->last_cpu); > > > > + if (!stat) > > + return; > > + > > pr_debug("Updating stats_table for new_cpu %u from last_cpu %u\n", > > policy->cpu, policy->last_cpu); > > per_cpu(cpufreq_stats_table, policy->cpu) = per_cpu(cpufreq_stats_table, > > Is that a legitimate fix, or is there something more wrong here? > > > [ 76.065009] PM: suspend of devices complete after 452.313 msecs > > [ 76.073117] PM: late suspend of devices complete after 1.748 msecs > > [ 76.081832] PM: noirq suspend of devices complete after 2.091 msecs > > [ 76.088533] Disabling non-boot CPUs... > > [ 76.092779] Unable to handle kernel NULL pointer dereference at virtual address 00000000 > > [ 76.101396] pgd = ece14000 > > [ 76.104283] [00000000] *pgd=2d01e831, *pte=00000000, *ppte=00000000 > > [ 76.111086] Internal error: Oops: 817 [#1] PREEMPT SMP ARM > > [ 76.116894] Modules linked in: brcmutil [last unloaded: brcmfmac] > > [ 76.123468] CPU: 0 PID: 1100 Comm: bash Not tainted 3.11.0-next-20130903-00021-g4af676e-dirty #40 > > [ 76.132847] task: eda3c080 ti: ecd32000 task.ti: ecd32000 > > [ 76.138590] PC is at cpufreq_stat_notifier_policy+0x248/0x2e4 > > [ 76.144704] LR is at notifier_call_chain+0x44/0x84 > > [ 76.149796] pc : [<c038eae0>] lr : [<c004493c>] psr: 60000113 > > [ 76.149796] sp : ecd33d98 ip : 008c6000 fp : 00000002 > > [ 76.161908] r10: c07c6018 r9 : c07d0f08 r8 : ee338bc0 > > [ 76.167440] r7 : 00000004 r6 : 00000000 r5 : ee338bc0 r4 : 00000000 > > [ 76.174342] r3 : 00000000 r2 : 008c6000 r1 : 00000000 r0 : c081db18 > > [ 76.181249] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user > > [ 76.188798] Control: 10c5387d Table: 2ce1404a DAC: 00000015 > > [ 76.194880] Process bash (pid: 1100, stack limit = 0xecd32240) > > [ 76.201057] Stack: (0xecd33d98 to 0xecd34000) > > [ 76.205679] 3d80: ee331e10 00000000 > > [ 76.214336] 3da0: c07d0c74 ffffffff 00000000 00000000 00000004 ee338bc0 00000000 c07d0f08 > > [ 76.234928] 3dc0: 00000002 c004493c c07fb60c ffffffff ee338bc0 00000004 c07c5ff0 c0044b10 > > [ 76.255517] 3de0: 00000000 00200200 eda3c080 ee338bc0 00000001 c086e41c 00000001 c0044b40 > > [ 76.276076] 3e00: 00000000 c038c43c ee338bc0 c038cb4c c00ab890 c1093d4c c1093d4c 00000001 > > [ 76.296554] 3e20: 00000051 00000015 00000001 c0871064 00000015 00000001 ee089f00 c07d17cc > > [ 76.317063] 3e40: 000f03b8 c038e468 c038e400 ffffffdc ecd33e88 c004493c 00000001 c07d03d4 > > [ 76.337592] 3e60: c07d0c74 00000010 00000001 c0025e10 ecd33e88 c07d03d4 c07d0c74 c0567cf0 > > [ 76.358112] 3e80: 000f03b8 c056a994 00000023 ecd33ea4 00000010 00000001 00000001 c07d03d4 > > [ 76.378755] 3ea0: c07d0c74 00000000 c0825300 c0026308 c0828a2c 00000000 00000003 c0828a2c > > [ 76.399498] 3ec0: c00407e0 c005f058 000f03b8 c056a994 c06d91b8 ecd33eec 00000003 ecd33eec > > [ 76.420368] 3ee0: 00000000 00000003 c08158c8 00000003 ed80f000 ee089f00 00000004 c005f39c > > [ 76.441354] 3f00: 00000003 c06abb5c c0575d2c c005e1e4 00000004 00000004 ecd3f7c0 ecd3f7d8 > > [ 76.462347] 3f20: ecd33f80 c059128c ee0f2868 c01f3140 00000004 c0123630 ed829080 00000004 > > [ 76.483335] 3f40: 000ac408 ecd33f80 00000000 ecd32000 00000004 c00cfc10 edb71900 00000001 > > [ 76.504286] 3f60: 0000000a 00000000 00000000 ed829080 000ac408 00000000 00000004 c00cffe4 > > [ 76.525208] 3f80: 00000000 00000000 edab98c0 b6f3aa78 00000004 000ac408 00000004 c000f1e4 > > [ 76.546215] 3fa0: 00000000 c000f060 b6f3aa78 00000004 00000001 000ac408 00000004 00000000 > > [ 76.567344] 3fc0: b6f3aa78 00000004 000ac408 00000004 bed9995c 000a6094 00000000 000f03b8 > > [ 76.588609] 3fe0: 00000000 bed998dc b6eaab77 b6ee125c 40070010 00000001 656d5f70 7063006d > > [ 76.610137] [<c038eae0>] (cpufreq_stat_notifier_policy+0x248/0x2e4) from [<c004493c>] (notifier_call_chain+0x44/0x84) > > [ 76.634459] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60) > > [ 76.658877] [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60) from [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20) > > [ 76.684322] [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20) from [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8) > > [ 76.709933] [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8) from [<c038e468>] (cpufreq_cpu_callback+0x68/0x70) > > [ 76.734970] [<c038e468>] (cpufreq_cpu_callback+0x68/0x70) from [<c004493c>] (notifier_call_chain+0x44/0x84) > > [ 76.759131] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0025e10>] (__cpu_notify+0x28/0x44) > > [ 76.782720] [<c0025e10>] (__cpu_notify+0x28/0x44) from [<c0567cf0>] (_cpu_down+0x80/0x238) > > [ 76.805598] [<c0567cf0>] (_cpu_down+0x80/0x238) from [<c0026308>] (disable_nonboot_cpus+0x68/0xe8) > > [ 76.829346] [<c0026308>] (disable_nonboot_cpus+0x68/0xe8) from [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8) > > [ 76.854747] [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8) from [<c005f39c>] (pm_suspend+0x1ac/0x260) > > [ 76.879489] [<c005f39c>] (pm_suspend+0x1ac/0x260) from [<c005e1e4>] (state_store+0x6c/0xbc) > > [ 76.902912] [<c005e1e4>] (state_store+0x6c/0xbc) from [<c01f3140>] (kobj_attr_store+0x14/0x20) > > [ 76.926666] [<c01f3140>] (kobj_attr_store+0x14/0x20) from [<c0123630>] (sysfs_write_file+0x168/0x198) > > [ 76.951174] [<c0123630>] (sysfs_write_file+0x168/0x198) from [<c00cfc10>] (vfs_write+0xb0/0x194) > > [ 76.975189] [<c00cfc10>] (vfs_write+0xb0/0x194) from [<c00cffe4>] (SyS_write+0x3c/0x70) > > [ 76.998360] [<c00cffe4>] (SyS_write+0x3c/0x70) from [<c000f060>] (ret_fast_syscall+0x0/0x30) > > [ 77.022001] Code: e5952010 e7992102 e78a3002 e595300c (e5863000) > > [ 77.044143] ---[ end trace faeaf1120c9722e9 ]--- -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-09 20:01 ` Rafael J. Wysocki @ 2013-09-09 20:01 ` Stephen Warren 2013-09-09 20:24 ` Rafael J. Wysocki 0 siblings, 1 reply; 38+ messages in thread From: Stephen Warren @ 2013-09-09 20:01 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Viresh Kumar, linux-pm@vger.kernel.org On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: > On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >> Viresh, >> >> I'm seeing the crash below when suspending my system for the second time. >> >> I can avoid this with the following patch, which adds a check which >> already exists in all-but-one other places that the same lookup is made: > > Which kernel did you test? next-20130909. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-09 20:01 ` Stephen Warren @ 2013-09-09 20:24 ` Rafael J. Wysocki 2013-09-09 21:29 ` Stephen Warren 0 siblings, 1 reply; 38+ messages in thread From: Rafael J. Wysocki @ 2013-09-09 20:24 UTC (permalink / raw) To: Stephen Warren; +Cc: Viresh Kumar, linux-pm@vger.kernel.org On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: > On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: > > On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: > >> Viresh, > >> > >> I'm seeing the crash below when suspending my system for the second time. > >> > >> I can avoid this with the following patch, which adds a check which > >> already exists in all-but-one other places that the same lookup is made: > > > > Which kernel did you test? > > next-20130909. Is it reproducible with the current mainline? Rafael ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-09 20:24 ` Rafael J. Wysocki @ 2013-09-09 21:29 ` Stephen Warren 2013-09-09 23:14 ` Rafael J. Wysocki 0 siblings, 1 reply; 38+ messages in thread From: Stephen Warren @ 2013-09-09 21:29 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Viresh Kumar, linux-pm@vger.kernel.org On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: > On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>> Viresh, >>>> >>>> I'm seeing the crash below when suspending my system for the second time. >>>> >>>> I can avoid this with the following patch, which adds a check which >>>> already exists in all-but-one other places that the same lookup is made: >>> >>> Which kernel did you test? >> >> next-20130909. > > Is it reproducible with the current mainline? This does not affect v3.11, but does affect current HEAD; 300893b "Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs". ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-09 21:29 ` Stephen Warren @ 2013-09-09 23:14 ` Rafael J. Wysocki 2013-09-10 20:53 ` Stephen Warren 0 siblings, 1 reply; 38+ messages in thread From: Rafael J. Wysocki @ 2013-09-09 23:14 UTC (permalink / raw) To: Stephen Warren; +Cc: Viresh Kumar, linux-pm@vger.kernel.org On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: > On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: > > On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: > >> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: > >>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: > >>>> Viresh, > >>>> > >>>> I'm seeing the crash below when suspending my system for the second time. > >>>> > >>>> I can avoid this with the following patch, which adds a check which > >>>> already exists in all-but-one other places that the same lookup is made: > >>> > >>> Which kernel did you test? > >> > >> next-20130909. > > > > Is it reproducible with the current mainline? > > This does not affect v3.11, but does affect current HEAD; 300893b "Merge > tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs". What system does it break on? Any chance to bisect cpufreq changes between 3.11 and the current HEAD? Rafael ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-09 23:14 ` Rafael J. Wysocki @ 2013-09-10 20:53 ` Stephen Warren 2013-09-10 22:34 ` Rafael J. Wysocki 0 siblings, 1 reply; 38+ messages in thread From: Stephen Warren @ 2013-09-10 20:53 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Viresh Kumar, linux-pm@vger.kernel.org, Srivatsa S. Bhat On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: > On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: >> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: >>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>>>> Viresh, >>>>>> >>>>>> I'm seeing the crash below when suspending my system for the second time. >>>>>> >>>>>> I can avoid this with the following patch, which adds a check which >>>>>> already exists in all-but-one other places that the same lookup is made: >>>>> >>>>> Which kernel did you test? >>>> >>>> next-20130909. >>> >>> Is it reproducible with the current mainline? >> >> This does not affect v3.11, but does affect current HEAD; 300893b "Merge >> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs". > > What system does it break on? A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board). > Any chance to bisect cpufreq changes between 3.11 and the current HEAD? Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown during suspend/resume". ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-10 20:53 ` Stephen Warren @ 2013-09-10 22:34 ` Rafael J. Wysocki 2013-09-11 10:21 ` Srivatsa S. Bhat 0 siblings, 1 reply; 38+ messages in thread From: Rafael J. Wysocki @ 2013-09-10 22:34 UTC (permalink / raw) To: Stephen Warren, Srivatsa S. Bhat; +Cc: Viresh Kumar, linux-pm@vger.kernel.org On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: > On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: > > On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: > >> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: > >>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: > >>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: > >>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: > >>>>>> Viresh, > >>>>>> > >>>>>> I'm seeing the crash below when suspending my system for the second time. > >>>>>> > >>>>>> I can avoid this with the following patch, which adds a check which > >>>>>> already exists in all-but-one other places that the same lookup is made: > >>>>> > >>>>> Which kernel did you test? > >>>> > >>>> next-20130909. > >>> > >>> Is it reproducible with the current mainline? > >> > >> This does not affect v3.11, but does affect current HEAD; 300893b "Merge > >> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs". > > > > What system does it break on? > > A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board). > > > Any chance to bisect cpufreq changes between 3.11 and the current HEAD? > > Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown > during suspend/resume". Thanks! Srivatsa, any chance to look into this? Rafael ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-10 22:34 ` Rafael J. Wysocki @ 2013-09-11 10:21 ` Srivatsa S. Bhat 2013-09-11 10:44 ` Viresh Kumar ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 10:21 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Stephen Warren, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: > On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: >>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: >>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: >>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>>>>>> Viresh, >>>>>>>> >>>>>>>> I'm seeing the crash below when suspending my system for the second time. >>>>>>>> >>>>>>>> I can avoid this with the following patch, which adds a check which >>>>>>>> already exists in all-but-one other places that the same lookup is made: >>>>>>> >>>>>>> Which kernel did you test? >>>>>> >>>>>> next-20130909. >>>>> >>>>> Is it reproducible with the current mainline? >>>> >>>> This does not affect v3.11, but does affect current HEAD; 300893b "Merge >>>> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs". >>> >>> What system does it break on? >> >> A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board). >> >>> Any chance to bisect cpufreq changes between 3.11 and the current HEAD? >> >> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown >> during suspend/resume". > > Thanks! > > Srivatsa, any chance to look into this? > Sure, Rafael. Thanks for CC'ing me. Stephen, I went through the code and I think I found out what is going wrong. Can you please try the following patch? Regards, Srivatsa S. Bhat ---------------------------------------------------------------------------- From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Subject: [PATCH] cpufreq: Fix crash in cpufreq-stats during suspend/resume Stephen Warren reported that the cpufreq-stats code hits a NULL pointer dereference during the second attempt to suspend a system. He also pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight init/teardown during suspend/resume". That commit actually ensured that the cpufreq-stats table and the cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during suspend/resume, which makes it all the more surprising. However, it turns out that the root-cause is not that we access an already freed memory, but that the reference to the allocated memory gets moved around and we lose track of that during resume, leading to the reported crash in a subsequent suspend attempt. In the suspend path, during CPU offline, the value of policy->cpu is updated by choosing one of the surviving CPUs in that policy, as long as there is atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is invoked to update the reference to the stats structure by assigning it to the new CPU. However, in the resume path, during CPU online, we end up assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats know about this. Thus the reference to the stats structure remains (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt, during CPU offline, we end up accessing an incorrect location to get the stats structure, which eventually leads to the NULL pointer dereference. Fix this by letting cpufreq-stats know about the update of the policy->cpu during CPU online in the resume path. (Also, move the update_policy_cpu() function higher up in the file, so that __cpufreq_add_dev() can invoke it). Reported-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> --- drivers/cpufreq/cpufreq.c | 37 ++++++++++++++++++++++++------------- 1 file changed, 24 insertions(+), 13 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 5a64f66..62bdb95 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -947,6 +947,18 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) kfree(policy); } +static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) +{ + policy->last_cpu = policy->cpu; + policy->cpu = cpu; + +#ifdef CONFIG_CPU_FREQ_TABLE + cpufreq_frequency_table_update_policy_cpu(policy); +#endif + blocking_notifier_call_chain(&cpufreq_policy_notifier_list, + CPUFREQ_UPDATE_POLICY_CPU, policy); +} + static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, bool frozen) { @@ -1000,7 +1012,18 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, if (!policy) goto nomem_out; - policy->cpu = cpu; + + /* + * In the resume path, since we restore a saved policy, the assignment + * to policy->cpu is like an update of the existing policy, rather than + * the creation of a brand new one. So we need to perform this update + * by invoking update_policy_cpu(). + */ + if (frozen && cpu != policy->cpu) + update_policy_cpu(policy, cpu); + else + policy->cpu = cpu; + policy->governor = CPUFREQ_DEFAULT_GOVERNOR; cpumask_copy(policy->cpus, cpumask_of(cpu)); @@ -1092,18 +1115,6 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) return __cpufreq_add_dev(dev, sif, false); } -static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) -{ - policy->last_cpu = policy->cpu; - policy->cpu = cpu; - -#ifdef CONFIG_CPU_FREQ_TABLE - cpufreq_frequency_table_update_policy_cpu(policy); -#endif - blocking_notifier_call_chain(&cpufreq_policy_notifier_list, - CPUFREQ_UPDATE_POLICY_CPU, policy); -} - static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy, unsigned int old_cpu, bool frozen) { ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 10:21 ` Srivatsa S. Bhat @ 2013-09-11 10:44 ` Viresh Kumar 2013-09-11 10:45 ` Viresh Kumar 2013-09-11 11:10 ` Srivatsa S. Bhat 2013-09-11 11:09 ` Srivatsa S. Bhat 2013-09-11 16:05 ` Stephen Warren 2 siblings, 2 replies; 38+ messages in thread From: Viresh Kumar @ 2013-09-11 10:44 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 11 September 2013 15:51, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown >>> during suspend/resume". Sorry Stephen, I was away on vacations and came back yesterday only.. And was badly stuck in something other CPUFreq bugs until now :) > Sure, Rafael. Thanks for CC'ing me. Thanks for jumping in and helping us out buddy!!!. > Stephen, I went through the code and I think I found out what is going wrong. > Can you please try the following patch? > > Regards, > Srivatsa S. Bhat > > ---------------------------------------------------------------------------- > > > From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > Subject: [PATCH] cpufreq: Fix crash in cpufreq-stats during suspend/resume > > Stephen Warren reported that the cpufreq-stats code hits a NULL pointer > dereference during the second attempt to suspend a system. He also > pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight > init/teardown during suspend/resume". > > That commit actually ensured that the cpufreq-stats table and the > cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during > suspend/resume, which makes it all the more surprising. However, it turns > out that the root-cause is not that we access an already freed memory, but > that the reference to the allocated memory gets moved around and we lose > track of that during resume, leading to the reported crash in a subsequent > suspend attempt. > > In the suspend path, during CPU offline, the value of policy->cpu is updated > by choosing one of the surviving CPUs in that policy, as long as there is > atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is > invoked to update the reference to the stats structure by assigning it to > the new CPU. However, in the resume path, during CPU online, we end up > assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats > know about this. Thus the reference to the stats structure remains > (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt, > during CPU offline, we end up accessing an incorrect location to get the > stats structure, which eventually leads to the NULL pointer dereference. > > Fix this by letting cpufreq-stats know about the update of the policy->cpu > during CPU online in the resume path. (Also, move the update_policy_cpu() > function higher up in the file, so that __cpufreq_add_dev() can invoke > it). Observation looks good.. > Reported-by: Stephen Warren <swarren@nvidia.com> > Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > --- > > drivers/cpufreq/cpufreq.c | 37 ++++++++++++++++++++++++------------- > 1 file changed, 24 insertions(+), 13 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 5a64f66..62bdb95 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -947,6 +947,18 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) > kfree(policy); > } > > +static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) > +{ > + policy->last_cpu = policy->cpu; > + policy->cpu = cpu; > + > +#ifdef CONFIG_CPU_FREQ_TABLE > + cpufreq_frequency_table_update_policy_cpu(policy); > +#endif > + blocking_notifier_call_chain(&cpufreq_policy_notifier_list, > + CPUFREQ_UPDATE_POLICY_CPU, policy); > +} > + > static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, > bool frozen) > { > @@ -1000,7 +1012,18 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, > if (!policy) > goto nomem_out; > > - policy->cpu = cpu; > + > + /* > + * In the resume path, since we restore a saved policy, the assignment > + * to policy->cpu is like an update of the existing policy, rather than > + * the creation of a brand new one. So we need to perform this update > + * by invoking update_policy_cpu(). > + */ > + if (frozen && cpu != policy->cpu) > + update_policy_cpu(policy, cpu); > + else > + policy->cpu = cpu; > + > policy->governor = CPUFREQ_DEFAULT_GOVERNOR; > cpumask_copy(policy->cpus, cpumask_of(cpu)); > > @@ -1092,18 +1115,6 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) > return __cpufreq_add_dev(dev, sif, false); > } > > -static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) > -{ > - policy->last_cpu = policy->cpu; > - policy->cpu = cpu; > - > -#ifdef CONFIG_CPU_FREQ_TABLE > - cpufreq_frequency_table_update_policy_cpu(policy); > -#endif > - blocking_notifier_call_chain(&cpufreq_policy_notifier_list, > - CPUFREQ_UPDATE_POLICY_CPU, policy); > -} > - But I would have solved it differently :) We don't really need to call update_policy_cpu() again and again as we don't really need to update policy->cpu... Rather it would be better to just move following inside cpufreq_policy_alloc(): ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 10:44 ` Viresh Kumar @ 2013-09-11 10:45 ` Viresh Kumar 2013-09-11 11:14 ` Srivatsa S. Bhat 2013-09-11 11:10 ` Srivatsa S. Bhat 1 sibling, 1 reply; 38+ messages in thread From: Viresh Kumar @ 2013-09-11 10:45 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 11 September 2013 16:14, Viresh Kumar <viresh.kumar@linaro.org> wrote: > But I would have solved it differently :) > > We don't really need to call update_policy_cpu() again and again > as we don't really need to update policy->cpu... > > Rather it would be better to just move following inside > cpufreq_policy_alloc(): Half written mail sent :( What about moving following to cpufreq_policy_alloc(): policy->cpu = cpu; ?? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 10:45 ` Viresh Kumar @ 2013-09-11 11:14 ` Srivatsa S. Bhat 2013-09-11 11:59 ` Viresh Kumar 0 siblings, 1 reply; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 11:14 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 04:15 PM, Viresh Kumar wrote: > On 11 September 2013 16:14, Viresh Kumar <viresh.kumar@linaro.org> wrote: >> But I would have solved it differently :) >> >> We don't really need to call update_policy_cpu() again and again >> as we don't really need to update policy->cpu... >> >> Rather it would be better to just move following inside >> cpufreq_policy_alloc(): > > Half written mail sent :( > > What about moving following to cpufreq_policy_alloc(): > > policy->cpu = cpu; > > ?? > Hmm? The problem is not about merely updating the policy->cpu field; the main issue is that the existing code was not letting the cpufreq-stats code know that we updated the policy->cpu under the hood. It is important for cpufreq-stats to know this because it maintains the reference to its stats structure by associating it with the policy->cpu. So if policy->cpu changes under the hood, it loses track of its reference. So we need to keep that code informed about changes to policy->cpu. Thus, we need to call update_policy_cpu() in the CPU online path (during resume). I don't see how we can skip that. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 11:14 ` Srivatsa S. Bhat @ 2013-09-11 11:59 ` Viresh Kumar 2013-09-11 13:56 ` Srivatsa S. Bhat 2013-09-12 5:52 ` Viresh Kumar 0 siblings, 2 replies; 38+ messages in thread From: Viresh Kumar @ 2013-09-11 11:59 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 11 September 2013 16:44, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > Hmm? The problem is not about merely updating the policy->cpu field; the > main issue is that the existing code was not letting the cpufreq-stats > code know that we updated the policy->cpu under the hood. It is important > for cpufreq-stats to know this because it maintains the reference to its > stats structure by associating it with the policy->cpu. So if policy->cpu > changes under the hood, it loses track of its reference. So we need to > keep that code informed about changes to policy->cpu. Thus, we need to call > update_policy_cpu() in the CPU online path (during resume). I don't see > how we can skip that. Okay.. There are two different ways in which cpufreq_add_dev() work currently.. Boot cluster (i.e. policy with boot CPU) --------------- Here cpufreq_remove_dev() is never called for boot cpu but all others. And similarly cpufreq_add_dev() is never called for boot cpu but all others. Now policy->cpu contains meaningful cpu at beginning of resume and we don't need to modify that at all.. For all the remaining CPUs we better call cpufreq_add_policy_cpu() rather.. Non-boot Cluster --------------------- All CPUs here are removed and at the end policy->cpu contains the last cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3.. Not at resume we will add cpu2 first and so need to update policy->cpu to 2.. But for all other CPUs in this cluster we return early from cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus was fixed by call to ->init() for the first cpu of this cluster.. And so we never reach the line: policy->cpu = cpu; For the first cpu of non-boot cluster we need to call update_policy_cpu() and not for others.. But for the boot cluster if we can call ->init() somehow at resume time, then things would be fairly similar in both cases.. I am running of time now, as need to leave office now... I hope I made the problem more clear or probably the way I see it :) -- viresh ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 11:59 ` Viresh Kumar @ 2013-09-11 13:56 ` Srivatsa S. Bhat 2013-09-12 5:52 ` Viresh Kumar 1 sibling, 0 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 13:56 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 05:29 PM, Viresh Kumar wrote: > On 11 September 2013 16:44, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> Hmm? The problem is not about merely updating the policy->cpu field; the >> main issue is that the existing code was not letting the cpufreq-stats >> code know that we updated the policy->cpu under the hood. It is important >> for cpufreq-stats to know this because it maintains the reference to its >> stats structure by associating it with the policy->cpu. So if policy->cpu >> changes under the hood, it loses track of its reference. So we need to >> keep that code informed about changes to policy->cpu. Thus, we need to call >> update_policy_cpu() in the CPU online path (during resume). I don't see >> how we can skip that. > > Okay.. There are two different ways in which cpufreq_add_dev() work > currently.. > > Boot cluster (i.e. policy with boot CPU) > --------------- > > Here cpufreq_remove_dev() is never called for boot cpu but all others. > And similarly cpufreq_add_dev() is never called for boot cpu but all others. > > Now policy->cpu contains meaningful cpu at beginning of resume and > we don't need to modify that at all.. For all the remaining CPUs we > better call cpufreq_add_policy_cpu() rather.. > Yes, and the existing code already does that. And if you observe the placement of the call to update_policy_cpu() in my patch, you'll find that it won't be called in cases where we call cpufreq_add_policy_cpu(). Which is exactly what you want, IIUC. > Non-boot Cluster > --------------------- > > All CPUs here are removed and at the end policy->cpu contains the last > cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3.. > > Not at resume we will add cpu2 first and so need to update policy->cpu > to 2.. But for all other CPUs in this cluster we return early from > cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus > was fixed by call to ->init() for the first cpu of this cluster.. > > And so we never reach the line: policy->cpu = cpu; > > For the first cpu of non-boot cluster we need to call update_policy_cpu() > and not for others.. > Yes, and that's precisely why I added the call to update_policy_cpu() only near the statement 'policy->cpu = cpu'. All other cases don't need to call that function, and in my patch, they indeed don't call that function. A simple way to look at this is: If policy->cpu is updated anywhere in the resume (cpu online) path, we need to call update_policy_cpu(). Otherwise, we don't need to call that function. This will cover both the scenarios you described above. > > But for the boot cluster if we can call ->init() somehow at resume time, > then things would be fairly similar in both cases.. > > I am running of time now, as need to leave office now... > I hope I made the problem more clear or probably the way I see it :) > Well, your explanation matches my understanding of the scenarios, and I had written the patch keeping those scenarios in mind; so I believe the patch is correct as it is. So what I didn't get is this: are you suggesting that something is wrong in my patch, or are you just reiterating the different scenarios involved so that I can recheck whether my patch is right? If it is the former, please point me to the exact problem you found in my patch. If it is the latter, I'm pretty sure my patch is right, I've already checked it :-) Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 11:59 ` Viresh Kumar 2013-09-11 13:56 ` Srivatsa S. Bhat @ 2013-09-12 5:52 ` Viresh Kumar 2013-09-12 6:26 ` Srivatsa S. Bhat 1 sibling, 1 reply; 38+ messages in thread From: Viresh Kumar @ 2013-09-12 5:52 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq Let me fix my mail first.. I was running out of time yesterday and so couldn't frame things correctly :) On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote: > Okay.. There are two different ways in which cpufreq_add_dev() work > currently.. > > Boot cluster (i.e. policy with boot CPU) > --------------- > > Here cpufreq_remove_dev() is never called for boot cpu but all others. > And similarly cpufreq_add_dev() is never called for boot cpu but all others. > > Now policy->cpu contains meaningful cpu at beginning of resume and > we don't need to modify that at all.. For all the remaining CPUs we > better call cpufreq_add_policy_cpu() rather.. And this should be done without your patch. Or actually we will simply return from this place. Atleast for systems with single cluster, like Tegra. policy->related_cpus is still valid after resume and we haven't removed policy from the cpufreq_policy_list (though there is a bug which I have fixed separately and sent it to you..).. So no change required for a single cluster system.. > Non-boot Cluster > --------------------- > > All CPUs here are removed and at the end policy->cpu contains the last > cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3.. > > Now at resume we will add cpu2 first and so need to update policy->cpu > to 2.. > But for all other CPUs in this cluster we return early from > cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus > was fixed by call to ->init() for the first cpu of this cluster.. This was wrong, we need a valid policy->related_cpus field which is always valid and so we return early here too, but not for the first cpu of cluster. > And so we never reach the line: policy->cpu = cpu; > > For the first cpu of non-boot cluster we need to call update_policy_cpu() > and not for others.. that's correct, thought I have one more idea.. :) > But for the boot cluster if we can call ->init() somehow at resume time, > then things would be fairly similar in both cases.. Not required.. its all working already.. and so Stephen shouldn't need your patch for Tegra, but rather my patches that fix other cpufreq bugs.. Now coming back to the ideas I have... Same code will work if hotplug sequence is fixed a bit. Why aren't we doing exact opposite of suspend in resume? We are removing CPUs (leaving the boot cpu) in ascending order and then adding them back in same order.. Why? Why not remove CPUs in descending order and add in ascending order? Or remove in ascending order and add in descending order? That way policy->cpu will be updated with the right cpu and your patch wouldn't be required.. I am not saying that this can't be hacked/fixed in cpufreq but suspend/resume may also be fixed and that looks logically more correct to me.. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 5:52 ` Viresh Kumar @ 2013-09-12 6:26 ` Srivatsa S. Bhat 2013-09-12 6:41 ` Viresh Kumar 2013-09-12 15:55 ` Stephen Warren 0 siblings, 2 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-12 6:26 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/12/2013 11:22 AM, Viresh Kumar wrote: > Let me fix my mail first.. I was running out of time yesterday and so couldn't > frame things correctly :) > > On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote: >> Okay.. There are two different ways in which cpufreq_add_dev() work >> currently.. >> >> Boot cluster (i.e. policy with boot CPU) >> --------------- >> >> Here cpufreq_remove_dev() is never called for boot cpu but all others. >> And similarly cpufreq_add_dev() is never called for boot cpu but all others. >> >> Now policy->cpu contains meaningful cpu at beginning of resume and >> we don't need to modify that at all.. For all the remaining CPUs we >> better call cpufreq_add_policy_cpu() rather.. > > And this should be done without your patch. Or actually we will simply > return from this place. Atleast for systems with single cluster, like Tegra. > > policy->related_cpus is still valid after resume and we haven't removed > policy from the cpufreq_policy_list (though there is a bug which I have > fixed separately and sent it to you..).. So no change required for a single > cluster system.. > >> Non-boot Cluster >> --------------------- >> >> All CPUs here are removed and at the end policy->cpu contains the last >> cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3.. >> >> Now at resume we will add cpu2 first and so need to update policy->cpu >> to 2.. > >> But for all other CPUs in this cluster we return early from >> cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus >> was fixed by call to ->init() for the first cpu of this cluster.. > > This was wrong, we need a valid policy->related_cpus field which is always > valid and so we return early here too, but not for the first cpu of cluster. > >> And so we never reach the line: policy->cpu = cpu; >> >> For the first cpu of non-boot cluster we need to call update_policy_cpu() >> and not for others.. > > that's correct, thought I have one more idea.. :) > >> But for the boot cluster if we can call ->init() somehow at resume time, >> then things would be fairly similar in both cases.. > > Not required.. its all working already.. and so Stephen shouldn't need your > patch for Tegra, but rather my patches that fix other cpufreq bugs.. > > Now coming back to the ideas I have... > Same code will work if hotplug sequence is fixed a bit. Why aren't we doing > exact opposite of suspend in resume? > > We are removing CPUs (leaving the boot cpu) in ascending order and then > adding them back in same order.. Why? > > Why not remove CPUs in descending order and add in ascending order? Or > remove in ascending order and add in descending order? > I had the same thought when solving this bug.. We have had similar issues with CPU hotplug notifiers too: why are they invoked in the same order during both CPU down and up, instead of reversing the order? I even had a patchset to perform reverse-invocation of notifiers.. http://lwn.net/Articles/508072/ ... but people didn't find that very compelling to have. > That way policy->cpu will be updated with the right cpu and your patch wouldn't > be required.. > > I am not saying that this can't be hacked/fixed in cpufreq but suspend/resume > may also be fixed and that looks logically more correct to me.. > It does to me too, but I think the reason nobody really bothered is because perhaps not many other subsystems care about the order in which CPUs are torn down or brought up; they just need the total number to match.. cpufreq is one exception as we saw with this bug. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 6:26 ` Srivatsa S. Bhat @ 2013-09-12 6:41 ` Viresh Kumar 2013-09-12 6:46 ` Srivatsa S. Bhat 2013-09-12 15:55 ` Stephen Warren 1 sibling, 1 reply; 38+ messages in thread From: Viresh Kumar @ 2013-09-12 6:41 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 12 September 2013 11:56, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > I had the same thought when solving this bug.. We have had similar issues with > CPU hotplug notifiers too: why are they invoked in the same order during both > CPU down and up, instead of reversing the order? I even had a patchset to perform > reverse-invocation of notifiers.. http://lwn.net/Articles/508072/ > ... but people didn't find that very compelling to have. > > It does to me too, but I think the reason nobody really bothered is because perhaps > not many other subsystems care about the order in which CPUs are torn down or > brought up; they just need the total number to match.. cpufreq is one exception > as we saw with this bug. Probably its time to re-spin that series and make CPUFreq as one of the users of that patchset.. Resume should be just opposite of suspend and so that patchset would make sense even if not many people care about it :) Over that there is one more problem that I see, don't know if it is really a big issue.. After a suspend/resume value of policy->cpu may get changed... And so the hierarchy of sysfs cpufreq files too.. Folder that had links to other CPUs folder can now be actual folders instead of links and vice versa.. Don't know if this can break something ?? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 6:41 ` Viresh Kumar @ 2013-09-12 6:46 ` Srivatsa S. Bhat 2013-09-12 6:52 ` Viresh Kumar 0 siblings, 1 reply; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-12 6:46 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/12/2013 12:11 PM, Viresh Kumar wrote: > On 12 September 2013 11:56, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> I had the same thought when solving this bug.. We have had similar issues with >> CPU hotplug notifiers too: why are they invoked in the same order during both >> CPU down and up, instead of reversing the order? I even had a patchset to perform >> reverse-invocation of notifiers.. http://lwn.net/Articles/508072/ >> ... but people didn't find that very compelling to have. >> > >> It does to me too, but I think the reason nobody really bothered is because perhaps >> not many other subsystems care about the order in which CPUs are torn down or >> brought up; they just need the total number to match.. cpufreq is one exception >> as we saw with this bug. > > Probably its time to re-spin that series and make CPUFreq as one of the users > of that patchset.. Resume should be just opposite of suspend and so > that patchset > would make sense even if not many people care about it :) > > Over that there is one more problem that I see, don't know if it is really a big > issue.. > > After a suspend/resume value of policy->cpu may get changed... And so the > hierarchy of sysfs cpufreq files too.. Folder that had links to other > CPUs folder > can now be actual folders instead of links and vice versa.. > > Don't know if this can break something ?? > Interesting observation :-) But we just managed to retain sysfs file permissions across suspend/resume with a lot of trouble and regressions. That's probably good enough for some time to come ;-) We can retain folder/links when somebody really finds a need to do that ;-) Of course, if we change the suspend/resume sequence and that fixes this, that would be like getting it for free, nobody would say no to it ;-) Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 6:46 ` Srivatsa S. Bhat @ 2013-09-12 6:52 ` Viresh Kumar 2013-09-12 7:14 ` Srivatsa S. Bhat 0 siblings, 1 reply; 38+ messages in thread From: Viresh Kumar @ 2013-09-12 6:52 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 12 September 2013 12:16, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > Of course, if we change the suspend/resume sequence and that fixes this, that > would be like getting it for free, nobody would say no to it ;-) Not really :) Policy with 4 CPUs, 0,1,2,3, policy->cpu currently set to 1, 2 or 3... We will remove CPUs in order 3,2,1 and add back in 1,2,3... Or Vice Versa policy->cpu after resume is 0 :) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 6:52 ` Viresh Kumar @ 2013-09-12 7:14 ` Srivatsa S. Bhat 0 siblings, 0 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-12 7:14 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/12/2013 12:22 PM, Viresh Kumar wrote: > On 12 September 2013 12:16, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> Of course, if we change the suspend/resume sequence and that fixes this, that >> would be like getting it for free, nobody would say no to it ;-) > > Not really :) > > Policy with 4 CPUs, 0,1,2,3, policy->cpu currently set to 1, 2 or 3... > > We will remove CPUs in order 3,2,1 and add back in 1,2,3... Or Vice Versa > > policy->cpu after resume is 0 :) > Ah, great counter-example! :-) Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 6:26 ` Srivatsa S. Bhat 2013-09-12 6:41 ` Viresh Kumar @ 2013-09-12 15:55 ` Stephen Warren 2013-09-12 17:26 ` Srivatsa S. Bhat 1 sibling, 1 reply; 38+ messages in thread From: Stephen Warren @ 2013-09-12 15:55 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/12/2013 12:26 AM, Srivatsa S. Bhat wrote: > On 09/12/2013 11:22 AM, Viresh Kumar wrote: ... >> Now coming back to the ideas I have... >> Same code will work if hotplug sequence is fixed a bit. Why aren't we doing >> exact opposite of suspend in resume? >> >> We are removing CPUs (leaving the boot cpu) in ascending order and then >> adding them back in same order.. Why? >> >> Why not remove CPUs in descending order and add in ascending order? Or >> remove in ascending order and add in descending order? > > I had the same thought when solving this bug.. We have had similar issues with > CPU hotplug notifiers too: why are they invoked in the same order during both > CPU down and up, instead of reversing the order? I even had a patchset to perform > reverse-invocation of notifiers.. http://lwn.net/Articles/508072/ > ... but people didn't find that very compelling to have. I'm not sure if you're talking about the order the CPUs get plugged back in after resume, or the order of the (pair of?) notifiers that gets called for each individual CPU. Sorry if this is blindingly obvious, but with CPU hotplug, I can manually unplug/re-plug CPUs in any order I feel like, and cpufreq had better work if I do that. Given that, I don't think there's any particular need for suspend/resume to unplug/re-plug CPUs in any particular order for correctness at least, although perhaps it'd be nice cosmetically for some reason? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 15:55 ` Stephen Warren @ 2013-09-12 17:26 ` Srivatsa S. Bhat 2013-09-13 4:26 ` Viresh Kumar 0 siblings, 1 reply; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-12 17:26 UTC (permalink / raw) To: Stephen Warren Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/12/2013 09:25 PM, Stephen Warren wrote: > On 09/12/2013 12:26 AM, Srivatsa S. Bhat wrote: >> On 09/12/2013 11:22 AM, Viresh Kumar wrote: > ... >>> Now coming back to the ideas I have... >>> Same code will work if hotplug sequence is fixed a bit. Why aren't we doing >>> exact opposite of suspend in resume? >>> >>> We are removing CPUs (leaving the boot cpu) in ascending order and then >>> adding them back in same order.. Why? >>> >>> Why not remove CPUs in descending order and add in ascending order? Or >>> remove in ascending order and add in descending order? >> >> I had the same thought when solving this bug.. We have had similar issues with >> CPU hotplug notifiers too: why are they invoked in the same order during both >> CPU down and up, instead of reversing the order? I even had a patchset to perform >> reverse-invocation of notifiers.. http://lwn.net/Articles/508072/ >> ... but people didn't find that very compelling to have. > > I'm not sure if you're talking about the order the CPUs get plugged back > in after resume, or the order of the (pair of?) notifiers that gets > called for each individual CPU. Well, initially we were discussing about the order the CPUs get plugged back in after resume, then I gave an example of a similar ordering issue in the way the registered CPU hotplug notifiers are invoked: during *both* CPU online and offline, the chain of notifiers are invoked in the same order, rather than in the opposite order. To work around this unnatural ordering, subsystems have resorted to splitting up their callbacks into multiple callbacks and then assigning appropriate priorities etc., to them, to get the ordering right. We could have done away with all that complexity/confusion if the core CPU hotplug code had set the ordering right. And that's what my patchset tried to do. Anyway, nevermind, as of now, subsystems do work around this suitably, so there is no known bug as such at the present. Just that we could have probably done it a better way, that's all. > > Sorry if this is blindingly obvious, but with CPU hotplug, I can > manually unplug/re-plug CPUs in any order I feel like, and cpufreq had > better work if I do that. Given that, I don't think there's any > particular need for suspend/resume to unplug/re-plug CPUs in any > particular order for correctness at least, although perhaps it'd be nice > cosmetically for some reason? > You're absolutely right! Regular CPU hotplug is more demanding than suspend/resume in the context we are discussing, since any CPU can be hotplugged at any time and put back in any order. So code like cpufreq should be prepared to work with any ordering. So yes, its not very compelling to change the suspend/resume CPU hotplug ordering, since cpufreq has to deal with the regular (and harsher) situation anyway. That said, one subtle but key distinction between regular CPU hotplug and suspend/resume is that, the kernel guarantees that the state of the system after resume will be exactly the same as it was before suspend (atleast to the extent that is practically possible). On the other hand, no such guarantees are made for regular CPU hotplug, since its almost as if the user is mutating the system at runtime!. However, suspend/resume as a concept itself provides the above mentioned guarantee, and in fact, the very fact that CPU hotplug is used under the hood for suspend/resume is just an implementation detail, and should not affect the guarantees. That's the reason we often pay special attention to CPU hotplug handling in the suspend/resume path, such as preserving sysfs file permissions etc.. On those lines, Viresh and me were just wondering if 'fixing' the suspend/ resume CPU hotplug sequence would yield any additional benefits to better serve this guarantee. That was the context in which this discussion happened. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-12 17:26 ` Srivatsa S. Bhat @ 2013-09-13 4:26 ` Viresh Kumar 0 siblings, 0 replies; 38+ messages in thread From: Viresh Kumar @ 2013-09-13 4:26 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Stephen Warren, Rafael J. Wysocki, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 12 September 2013 22:56, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > On 09/12/2013 09:25 PM, Stephen Warren wrote: > Anyway, nevermind, as of now, subsystems do work around this suitably, so > there is no known bug as such at the present. Just that we could have probably > done it a better way, that's all. Yeah, there is no bug as of now due to the number of hacks adopted by different framework.. I believe we can still have a cleanup series to take care of this stuff.. That would be some improvement and would be better for future.. Otherwise this kind of problems would keep coming again and again.. > You're absolutely right! Regular CPU hotplug is more demanding than > suspend/resume in the context we are discussing, since any CPU can be > hotplugged at any time and put back in any order. So code like cpufreq should > be prepared to work with any ordering. And that part is well implemented and tested as far as I know.. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 10:44 ` Viresh Kumar 2013-09-11 10:45 ` Viresh Kumar @ 2013-09-11 11:10 ` Srivatsa S. Bhat 2013-09-11 11:15 ` Viresh Kumar 1 sibling, 1 reply; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 11:10 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 04:14 PM, Viresh Kumar wrote: > On 11 September 2013 15:51, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: > >>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown >>>> during suspend/resume". > > Sorry Stephen, I was away on vacations and came back yesterday only.. > And was badly stuck in something other CPUFreq bugs until now :) > >> Sure, Rafael. Thanks for CC'ing me. > > Thanks for jumping in and helping us out buddy!!!. > No problem :-) Besides, I am the one who broke it ;-( Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 11:10 ` Srivatsa S. Bhat @ 2013-09-11 11:15 ` Viresh Kumar 2013-09-11 11:17 ` Srivatsa S. Bhat 0 siblings, 1 reply; 38+ messages in thread From: Viresh Kumar @ 2013-09-11 11:15 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 11 September 2013 16:40, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > On 09/11/2013 04:14 PM, Viresh Kumar wrote: >> On 11 September 2013 15:51, Srivatsa S. Bhat >> <srivatsa.bhat@linux.vnet.ibm.com> wrote: >>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >> >>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown >>>>> during suspend/resume". >> >> Sorry Stephen, I was away on vacations and came back yesterday only.. >> And was badly stuck in something other CPUFreq bugs until now :) >> >>> Sure, Rafael. Thanks for CC'ing me. >> >> Thanks for jumping in and helping us out buddy!!!. >> > > No problem :-) Besides, I am the one who broke it ;-( I believe you missed some part of my mail ? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 11:15 ` Viresh Kumar @ 2013-09-11 11:17 ` Srivatsa S. Bhat 2013-09-11 11:41 ` Viresh Kumar 0 siblings, 1 reply; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 11:17 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 04:45 PM, Viresh Kumar wrote: > On 11 September 2013 16:40, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> On 09/11/2013 04:14 PM, Viresh Kumar wrote: >>> On 11 September 2013 15:51, Srivatsa S. Bhat >>> <srivatsa.bhat@linux.vnet.ibm.com> wrote: >>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >>> >>>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown >>>>>> during suspend/resume". >>> >>> Sorry Stephen, I was away on vacations and came back yesterday only.. >>> And was badly stuck in something other CPUFreq bugs until now :) >>> >>>> Sure, Rafael. Thanks for CC'ing me. >>> >>> Thanks for jumping in and helping us out buddy!!!. >>> >> >> No problem :-) Besides, I am the one who broke it ;-( > > I believe you missed some part of my mail ? > See my next reply :-) I was composing it :-) Man, you are *fast*! ;-) Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 11:17 ` Srivatsa S. Bhat @ 2013-09-11 11:41 ` Viresh Kumar 0 siblings, 0 replies; 38+ messages in thread From: Viresh Kumar @ 2013-09-11 11:41 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 11 September 2013 16:47, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > See my next reply :-) I was composing it :-) > > Man, you are *fast*! ;-) haha... ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 10:21 ` Srivatsa S. Bhat 2013-09-11 10:44 ` Viresh Kumar @ 2013-09-11 11:09 ` Srivatsa S. Bhat 2013-09-11 16:05 ` Stephen Warren 2 siblings, 0 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 11:09 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Stephen Warren, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 03:51 PM, Srivatsa S. Bhat wrote: > On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: >>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: >>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: >>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>>>>>>> Viresh, >>>>>>>>> >>>>>>>>> I'm seeing the crash below when suspending my system for the second time. >>>>>>>>> >>>>>>>>> I can avoid this with the following patch, which adds a check which >>>>>>>>> already exists in all-but-one other places that the same lookup is made: >>>>>>>> >>>>>>>> Which kernel did you test? >>>>>>> >>>>>>> next-20130909. >>>>>> >>>>>> Is it reproducible with the current mainline? >>>>> >>>>> This does not affect v3.11, but does affect current HEAD; 300893b "Merge >>>>> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs". >>>> >>>> What system does it break on? >>> >>> A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board). >>> >>>> Any chance to bisect cpufreq changes between 3.11 and the current HEAD? >>> >>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown >>> during suspend/resume". >> >> Thanks! >> >> Srivatsa, any chance to look into this? >> > > Sure, Rafael. Thanks for CC'ing me. > > Stephen, I went through the code and I think I found out what is going wrong. > Can you please try the following patch? > > Regards, > Srivatsa S. Bhat > > ---------------------------------------------------------------------------- > > > From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > Subject: [PATCH] cpufreq: Fix crash in cpufreq-stats during suspend/resume > And here is a follow-up patch, on top of this one. I hit the bug while working on this patch, but it doesn't occur in practice, since none of the existing callers call update_policy_cpu() with new == old. (I had called it like that by mistake while working on the fix and was surprised by the level of problems it caused; hence thought of adding a check). Regards, Srivatsa S. Bhat ----------------------------------------------------------------------------- From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Subject: [PATCH] cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu If update_policy_cpu() is invoked with the existing policy->cpu itself as the new-cpu parameter, then a lot of things can go terribly wrong. In its present form, update_policy_cpu() always assumes that the new-cpu is different from policy->cpu and invokes other functions to perform their respective updates. And those functions implement the actual update like this: per_cpu(..., new_cpu) = per_cpu(..., last_cpu); per_cpu(..., last_cpu) = NULL; Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu references vanish into thin air! (memory leak). From there, it leads to more problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference and hence tries to create a new sysfs-group; but sysfs already had created the group earlier, so it complains that it cannot create a duplicate filename. In short, the repercussions of a rather innocuous invocation of update_policy_cpu() can turn out to be pretty nasty. Ideally update_policy_cpu() should handle this situation (new == last) gracefully, and not lead to such severe problems. So fix it by adding an appropriate check. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> --- drivers/cpufreq/cpufreq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 62bdb95..a61b7a1 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -949,6 +949,9 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) { + if (cpu == policy->cpu) + return; + policy->last_cpu = policy->cpu; policy->cpu = cpu; ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 10:21 ` Srivatsa S. Bhat 2013-09-11 10:44 ` Viresh Kumar 2013-09-11 11:09 ` Srivatsa S. Bhat @ 2013-09-11 16:05 ` Stephen Warren 2013-09-11 18:03 ` Srivatsa S. Bhat 2 siblings, 1 reply; 38+ messages in thread From: Stephen Warren @ 2013-09-11 16:05 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote: > On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: >>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: >>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: >>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>>>>>>> Viresh, >>>>>>>>> >>>>>>>>> I'm seeing the crash below when suspending my system for the second time. ... > Stephen, I went through the code and I think I found out what is going wrong. > Can you please try the following patch? Unfortunately, I still see the exact same failure/backtrace with this patch applied. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 16:05 ` Stephen Warren @ 2013-09-11 18:03 ` Srivatsa S. Bhat 2013-09-11 18:42 ` Srivatsa S. Bhat 0 siblings, 1 reply; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 18:03 UTC (permalink / raw) To: Stephen Warren Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 09:35 PM, Stephen Warren wrote: > On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote: >> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: >>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: >>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: >>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>>>>>>>> Viresh, >>>>>>>>>> >>>>>>>>>> I'm seeing the crash below when suspending my system for the second time. > ... >> Stephen, I went through the code and I think I found out what is going wrong. >> Can you please try the following patch? > > Unfortunately, I still see the exact same failure/backtrace with this > patch applied. > Oh, is it? Can you please give me the map of the related cpus on your system? (ie., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for each CPU.) I must be missing something... Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 18:03 ` Srivatsa S. Bhat @ 2013-09-11 18:42 ` Srivatsa S. Bhat 2013-09-11 19:03 ` Stephen Warren 2013-09-12 5:58 ` Viresh Kumar 0 siblings, 2 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 18:42 UTC (permalink / raw) To: Stephen Warren Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 11:33 PM, Srivatsa S. Bhat wrote: > On 09/11/2013 09:35 PM, Stephen Warren wrote: >> On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote: >>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >>>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: >>>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: >>>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: >>>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >>>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>>>>>>>>> Viresh, >>>>>>>>>>> >>>>>>>>>>> I'm seeing the crash below when suspending my system for the second time. >> ... >>> Stephen, I went through the code and I think I found out what is going wrong. >>> Can you please try the following patch? >> >> Unfortunately, I still see the exact same failure/backtrace with this >> patch applied. >> > > Oh, is it? Can you please give me the map of the related cpus on your > system? (ie., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for > each CPU.) > > I must be missing something... > OK, I took a second look at the code, and I suspect that applying the second patch might help. So can you try by applying both the patches please[1][2]? Basically here is my hunch: say CPUs 2 and 3 are part of a policy and 3 is the policy->cpu. During suspend, CPU 2 will be taken offline first, and we hit this code: 1199 if (cpu != policy->cpu && !frozen) { 1200 sysfs_remove_link(&dev->kobj, "cpufreq"); 1201 } else if (cpus > 1) { 1202 1203 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen); 1204 if (new_cpu >= 0) { 1205 WARN_ON(lock_policy_rwsem_write(cpu)); 1206 update_policy_cpu(policy, new_cpu); 1207 unlock_policy_rwsem_write(cpu); 1208 1209 if (!frozen) { 1210 pr_debug("%s: policy Kobject moved to cpu: %d " 1211 "from: %d\n",__func__, new_cpu, cpu); 1212 } 1213 } 1214 } At this point, the first 'if' condition fails because frozen == true. So it enters the else part. But, policy->cpu is actually 3, not 2, and hence we invoke nominate_...() unnecessarily. That function returns 3 since that's the only CPU remaining in the mask, and so we call update_policy_cpu() with new_cpu = 3, and old_cpu was also 3! And that is the perfect recipe for disaster, with the current implementation of update_policy_cpu(). And my second patch [2] tried to fix this exact problem, although I didn't realize we actually had a case where we hit this in the current code itself. So please try by applying both the patches and let me know how it goes. Thanks a lot for your testing efforts! [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2 [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2 Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 18:42 ` Srivatsa S. Bhat @ 2013-09-11 19:03 ` Stephen Warren 2013-09-11 19:46 ` Srivatsa S. Bhat 2013-09-12 6:00 ` Viresh Kumar 2013-09-12 5:58 ` Viresh Kumar 1 sibling, 2 replies; 38+ messages in thread From: Stephen Warren @ 2013-09-11 19:03 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote: ... > OK, I took a second look at the code, and I suspect that applying the > second patch might help. So can you try by applying both the patches > please[1][2]? > ... > [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2 > [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2 Yes, with both of those patches applies, the problem is solved:-) I was going to test the second patch originally, but it sounded like it was more of a cleanup rather than a fix for my issue, so I didn't bother when I found the problem wasn't solved by patch 1. Sorry! For the record, I'm testing on a 2-CPU system, so I'm not sure whether your explanation applies; it talks about CPUs 2 and 3 whereas I only have CPUs 0 and 1, but perhaps your explanation applies equally to any pair of CPUs? For the record, here's the information you requested in the other email: # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus 0 1 0 1 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 19:03 ` Stephen Warren @ 2013-09-11 19:46 ` Srivatsa S. Bhat 2013-09-11 20:07 ` Stephen Warren 2013-09-12 6:04 ` Viresh Kumar 2013-09-12 6:00 ` Viresh Kumar 1 sibling, 2 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 19:46 UTC (permalink / raw) To: Stephen Warren Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/12/2013 12:33 AM, Stephen Warren wrote: > On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote: > ... >> OK, I took a second look at the code, and I suspect that applying the >> second patch might help. So can you try by applying both the patches >> please[1][2]? >> > ... >> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2 >> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2 > > Yes, with both of those patches applies, the problem is solved:-) > > I was going to test the second patch originally, but it sounded like it > was more of a cleanup rather than a fix for my issue, so I didn't bother > when I found the problem wasn't solved by patch 1. Sorry! > Well, honestly, even I had intended the second patch as a cleanup and hadn't asked you to test it ;-) Only when you reported that the first patch failed to solve your problem, I realized that the second patch was important too! :-) Thanks for testing! > For the record, I'm testing on a 2-CPU system, so I'm not sure whether > your explanation applies; it talks about CPUs 2 and 3 whereas I only > have CPUs 0 and 1, but perhaps your explanation applies equally to any > pair of CPUs? > Yes, it applies to any pair of CPUs, as long as the CPU first taken down is not the policy->cpu. In your case, it applies like this: IIUC, CPU0 is the boot cpu, and hence it wont be taken offline using hotplug. So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu, then it hits the very same bug that I described with the analogy of CPUs 2 and 3. > For the record, here's the information you requested in the other email: > > # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus > 0 1 > 0 1 > Thanks! It would have been more useful to somehow know which was the policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu in your case. Anyway, nevermind, good to know that the problem got solved by the 2 patches :-) And more importantly, we now fully understand the problems that can lead to the NULL deref and the solutions, as outlined below: Problem 1 : The last surviving policy->cpu during suspend might not be the one which is onlined during resume. So policy->cpu updates can get missed by the cpufreq-stats code. This is solved by patch 1. Problem 2 : If a CPU other than the policy->cpu goes down first during suspend, then we end up spuriously updating the policy->cpu field, making update_policy_cpu() go crazy. This is solved by patch 2. Ideally, I think we should fix the weird if/else condition, since *that* is the real culprit; and retain patch 2 as a cleanup. Something like this: From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Subject: [PATCH] cpufreq: Restructure if/else block to avoid unintended behavior In __cpufreq_remove_dev_prepare(), the code which decides whether to remove the sysfs link or nominate a new policy cpu, is governed by an if/else block with a rather complex set of conditionals. Worse, they harbor a subtlety which leads to certain unintended behavior. The code looks like this: if (cpu != policy->cpu && !frozen) { sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) { new_cpu = cpufreq_nominate_new_policy_cpu(...); ... update_policy_cpu(..., new_cpu); } The original intention was: If the CPU going offline is not policy->cpu, just remove the link. On the other hand, if the CPU going offline is the policy->cpu itself, handover the policy->cpu job to some other surviving CPU in that policy. But because the 'if' condition also includes the 'frozen' check, now there are *two* possibilities by which we can enter the 'else' block: 1. cpu == policy->cpu (intended) 2. cpu != policy->cpu && frozen (unintended) Due to the second (unintended) scenario, we end up spuriously nominating a CPU as the policy->cpu, even when the existing policy->cpu is alive and well. This can cause problems further down the line, especially when we end up nominating the same policy->cpu as the new one (ie., old == new), because it totally confuses update_policy_cpu(). To avoid this mess, restructure the if/else block to only do what was originally intended, and thus prevent any unwelcome surprises. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> --- drivers/cpufreq/cpufreq.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 62bdb95..247842b 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1193,8 +1193,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, cpumask_clear_cpu(cpu, policy->cpus); unlock_policy_rwsem_write(cpu); - if (cpu != policy->cpu && !frozen) { - sysfs_remove_link(&dev->kobj, "cpufreq"); + if (cpu != policy->cpu) { + if (!frozen) + sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) { new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen); So can you see if patch 1 + this above fix solves your problem as well? Then we can retain the original patch 2 as a cleanup, after these 2 patches. This organization also makes the code look better and understandable. Rafael, I'll post the 3 patches separately after knowing the results from Stephen. You don't have to bother deciphering the patch ordering just yet ;-) Regards, Srivatsa S. Bhat ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 19:46 ` Srivatsa S. Bhat @ 2013-09-11 20:07 ` Stephen Warren 2013-09-11 20:05 ` Srivatsa S. Bhat 2013-09-12 6:04 ` Viresh Kumar 1 sibling, 1 reply; 38+ messages in thread From: Stephen Warren @ 2013-09-11 20:07 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/11/2013 01:46 PM, Srivatsa S. Bhat wrote: > On 09/12/2013 12:33 AM, Stephen Warren wrote: >> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote: >> ... >>> OK, I took a second look at the code, and I suspect that applying the >>> second patch might help. So can you try by applying both the patches >>> please[1][2]? >>> >> ... >>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2 >>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2 >> >> Yes, with both of those patches applies, the problem is solved:-) >> >> I was going to test the second patch originally, but it sounded like it >> was more of a cleanup rather than a fix for my issue, so I didn't bother >> when I found the problem wasn't solved by patch 1. Sorry! >> > > Well, honestly, even I had intended the second patch as a cleanup and > hadn't asked you to test it ;-) Only when you reported that the first patch > failed to solve your problem, I realized that the second patch was > important too! :-) Thanks for testing! > >> For the record, I'm testing on a 2-CPU system, so I'm not sure whether >> your explanation applies; it talks about CPUs 2 and 3 whereas I only >> have CPUs 0 and 1, but perhaps your explanation applies equally to any >> pair of CPUs? >> > > Yes, it applies to any pair of CPUs, as long as the CPU first taken down > is not the policy->cpu. In your case, it applies like this: > IIUC, CPU0 is the boot cpu, and hence it wont be taken offline using hotplug. > So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu, > then it hits the very same bug that I described with the analogy of CPUs 2 > and 3. > >> For the record, here's the information you requested in the other email: >> >> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus >> 0 1 >> 0 1 > > Thanks! It would have been more useful to somehow know which was the > policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu > in your case. Yes, I believe CPU0 since, > # ls -l /sys/devices/system/cpu/cpu1/cpufreq > lrwxrwxrwx 1 root root 0 Jan 1 00:01 /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq and cpu0/cpufreq/ has all the files in it. ... > So can you see if patch 1 + this above fix solves your problem as well? > Then we can retain the original patch 2 as a cleanup, after these 2 patches. > This organization also makes the code look better and understandable. Yes, both patch 1+3 and 1+3+2 work fine. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 20:07 ` Stephen Warren @ 2013-09-11 20:05 ` Srivatsa S. Bhat 0 siblings, 0 replies; 38+ messages in thread From: Srivatsa S. Bhat @ 2013-09-11 20:05 UTC (permalink / raw) To: Stephen Warren Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 09/12/2013 01:37 AM, Stephen Warren wrote: > On 09/11/2013 01:46 PM, Srivatsa S. Bhat wrote: >> On 09/12/2013 12:33 AM, Stephen Warren wrote: >>> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote: >>> ... >>>> OK, I took a second look at the code, and I suspect that applying the >>>> second patch might help. So can you try by applying both the patches >>>> please[1][2]? >>>> >>> ... >>>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2 >>>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2 >>> >>> Yes, with both of those patches applies, the problem is solved:-) >>> >>> I was going to test the second patch originally, but it sounded like it >>> was more of a cleanup rather than a fix for my issue, so I didn't bother >>> when I found the problem wasn't solved by patch 1. Sorry! >>> >> >> Well, honestly, even I had intended the second patch as a cleanup and >> hadn't asked you to test it ;-) Only when you reported that the first patch >> failed to solve your problem, I realized that the second patch was >> important too! :-) Thanks for testing! >> >>> For the record, I'm testing on a 2-CPU system, so I'm not sure whether >>> your explanation applies; it talks about CPUs 2 and 3 whereas I only >>> have CPUs 0 and 1, but perhaps your explanation applies equally to any >>> pair of CPUs? >>> >> >> Yes, it applies to any pair of CPUs, as long as the CPU first taken down >> is not the policy->cpu. In your case, it applies like this: >> IIUC, CPU0 is the boot cpu, and hence it wont be taken offline using hotplug. >> So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu, >> then it hits the very same bug that I described with the analogy of CPUs 2 >> and 3. >> >>> For the record, here's the information you requested in the other email: >>> >>> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus >>> 0 1 >>> 0 1 >> >> Thanks! It would have been more useful to somehow know which was the >> policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu >> in your case. > > Yes, I believe CPU0 since, > >> # ls -l /sys/devices/system/cpu/cpu1/cpufreq >> lrwxrwxrwx 1 root root 0 Jan 1 00:01 /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq > > and cpu0/cpufreq/ has all the files in it. > > ... Ah, nice! >> So can you see if patch 1 + this above fix solves your problem as well? >> Then we can retain the original patch 2 as a cleanup, after these 2 patches. >> This organization also makes the code look better and understandable. > > Yes, both patch 1+3 and 1+3+2 work fine. > Cool! Thanks a lot for all your testing efforts Stephen! :-) Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 19:46 ` Srivatsa S. Bhat 2013-09-11 20:07 ` Stephen Warren @ 2013-09-12 6:04 ` Viresh Kumar 1 sibling, 0 replies; 38+ messages in thread From: Viresh Kumar @ 2013-09-12 6:04 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Stephen Warren, Rafael J. Wysocki, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 12 September 2013 01:16, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > Subject: [PATCH] cpufreq: Restructure if/else block to avoid unintended behavior > > In __cpufreq_remove_dev_prepare(), the code which decides whether to remove > the sysfs link or nominate a new policy cpu, is governed by an if/else block > with a rather complex set of conditionals. Worse, they harbor a subtlety > which leads to certain unintended behavior. > > The code looks like this: > > if (cpu != policy->cpu && !frozen) { > sysfs_remove_link(&dev->kobj, "cpufreq"); > } else if (cpus > 1) { > new_cpu = cpufreq_nominate_new_policy_cpu(...); > ... > update_policy_cpu(..., new_cpu); > } > > The original intention was: > If the CPU going offline is not policy->cpu, just remove the link. > On the other hand, if the CPU going offline is the policy->cpu itself, > handover the policy->cpu job to some other surviving CPU in that policy. > > But because the 'if' condition also includes the 'frozen' check, now there > are *two* possibilities by which we can enter the 'else' block: > > 1. cpu == policy->cpu (intended) > 2. cpu != policy->cpu && frozen (unintended) > > Due to the second (unintended) scenario, we end up spuriously nominating > a CPU as the policy->cpu, even when the existing policy->cpu is alive and > well. This can cause problems further down the line, especially when we end > up nominating the same policy->cpu as the new one (ie., old == new), > because it totally confuses update_policy_cpu(). > > To avoid this mess, restructure the if/else block to only do what was > originally intended, and thus prevent any unwelcome surprises. > > Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > --- > > drivers/cpufreq/cpufreq.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 62bdb95..247842b 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1193,8 +1193,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, > cpumask_clear_cpu(cpu, policy->cpus); > unlock_policy_rwsem_write(cpu); > > - if (cpu != policy->cpu && !frozen) { > - sysfs_remove_link(&dev->kobj, "cpufreq"); > + if (cpu != policy->cpu) { > + if (!frozen) > + sysfs_remove_link(&dev->kobj, "cpufreq"); > } else if (cpus > 1) { > > new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen); Ahh, I wrote exactly the same crap.. Rafael please take Srivatsa's patch here :) > So can you see if patch 1 + this above fix solves your problem as well? > Then we can retain the original patch 2 as a cleanup, after these 2 patches. Why do we need 2 now? We should never hit that case I would say.. And If we do, there is some other bug in our code which we have hidden :) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 19:03 ` Stephen Warren 2013-09-11 19:46 ` Srivatsa S. Bhat @ 2013-09-12 6:00 ` Viresh Kumar 1 sibling, 0 replies; 38+ messages in thread From: Viresh Kumar @ 2013-09-12 6:00 UTC (permalink / raw) To: Stephen Warren Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 12 September 2013 00:33, Stephen Warren <swarren@wwwdotorg.org> wrote: > For the record, I'm testing on a 2-CPU system, so I'm not sure whether > your explanation applies; it talks about CPUs 2 and 3 whereas I only > have CPUs 0 and 1, but perhaps your explanation applies equally to any > pair of CPUs? It does apply to your system as well.. We remove cpu 1 in suspend and so try to nominate cpu 0 as policy->cpu, which it already is :) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: cpufreq_stats NULL deref on second system suspend 2013-09-11 18:42 ` Srivatsa S. Bhat 2013-09-11 19:03 ` Stephen Warren @ 2013-09-12 5:58 ` Viresh Kumar 1 sibling, 0 replies; 38+ messages in thread From: Viresh Kumar @ 2013-09-12 5:58 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Stephen Warren, Rafael J. Wysocki, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, cpufreq On 12 September 2013 00:12, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > OK, I took a second look at the code, and I suspect that applying the > second patch might help. So can you try by applying both the patches > please[1][2]? > > Basically here is my hunch: say CPUs 2 and 3 are part of a policy and > 3 is the policy->cpu. During suspend, CPU 2 will be taken offline first, > and we hit this code: > > 1199 if (cpu != policy->cpu && !frozen) { > 1200 sysfs_remove_link(&dev->kobj, "cpufreq"); > 1201 } else if (cpus > 1) { > 1202 > 1203 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen); > 1204 if (new_cpu >= 0) { > 1205 WARN_ON(lock_policy_rwsem_write(cpu)); > 1206 update_policy_cpu(policy, new_cpu); > 1207 unlock_policy_rwsem_write(cpu); > 1208 > 1209 if (!frozen) { > 1210 pr_debug("%s: policy Kobject moved to cpu: %d " > 1211 "from: %d\n",__func__, new_cpu, cpu); > 1212 } > 1213 } > 1214 } > > At this point, the first 'if' condition fails because frozen == true. > So it enters the else part. But, policy->cpu is actually 3, not 2, > and hence we invoke nominate_...() unnecessarily. That function returns > 3 since that's the only CPU remaining in the mask, and so we call > update_policy_cpu() with new_cpu = 3, and old_cpu was also 3! And that > is the perfect recipe for disaster, with the current implementation of > update_policy_cpu(). The problem here is not the wrong implementation of update_policy_cpu() but 1199 if (cpu != policy->cpu && !frozen) { Though I have fixed it before looking into your replies :) ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2013-09-13 4:26 UTC | newest] Thread overview: 38+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-09 19:22 cpufreq_stats NULL deref on second system suspend Stephen Warren 2013-09-09 20:01 ` Rafael J. Wysocki 2013-09-09 20:01 ` Stephen Warren 2013-09-09 20:24 ` Rafael J. Wysocki 2013-09-09 21:29 ` Stephen Warren 2013-09-09 23:14 ` Rafael J. Wysocki 2013-09-10 20:53 ` Stephen Warren 2013-09-10 22:34 ` Rafael J. Wysocki 2013-09-11 10:21 ` Srivatsa S. Bhat 2013-09-11 10:44 ` Viresh Kumar 2013-09-11 10:45 ` Viresh Kumar 2013-09-11 11:14 ` Srivatsa S. Bhat 2013-09-11 11:59 ` Viresh Kumar 2013-09-11 13:56 ` Srivatsa S. Bhat 2013-09-12 5:52 ` Viresh Kumar 2013-09-12 6:26 ` Srivatsa S. Bhat 2013-09-12 6:41 ` Viresh Kumar 2013-09-12 6:46 ` Srivatsa S. Bhat 2013-09-12 6:52 ` Viresh Kumar 2013-09-12 7:14 ` Srivatsa S. Bhat 2013-09-12 15:55 ` Stephen Warren 2013-09-12 17:26 ` Srivatsa S. Bhat 2013-09-13 4:26 ` Viresh Kumar 2013-09-11 11:10 ` Srivatsa S. Bhat 2013-09-11 11:15 ` Viresh Kumar 2013-09-11 11:17 ` Srivatsa S. Bhat 2013-09-11 11:41 ` Viresh Kumar 2013-09-11 11:09 ` Srivatsa S. Bhat 2013-09-11 16:05 ` Stephen Warren 2013-09-11 18:03 ` Srivatsa S. Bhat 2013-09-11 18:42 ` Srivatsa S. Bhat 2013-09-11 19:03 ` Stephen Warren 2013-09-11 19:46 ` Srivatsa S. Bhat 2013-09-11 20:07 ` Stephen Warren 2013-09-11 20:05 ` Srivatsa S. Bhat 2013-09-12 6:04 ` Viresh Kumar 2013-09-12 6:00 ` Viresh Kumar 2013-09-12 5:58 ` Viresh Kumar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).