cpufreq_stats NULL deref on second system suspend

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* cpufreq_stats NULL deref on second system suspend
@ 2013-09-09 19:22 Stephen Warren
  2013-09-09 20:01 ` Rafael J. Wysocki
  0 siblings, 1 reply; 38+ messages in thread
From: Stephen Warren @ 2013-09-09 19:22 UTC (permalink / raw)
  To: Viresh Kumar, linux-pm@vger.kernel.org; +Cc: Rafael J . Wysocki

Viresh,

I'm seeing the crash below when suspending my system for the second time.

I can avoid this with the following patch, which adds a check which
already exists in all-but-one other places that the same lookup is made:

> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index 4cf0d28..d54f467 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -266,6 +266,9 @@ static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy)
>         struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table,
>                         policy->last_cpu);
>  
> +       if (!stat)
> +               return;
> +
>         pr_debug("Updating stats_table for new_cpu %u from last_cpu %u\n",
>                         policy->cpu, policy->last_cpu);
>         per_cpu(cpufreq_stats_table, policy->cpu) = per_cpu(cpufreq_stats_table,

Is that a legitimate fix, or is there something more wrong here?

> [  76.065009] PM: suspend of devices complete after 452.313 msecs
> [  76.073117] PM: late suspend of devices complete after 1.748 msecs
> [  76.081832] PM: noirq suspend of devices complete after 2.091 msecs
> [  76.088533] Disabling non-boot CPUs...
> [  76.092779] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [  76.101396] pgd = ece14000
> [  76.104283] [00000000] *pgd=2d01e831, *pte=00000000, *ppte=00000000
> [  76.111086] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
> [  76.116894] Modules linked in: brcmutil [last unloaded: brcmfmac]
> [  76.123468] CPU: 0 PID: 1100 Comm: bash Not tainted 3.11.0-next-20130903-00021-g4af676e-dirty #40
> [  76.132847] task: eda3c080 ti: ecd32000 task.ti: ecd32000
> [  76.138590] PC is at cpufreq_stat_notifier_policy+0x248/0x2e4
> [  76.144704] LR is at notifier_call_chain+0x44/0x84
> [  76.149796] pc : [<c038eae0>]   lr : [<c004493c>]   psr: 60000113
> [  76.149796] sp : ecd33d98 ip : 008c6000 fp : 00000002
> [  76.161908] r10: c07c6018 r9 : c07d0f08 r8 : ee338bc0
> [  76.167440] r7 : 00000004 r6 : 00000000 r5 : ee338bc0 r4 : 00000000
> [  76.174342] r3 : 00000000 r2 : 008c6000 r1 : 00000000 r0 : c081db18
> [  76.181249] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
> [  76.188798] Control: 10c5387d Table: 2ce1404a DAC: 00000015
> [  76.194880] Process bash (pid: 1100, stack limit = 0xecd32240)
> [  76.201057] Stack: (0xecd33d98 to 0xecd34000)
> [  76.205679] 3d80:                                                      ee331e10 00000000
> [  76.214336] 3da0: c07d0c74 ffffffff 00000000 00000000 00000004 ee338bc0 00000000 c07d0f08
> [  76.234928] 3dc0: 00000002 c004493c c07fb60c ffffffff ee338bc0 00000004 c07c5ff0 c0044b10
> [  76.255517] 3de0: 00000000 00200200 eda3c080 ee338bc0 00000001 c086e41c 00000001 c0044b40
> [  76.276076] 3e00: 00000000 c038c43c ee338bc0 c038cb4c c00ab890 c1093d4c c1093d4c 00000001
> [  76.296554] 3e20: 00000051 00000015 00000001 c0871064 00000015 00000001 ee089f00 c07d17cc
> [  76.317063] 3e40: 000f03b8 c038e468 c038e400 ffffffdc ecd33e88 c004493c 00000001 c07d03d4
> [  76.337592] 3e60: c07d0c74 00000010 00000001 c0025e10 ecd33e88 c07d03d4 c07d0c74 c0567cf0
> [  76.358112] 3e80: 000f03b8 c056a994 00000023 ecd33ea4 00000010 00000001 00000001 c07d03d4
> [  76.378755] 3ea0: c07d0c74 00000000 c0825300 c0026308 c0828a2c 00000000 00000003 c0828a2c
> [  76.399498] 3ec0: c00407e0 c005f058 000f03b8 c056a994 c06d91b8 ecd33eec 00000003 ecd33eec
> [  76.420368] 3ee0: 00000000 00000003 c08158c8 00000003 ed80f000 ee089f00 00000004 c005f39c
> [  76.441354] 3f00: 00000003 c06abb5c c0575d2c c005e1e4 00000004 00000004 ecd3f7c0 ecd3f7d8
> [  76.462347] 3f20: ecd33f80 c059128c ee0f2868 c01f3140 00000004 c0123630 ed829080 00000004
> [  76.483335] 3f40: 000ac408 ecd33f80 00000000 ecd32000 00000004 c00cfc10 edb71900 00000001
> [  76.504286] 3f60: 0000000a 00000000 00000000 ed829080 000ac408 00000000 00000004 c00cffe4
> [  76.525208] 3f80: 00000000 00000000 edab98c0 b6f3aa78 00000004 000ac408 00000004 c000f1e4
> [  76.546215] 3fa0: 00000000 c000f060 b6f3aa78 00000004 00000001 000ac408 00000004 00000000
> [  76.567344] 3fc0: b6f3aa78 00000004 000ac408 00000004 bed9995c 000a6094 00000000 000f03b8
> [  76.588609] 3fe0: 00000000 bed998dc b6eaab77 b6ee125c 40070010 00000001 656d5f70 7063006d
> [  76.610137] [<c038eae0>] (cpufreq_stat_notifier_policy+0x248/0x2e4) from [<c004493c>] (notifier_call_chain+0x44/0x84)
> [  76.634459] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60)
> [  76.658877] [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60) from [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20)
> [  76.684322] [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20) from [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8)
> [  76.709933] [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8) from [<c038e468>] (cpufreq_cpu_callback+0x68/0x70)
> [  76.734970] [<c038e468>] (cpufreq_cpu_callback+0x68/0x70) from [<c004493c>] (notifier_call_chain+0x44/0x84)
> [  76.759131] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0025e10>] (__cpu_notify+0x28/0x44)
> [  76.782720] [<c0025e10>] (__cpu_notify+0x28/0x44) from [<c0567cf0>] (_cpu_down+0x80/0x238)
> [  76.805598] [<c0567cf0>] (_cpu_down+0x80/0x238) from [<c0026308>] (disable_nonboot_cpus+0x68/0xe8)
> [  76.829346] [<c0026308>] (disable_nonboot_cpus+0x68/0xe8) from [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8)
> [  76.854747] [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8) from [<c005f39c>] (pm_suspend+0x1ac/0x260)
> [  76.879489] [<c005f39c>] (pm_suspend+0x1ac/0x260) from [<c005e1e4>] (state_store+0x6c/0xbc)
> [  76.902912] [<c005e1e4>] (state_store+0x6c/0xbc) from [<c01f3140>] (kobj_attr_store+0x14/0x20)
> [  76.926666] [<c01f3140>] (kobj_attr_store+0x14/0x20) from [<c0123630>] (sysfs_write_file+0x168/0x198)
> [  76.951174] [<c0123630>] (sysfs_write_file+0x168/0x198) from [<c00cfc10>] (vfs_write+0xb0/0x194)
> [  76.975189] [<c00cfc10>] (vfs_write+0xb0/0x194) from [<c00cffe4>] (SyS_write+0x3c/0x70)
> [  76.998360] [<c00cffe4>] (SyS_write+0x3c/0x70) from [<c000f060>] (ret_fast_syscall+0x0/0x30)
> [  77.022001] Code: e5952010 e7992102 e78a3002 e595300c (e5863000)
> [  77.044143] ---[ end trace faeaf1120c9722e9 ]--- 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-09 19:22 cpufreq_stats NULL deref on second system suspend Stephen Warren
@ 2013-09-09 20:01 ` Rafael J. Wysocki
  2013-09-09 20:01   ` Stephen Warren
  0 siblings, 1 reply; 38+ messages in thread
From: Rafael J. Wysocki @ 2013-09-09 20:01 UTC (permalink / raw)
  To: Stephen Warren; +Cc: Viresh Kumar, linux-pm@vger.kernel.org

On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
> Viresh,
> 
> I'm seeing the crash below when suspending my system for the second time.
> 
> I can avoid this with the following patch, which adds a check which
> already exists in all-but-one other places that the same lookup is made:

Which kernel did you test?

> > diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> > index 4cf0d28..d54f467 100644
> > --- a/drivers/cpufreq/cpufreq_stats.c
> > +++ b/drivers/cpufreq/cpufreq_stats.c
> > @@ -266,6 +266,9 @@ static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy)
> >         struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table,
> >                         policy->last_cpu);
> >  
> > +       if (!stat)
> > +               return;
> > +
> >         pr_debug("Updating stats_table for new_cpu %u from last_cpu %u\n",
> >                         policy->cpu, policy->last_cpu);
> >         per_cpu(cpufreq_stats_table, policy->cpu) = per_cpu(cpufreq_stats_table,
> 
> Is that a legitimate fix, or is there something more wrong here?
> 
> > [  76.065009] PM: suspend of devices complete after 452.313 msecs
> > [  76.073117] PM: late suspend of devices complete after 1.748 msecs
> > [  76.081832] PM: noirq suspend of devices complete after 2.091 msecs
> > [  76.088533] Disabling non-boot CPUs...
> > [  76.092779] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [  76.101396] pgd = ece14000
> > [  76.104283] [00000000] *pgd=2d01e831, *pte=00000000, *ppte=00000000
> > [  76.111086] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
> > [  76.116894] Modules linked in: brcmutil [last unloaded: brcmfmac]
> > [  76.123468] CPU: 0 PID: 1100 Comm: bash Not tainted 3.11.0-next-20130903-00021-g4af676e-dirty #40
> > [  76.132847] task: eda3c080 ti: ecd32000 task.ti: ecd32000
> > [  76.138590] PC is at cpufreq_stat_notifier_policy+0x248/0x2e4
> > [  76.144704] LR is at notifier_call_chain+0x44/0x84
> > [  76.149796] pc : [<c038eae0>]   lr : [<c004493c>]   psr: 60000113
> > [  76.149796] sp : ecd33d98 ip : 008c6000 fp : 00000002
> > [  76.161908] r10: c07c6018 r9 : c07d0f08 r8 : ee338bc0
> > [  76.167440] r7 : 00000004 r6 : 00000000 r5 : ee338bc0 r4 : 00000000
> > [  76.174342] r3 : 00000000 r2 : 008c6000 r1 : 00000000 r0 : c081db18
> > [  76.181249] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
> > [  76.188798] Control: 10c5387d Table: 2ce1404a DAC: 00000015
> > [  76.194880] Process bash (pid: 1100, stack limit = 0xecd32240)
> > [  76.201057] Stack: (0xecd33d98 to 0xecd34000)
> > [  76.205679] 3d80:                                                      ee331e10 00000000
> > [  76.214336] 3da0: c07d0c74 ffffffff 00000000 00000000 00000004 ee338bc0 00000000 c07d0f08
> > [  76.234928] 3dc0: 00000002 c004493c c07fb60c ffffffff ee338bc0 00000004 c07c5ff0 c0044b10
> > [  76.255517] 3de0: 00000000 00200200 eda3c080 ee338bc0 00000001 c086e41c 00000001 c0044b40
> > [  76.276076] 3e00: 00000000 c038c43c ee338bc0 c038cb4c c00ab890 c1093d4c c1093d4c 00000001
> > [  76.296554] 3e20: 00000051 00000015 00000001 c0871064 00000015 00000001 ee089f00 c07d17cc
> > [  76.317063] 3e40: 000f03b8 c038e468 c038e400 ffffffdc ecd33e88 c004493c 00000001 c07d03d4
> > [  76.337592] 3e60: c07d0c74 00000010 00000001 c0025e10 ecd33e88 c07d03d4 c07d0c74 c0567cf0
> > [  76.358112] 3e80: 000f03b8 c056a994 00000023 ecd33ea4 00000010 00000001 00000001 c07d03d4
> > [  76.378755] 3ea0: c07d0c74 00000000 c0825300 c0026308 c0828a2c 00000000 00000003 c0828a2c
> > [  76.399498] 3ec0: c00407e0 c005f058 000f03b8 c056a994 c06d91b8 ecd33eec 00000003 ecd33eec
> > [  76.420368] 3ee0: 00000000 00000003 c08158c8 00000003 ed80f000 ee089f00 00000004 c005f39c
> > [  76.441354] 3f00: 00000003 c06abb5c c0575d2c c005e1e4 00000004 00000004 ecd3f7c0 ecd3f7d8
> > [  76.462347] 3f20: ecd33f80 c059128c ee0f2868 c01f3140 00000004 c0123630 ed829080 00000004
> > [  76.483335] 3f40: 000ac408 ecd33f80 00000000 ecd32000 00000004 c00cfc10 edb71900 00000001
> > [  76.504286] 3f60: 0000000a 00000000 00000000 ed829080 000ac408 00000000 00000004 c00cffe4
> > [  76.525208] 3f80: 00000000 00000000 edab98c0 b6f3aa78 00000004 000ac408 00000004 c000f1e4
> > [  76.546215] 3fa0: 00000000 c000f060 b6f3aa78 00000004 00000001 000ac408 00000004 00000000
> > [  76.567344] 3fc0: b6f3aa78 00000004 000ac408 00000004 bed9995c 000a6094 00000000 000f03b8
> > [  76.588609] 3fe0: 00000000 bed998dc b6eaab77 b6ee125c 40070010 00000001 656d5f70 7063006d
> > [  76.610137] [<c038eae0>] (cpufreq_stat_notifier_policy+0x248/0x2e4) from [<c004493c>] (notifier_call_chain+0x44/0x84)
> > [  76.634459] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60)
> > [  76.658877] [<c0044b10>] (__blocking_notifier_call_chain+0x48/0x60) from [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20)
> > [  76.684322] [<c0044b40>] (blocking_notifier_call_chain+0x18/0x20) from [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8)
> > [  76.709933] [<c038cb4c>] (__cpufreq_remove_dev.isra.13+0x158/0x4a8) from [<c038e468>] (cpufreq_cpu_callback+0x68/0x70)
> > [  76.734970] [<c038e468>] (cpufreq_cpu_callback+0x68/0x70) from [<c004493c>] (notifier_call_chain+0x44/0x84)
> > [  76.759131] [<c004493c>] (notifier_call_chain+0x44/0x84) from [<c0025e10>] (__cpu_notify+0x28/0x44)
> > [  76.782720] [<c0025e10>] (__cpu_notify+0x28/0x44) from [<c0567cf0>] (_cpu_down+0x80/0x238)
> > [  76.805598] [<c0567cf0>] (_cpu_down+0x80/0x238) from [<c0026308>] (disable_nonboot_cpus+0x68/0xe8)
> > [  76.829346] [<c0026308>] (disable_nonboot_cpus+0x68/0xe8) from [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8)
> > [  76.854747] [<c005f058>] (suspend_devices_and_enter+0x160/0x2f8) from [<c005f39c>] (pm_suspend+0x1ac/0x260)
> > [  76.879489] [<c005f39c>] (pm_suspend+0x1ac/0x260) from [<c005e1e4>] (state_store+0x6c/0xbc)
> > [  76.902912] [<c005e1e4>] (state_store+0x6c/0xbc) from [<c01f3140>] (kobj_attr_store+0x14/0x20)
> > [  76.926666] [<c01f3140>] (kobj_attr_store+0x14/0x20) from [<c0123630>] (sysfs_write_file+0x168/0x198)
> > [  76.951174] [<c0123630>] (sysfs_write_file+0x168/0x198) from [<c00cfc10>] (vfs_write+0xb0/0x194)
> > [  76.975189] [<c00cfc10>] (vfs_write+0xb0/0x194) from [<c00cffe4>] (SyS_write+0x3c/0x70)
> > [  76.998360] [<c00cffe4>] (SyS_write+0x3c/0x70) from [<c000f060>] (ret_fast_syscall+0x0/0x30)
> > [  77.022001] Code: e5952010 e7992102 e78a3002 e595300c (e5863000)
> > [  77.044143] ---[ end trace faeaf1120c9722e9 ]--- 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-09 20:01 ` Rafael J. Wysocki
@ 2013-09-09 20:01   ` Stephen Warren
  2013-09-09 20:24     ` Rafael J. Wysocki
  0 siblings, 1 reply; 38+ messages in thread
From: Stephen Warren @ 2013-09-09 20:01 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, linux-pm@vger.kernel.org

On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>> Viresh,
>>
>> I'm seeing the crash below when suspending my system for the second time.
>>
>> I can avoid this with the following patch, which adds a check which
>> already exists in all-but-one other places that the same lookup is made:
> 
> Which kernel did you test?

next-20130909.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-09 20:01   ` Stephen Warren
@ 2013-09-09 20:24     ` Rafael J. Wysocki
  2013-09-09 21:29       ` Stephen Warren
  0 siblings, 1 reply; 38+ messages in thread
From: Rafael J. Wysocki @ 2013-09-09 20:24 UTC (permalink / raw)
  To: Stephen Warren; +Cc: Viresh Kumar, linux-pm@vger.kernel.org

On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
> > On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
> >> Viresh,
> >>
> >> I'm seeing the crash below when suspending my system for the second time.
> >>
> >> I can avoid this with the following patch, which adds a check which
> >> already exists in all-but-one other places that the same lookup is made:
> > 
> > Which kernel did you test?
> 
> next-20130909.

Is it reproducible with the current mainline?

Rafael


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-09 20:24     ` Rafael J. Wysocki
@ 2013-09-09 21:29       ` Stephen Warren
  2013-09-09 23:14         ` Rafael J. Wysocki
  0 siblings, 1 reply; 38+ messages in thread
From: Stephen Warren @ 2013-09-09 21:29 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, linux-pm@vger.kernel.org

On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>> Viresh,
>>>>
>>>> I'm seeing the crash below when suspending my system for the second time.
>>>>
>>>> I can avoid this with the following patch, which adds a check which
>>>> already exists in all-but-one other places that the same lookup is made:
>>>
>>> Which kernel did you test?
>>
>> next-20130909.
> 
> Is it reproducible with the current mainline?

This does not affect v3.11, but does affect current HEAD; 300893b "Merge
tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-09 21:29       ` Stephen Warren
@ 2013-09-09 23:14         ` Rafael J. Wysocki
  2013-09-10 20:53           ` Stephen Warren
  0 siblings, 1 reply; 38+ messages in thread
From: Rafael J. Wysocki @ 2013-09-09 23:14 UTC (permalink / raw)
  To: Stephen Warren; +Cc: Viresh Kumar, linux-pm@vger.kernel.org

On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
> > On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
> >> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
> >>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
> >>>> Viresh,
> >>>>
> >>>> I'm seeing the crash below when suspending my system for the second time.
> >>>>
> >>>> I can avoid this with the following patch, which adds a check which
> >>>> already exists in all-but-one other places that the same lookup is made:
> >>>
> >>> Which kernel did you test?
> >>
> >> next-20130909.
> > 
> > Is it reproducible with the current mainline?
> 
> This does not affect v3.11, but does affect current HEAD; 300893b "Merge
> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs".

What system does it break on?

Any chance to bisect cpufreq changes between 3.11 and the current HEAD?

Rafael


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-09 23:14         ` Rafael J. Wysocki
@ 2013-09-10 20:53           ` Stephen Warren
  2013-09-10 22:34             ` Rafael J. Wysocki
  0 siblings, 1 reply; 38+ messages in thread
From: Stephen Warren @ 2013-09-10 20:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, linux-pm@vger.kernel.org, Srivatsa S. Bhat

On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>> Viresh,
>>>>>>
>>>>>> I'm seeing the crash below when suspending my system for the second time.
>>>>>>
>>>>>> I can avoid this with the following patch, which adds a check which
>>>>>> already exists in all-but-one other places that the same lookup is made:
>>>>>
>>>>> Which kernel did you test?
>>>>
>>>> next-20130909.
>>>
>>> Is it reproducible with the current mainline?
>>
>> This does not affect v3.11, but does affect current HEAD; 300893b "Merge
>> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs".
> 
> What system does it break on?

A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board).

> Any chance to bisect cpufreq changes between 3.11 and the current HEAD?

Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
during suspend/resume".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-10 20:53           ` Stephen Warren
@ 2013-09-10 22:34             ` Rafael J. Wysocki
  2013-09-11 10:21               ` Srivatsa S. Bhat
  0 siblings, 1 reply; 38+ messages in thread
From: Rafael J. Wysocki @ 2013-09-10 22:34 UTC (permalink / raw)
  To: Stephen Warren, Srivatsa S. Bhat; +Cc: Viresh Kumar, linux-pm@vger.kernel.org

On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
> > On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
> >> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
> >>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
> >>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
> >>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
> >>>>>> Viresh,
> >>>>>>
> >>>>>> I'm seeing the crash below when suspending my system for the second time.
> >>>>>>
> >>>>>> I can avoid this with the following patch, which adds a check which
> >>>>>> already exists in all-but-one other places that the same lookup is made:
> >>>>>
> >>>>> Which kernel did you test?
> >>>>
> >>>> next-20130909.
> >>>
> >>> Is it reproducible with the current mainline?
> >>
> >> This does not affect v3.11, but does affect current HEAD; 300893b "Merge
> >> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs".
> > 
> > What system does it break on?
> 
> A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board).
> 
> > Any chance to bisect cpufreq changes between 3.11 and the current HEAD?
> 
> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
> during suspend/resume".

Thanks!

Srivatsa, any chance to look into this?

Rafael


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-10 22:34             ` Rafael J. Wysocki
@ 2013-09-11 10:21               ` Srivatsa S. Bhat
  2013-09-11 10:44                 ` Viresh Kumar
                                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 10:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Warren, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>> Viresh,
>>>>>>>>
>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
>>>>>>>>
>>>>>>>> I can avoid this with the following patch, which adds a check which
>>>>>>>> already exists in all-but-one other places that the same lookup is made:
>>>>>>>
>>>>>>> Which kernel did you test?
>>>>>>
>>>>>> next-20130909.
>>>>>
>>>>> Is it reproducible with the current mainline?
>>>>
>>>> This does not affect v3.11, but does affect current HEAD; 300893b "Merge
>>>> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs".
>>>
>>> What system does it break on?
>>
>> A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board).
>>
>>> Any chance to bisect cpufreq changes between 3.11 and the current HEAD?
>>
>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>> during suspend/resume".
> 
> Thanks!
> 
> Srivatsa, any chance to look into this?
>

Sure, Rafael. Thanks for CC'ing me.

Stephen, I went through the code and I think I found out what is going wrong.
Can you please try the following patch?

Regards,
Srivatsa S. Bhat

----------------------------------------------------------------------------

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] cpufreq: Fix crash in cpufreq-stats during suspend/resume

Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
dereference during the second attempt to suspend a system. He also
pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight
init/teardown during suspend/resume".

That commit actually ensured that the cpufreq-stats table and the
cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
suspend/resume, which makes it all the more surprising. However, it turns
out that the root-cause is not that we access an already freed memory, but
that the reference to the allocated memory gets moved around and we lose
track of that during resume, leading to the reported crash in a subsequent
suspend attempt.

In the suspend path, during CPU offline, the value of policy->cpu is updated
by choosing one of the surviving CPUs in that policy, as long as there is
atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
invoked to update the reference to the stats structure by assigning it to
the new CPU. However, in the resume path, during CPU online, we end up
assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
know about this. Thus the reference to the stats structure remains
(incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
during CPU offline, we end up accessing an incorrect location to get the
stats structure, which eventually leads to the NULL pointer dereference.

Fix this by letting cpufreq-stats know about the update of the policy->cpu
during CPU online in the resume path. (Also, move the update_policy_cpu()
function higher up in the file, so that __cpufreq_add_dev() can invoke
it).

Reported-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/cpufreq/cpufreq.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 5a64f66..62bdb95 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -947,6 +947,18 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 	kfree(policy);
 }

+static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
+{
+	policy->last_cpu = policy->cpu;
+	policy->cpu = cpu;
+
+#ifdef CONFIG_CPU_FREQ_TABLE
+	cpufreq_frequency_table_update_policy_cpu(policy);
+#endif
+	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
+			CPUFREQ_UPDATE_POLICY_CPU, policy);
+}
+
 static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
 			     bool frozen)
 {
@@ -1000,7 +1012,18 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
 	if (!policy)
 		goto nomem_out;

-	policy->cpu = cpu;
+
+	/*
+	 * In the resume path, since we restore a saved policy, the assignment
+	 * to policy->cpu is like an update of the existing policy, rather than
+	 * the creation of a brand new one. So we need to perform this update
+	 * by invoking update_policy_cpu().
+	 */
+	if (frozen && cpu != policy->cpu)
+		update_policy_cpu(policy, cpu);
+	else
+		policy->cpu = cpu;
+
 	policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
 	cpumask_copy(policy->cpus, cpumask_of(cpu));

@@ -1092,18 +1115,6 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 	return __cpufreq_add_dev(dev, sif, false);
 }

-static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
-{
-	policy->last_cpu = policy->cpu;
-	policy->cpu = cpu;
-
-#ifdef CONFIG_CPU_FREQ_TABLE
-	cpufreq_frequency_table_update_policy_cpu(policy);
-#endif
-	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
-			CPUFREQ_UPDATE_POLICY_CPU, policy);
-}
-
 static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy,
 					   unsigned int old_cpu, bool frozen)
 {

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 10:21               ` Srivatsa S. Bhat
@ 2013-09-11 10:44                 ` Viresh Kumar
  2013-09-11 10:45                   ` Viresh Kumar
  2013-09-11 11:10                   ` Srivatsa S. Bhat
  2013-09-11 11:09                 ` Srivatsa S. Bhat
  2013-09-11 16:05                 ` Stephen Warren
  2 siblings, 2 replies; 38+ messages in thread
From: Viresh Kumar @ 2013-09-11 10:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 11 September 2013 15:51, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:

>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>> during suspend/resume".

Sorry Stephen, I was away on vacations and came back yesterday only..
And was badly stuck in something other CPUFreq bugs until now :)

> Sure, Rafael. Thanks for CC'ing me.

Thanks for jumping in and helping us out buddy!!!.

> Stephen, I went through the code and I think I found out what is going wrong.
> Can you please try the following patch?
>
> Regards,
> Srivatsa S. Bhat
>
> ----------------------------------------------------------------------------
>
>
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] cpufreq: Fix crash in cpufreq-stats during suspend/resume
>
> Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
> dereference during the second attempt to suspend a system. He also
> pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight
> init/teardown during suspend/resume".
>
> That commit actually ensured that the cpufreq-stats table and the
> cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
> suspend/resume, which makes it all the more surprising. However, it turns
> out that the root-cause is not that we access an already freed memory, but
> that the reference to the allocated memory gets moved around and we lose
> track of that during resume, leading to the reported crash in a subsequent
> suspend attempt.
>
> In the suspend path, during CPU offline, the value of policy->cpu is updated
> by choosing one of the surviving CPUs in that policy, as long as there is
> atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
> invoked to update the reference to the stats structure by assigning it to
> the new CPU. However, in the resume path, during CPU online, we end up
> assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
> know about this. Thus the reference to the stats structure remains
> (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
> during CPU offline, we end up accessing an incorrect location to get the
> stats structure, which eventually leads to the NULL pointer dereference.
>
> Fix this by letting cpufreq-stats know about the update of the policy->cpu
> during CPU online in the resume path. (Also, move the update_policy_cpu()
> function higher up in the file, so that __cpufreq_add_dev() can invoke
> it).

Observation looks good..

> Reported-by: Stephen Warren <swarren@nvidia.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  drivers/cpufreq/cpufreq.c |   37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 5a64f66..62bdb95 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -947,6 +947,18 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
>         kfree(policy);
>  }
>
> +static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
> +{
> +       policy->last_cpu = policy->cpu;
> +       policy->cpu = cpu;
> +
> +#ifdef CONFIG_CPU_FREQ_TABLE
> +       cpufreq_frequency_table_update_policy_cpu(policy);
> +#endif
> +       blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
> +                       CPUFREQ_UPDATE_POLICY_CPU, policy);
> +}
> +
>  static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>                              bool frozen)
>  {
> @@ -1000,7 +1012,18 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>         if (!policy)
>                 goto nomem_out;
>
> -       policy->cpu = cpu;
> +
> +       /*
> +        * In the resume path, since we restore a saved policy, the assignment
> +        * to policy->cpu is like an update of the existing policy, rather than
> +        * the creation of a brand new one. So we need to perform this update
> +        * by invoking update_policy_cpu().
> +        */
> +       if (frozen && cpu != policy->cpu)
> +               update_policy_cpu(policy, cpu);
> +       else
> +               policy->cpu = cpu;
> +
>         policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
>         cpumask_copy(policy->cpus, cpumask_of(cpu));
>
> @@ -1092,18 +1115,6 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
>         return __cpufreq_add_dev(dev, sif, false);
>  }
>
> -static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
> -{
> -       policy->last_cpu = policy->cpu;
> -       policy->cpu = cpu;
> -
> -#ifdef CONFIG_CPU_FREQ_TABLE
> -       cpufreq_frequency_table_update_policy_cpu(policy);
> -#endif
> -       blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
> -                       CPUFREQ_UPDATE_POLICY_CPU, policy);
> -}
> -

But I would have solved it differently :)

We don't really need to call update_policy_cpu() again and again
as we don't really need to update policy->cpu...

Rather it would be better to just move following inside
cpufreq_policy_alloc():

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 10:44                 ` Viresh Kumar
@ 2013-09-11 10:45                   ` Viresh Kumar
  2013-09-11 11:14                     ` Srivatsa S. Bhat
  2013-09-11 11:10                   ` Srivatsa S. Bhat
  1 sibling, 1 reply; 38+ messages in thread
From: Viresh Kumar @ 2013-09-11 10:45 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 11 September 2013 16:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> But I would have solved it differently :)
>
> We don't really need to call update_policy_cpu() again and again
> as we don't really need to update policy->cpu...
>
> Rather it would be better to just move following inside
> cpufreq_policy_alloc():

Half written mail sent :(

What about moving following to cpufreq_policy_alloc():

policy->cpu = cpu;

??

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 10:45                   ` Viresh Kumar
@ 2013-09-11 11:14                     ` Srivatsa S. Bhat
  2013-09-11 11:59                       ` Viresh Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 11:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 04:15 PM, Viresh Kumar wrote:
> On 11 September 2013 16:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> But I would have solved it differently :)
>>
>> We don't really need to call update_policy_cpu() again and again
>> as we don't really need to update policy->cpu...
>>
>> Rather it would be better to just move following inside
>> cpufreq_policy_alloc():
> 
> Half written mail sent :(
> 
> What about moving following to cpufreq_policy_alloc():
> 
> policy->cpu = cpu;
> 
> ??
> 

Hmm? The problem is not about merely updating the policy->cpu field; the
main issue is that the existing code was not letting the cpufreq-stats
code know that we updated the policy->cpu under the hood. It is important
for cpufreq-stats to know this because it maintains the reference to its
stats structure by associating it with the policy->cpu. So if policy->cpu
changes under the hood, it loses track of its reference. So we need to
keep that code informed about changes to policy->cpu. Thus, we need to call
update_policy_cpu() in the CPU online path (during resume). I don't see
how we can skip that.

Regards,
Srivatsa S. Bhat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 11:14                     ` Srivatsa S. Bhat
@ 2013-09-11 11:59                       ` Viresh Kumar
  2013-09-11 13:56                         ` Srivatsa S. Bhat
  2013-09-12  5:52                         ` Viresh Kumar
  0 siblings, 2 replies; 38+ messages in thread
From: Viresh Kumar @ 2013-09-11 11:59 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 11 September 2013 16:44, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Hmm? The problem is not about merely updating the policy->cpu field; the
> main issue is that the existing code was not letting the cpufreq-stats
> code know that we updated the policy->cpu under the hood. It is important
> for cpufreq-stats to know this because it maintains the reference to its
> stats structure by associating it with the policy->cpu. So if policy->cpu
> changes under the hood, it loses track of its reference. So we need to
> keep that code informed about changes to policy->cpu. Thus, we need to call
> update_policy_cpu() in the CPU online path (during resume). I don't see
> how we can skip that.

Okay.. There are two different ways in which cpufreq_add_dev() work
currently..

Boot cluster (i.e. policy with boot CPU)
---------------

Here cpufreq_remove_dev() is never called for boot cpu but all others.
And similarly cpufreq_add_dev() is never called for boot cpu but all others.

Now policy->cpu contains meaningful cpu at beginning of resume and
we don't need to modify that at all.. For all the remaining CPUs we
better call cpufreq_add_policy_cpu() rather..

Non-boot Cluster
---------------------

All CPUs here are removed and at the end policy->cpu contains the last
cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3..

Not at resume we will add cpu2 first and so need to update policy->cpu
to 2.. But for all other CPUs in this cluster we return early from
cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus
was fixed by call to ->init() for the first cpu of this cluster..

And so we never reach the line: policy->cpu = cpu;

For the first cpu of non-boot cluster we need to call update_policy_cpu()
and not for others..

But for the boot cluster if we can call ->init() somehow at resume time,
then things would be fairly similar in both cases..

I am running of time now, as need to leave office now...
I hope I made the problem more clear or probably the way I see it :)

--
viresh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 11:59                       ` Viresh Kumar
@ 2013-09-11 13:56                         ` Srivatsa S. Bhat
  2013-09-12  5:52                         ` Viresh Kumar
  1 sibling, 0 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 13:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 05:29 PM, Viresh Kumar wrote:
> On 11 September 2013 16:44, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Hmm? The problem is not about merely updating the policy->cpu field; the
>> main issue is that the existing code was not letting the cpufreq-stats
>> code know that we updated the policy->cpu under the hood. It is important
>> for cpufreq-stats to know this because it maintains the reference to its
>> stats structure by associating it with the policy->cpu. So if policy->cpu
>> changes under the hood, it loses track of its reference. So we need to
>> keep that code informed about changes to policy->cpu. Thus, we need to call
>> update_policy_cpu() in the CPU online path (during resume). I don't see
>> how we can skip that.
> 
> Okay.. There are two different ways in which cpufreq_add_dev() work
> currently..
> 
> Boot cluster (i.e. policy with boot CPU)
> ---------------
> 
> Here cpufreq_remove_dev() is never called for boot cpu but all others.
> And similarly cpufreq_add_dev() is never called for boot cpu but all others.
> 
> Now policy->cpu contains meaningful cpu at beginning of resume and
> we don't need to modify that at all.. For all the remaining CPUs we
> better call cpufreq_add_policy_cpu() rather..
>

Yes, and the existing code already does that. And if you observe the placement
of the call to update_policy_cpu() in my patch, you'll find that it won't
be called in cases where we call cpufreq_add_policy_cpu(). Which is exactly
what you want, IIUC.

> Non-boot Cluster
> ---------------------
> 
> All CPUs here are removed and at the end policy->cpu contains the last
> cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3..
> 
> Not at resume we will add cpu2 first and so need to update policy->cpu
> to 2.. But for all other CPUs in this cluster we return early from
> cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus
> was fixed by call to ->init() for the first cpu of this cluster..
> 
> And so we never reach the line: policy->cpu = cpu;
> 
> For the first cpu of non-boot cluster we need to call update_policy_cpu()
> and not for others..
> 

Yes, and that's precisely why I added the call to update_policy_cpu() only
near the statement 'policy->cpu = cpu'. All other cases don't need to call
that function, and in my patch, they indeed don't call that function.

A simple way to look at this is:
If policy->cpu is updated anywhere in the resume (cpu online) path, we need
to call update_policy_cpu(). Otherwise, we don't need to call that function.
This will cover both the scenarios you described above.

> 
> But for the boot cluster if we can call ->init() somehow at resume time,
> then things would be fairly similar in both cases..
> 
> I am running of time now, as need to leave office now...
> I hope I made the problem more clear or probably the way I see it :)
> 

Well, your explanation matches my understanding of the scenarios, and
I had written the patch keeping those scenarios in mind; so I believe the
patch is correct as it is. So what I didn't get is this: are you suggesting
that something is wrong in my patch, or are you just reiterating the different
scenarios involved so that I can recheck whether my patch is right?

If it is the former, please point me to the exact problem you found in
my patch. If it is the latter, I'm pretty sure my patch is right, I've
already checked it :-)

Regards,
Srivatsa S. Bhat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 11:59                       ` Viresh Kumar
  2013-09-11 13:56                         ` Srivatsa S. Bhat
@ 2013-09-12  5:52                         ` Viresh Kumar
  2013-09-12  6:26                           ` Srivatsa S. Bhat
  1 sibling, 1 reply; 38+ messages in thread
From: Viresh Kumar @ 2013-09-12  5:52 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

Let me fix my mail first.. I was running out of time yesterday and so couldn't
frame things correctly :)

On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Okay.. There are two different ways in which cpufreq_add_dev() work
> currently..
>
> Boot cluster (i.e. policy with boot CPU)
> ---------------
>
> Here cpufreq_remove_dev() is never called for boot cpu but all others.
> And similarly cpufreq_add_dev() is never called for boot cpu but all others.
>
> Now policy->cpu contains meaningful cpu at beginning of resume and
> we don't need to modify that at all.. For all the remaining CPUs we
> better call cpufreq_add_policy_cpu() rather..

And this should be done without your patch. Or actually we will simply
return from this place. Atleast for systems with single cluster, like Tegra.

policy->related_cpus is still valid after resume and we haven't removed
policy from the cpufreq_policy_list (though there is a bug which I have
fixed separately and sent it to you..).. So no change required for a single
cluster system..

> Non-boot Cluster
> ---------------------
>
> All CPUs here are removed and at the end policy->cpu contains the last
> cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3..
>
> Now at resume we will add cpu2 first and so need to update policy->cpu
> to 2..

> But for all other CPUs in this cluster we return early from
> cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus
> was fixed by call to ->init() for the first cpu of this cluster..

This was wrong, we need a valid policy->related_cpus field which is always
valid and so we return early here too, but not for the first cpu of cluster.

> And so we never reach the line: policy->cpu = cpu;
>
> For the first cpu of non-boot cluster we need to call update_policy_cpu()
> and not for others..

that's correct, thought I have one more idea.. :)

> But for the boot cluster if we can call ->init() somehow at resume time,
> then things would be fairly similar in both cases..

Not required.. its all working already.. and so Stephen shouldn't need your
patch for Tegra, but rather my patches that fix other cpufreq bugs..

Now coming back to the ideas I have...
Same code will work if hotplug sequence is fixed a bit. Why aren't we doing
exact opposite of suspend in resume?

We are removing CPUs (leaving the boot cpu) in ascending order and then
adding them back in same order.. Why?

Why not remove CPUs in descending order and add in ascending order? Or
remove in ascending order and add in descending order?

That way policy->cpu will be updated with the right cpu and your patch wouldn't
be required..

I am not saying that this can't be hacked/fixed in cpufreq but suspend/resume
may also be fixed and that looks logically more correct to me..

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12  5:52                         ` Viresh Kumar
@ 2013-09-12  6:26                           ` Srivatsa S. Bhat
  2013-09-12  6:41                             ` Viresh Kumar
  2013-09-12 15:55                             ` Stephen Warren
  0 siblings, 2 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-12  6:26 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/12/2013 11:22 AM, Viresh Kumar wrote:
> Let me fix my mail first.. I was running out of time yesterday and so couldn't
> frame things correctly :)
> 
> On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> Okay.. There are two different ways in which cpufreq_add_dev() work
>> currently..
>>
>> Boot cluster (i.e. policy with boot CPU)
>> ---------------
>>
>> Here cpufreq_remove_dev() is never called for boot cpu but all others.
>> And similarly cpufreq_add_dev() is never called for boot cpu but all others.
>>
>> Now policy->cpu contains meaningful cpu at beginning of resume and
>> we don't need to modify that at all.. For all the remaining CPUs we
>> better call cpufreq_add_policy_cpu() rather..
> 
> And this should be done without your patch. Or actually we will simply
> return from this place. Atleast for systems with single cluster, like Tegra.
> 
> policy->related_cpus is still valid after resume and we haven't removed
> policy from the cpufreq_policy_list (though there is a bug which I have
> fixed separately and sent it to you..).. So no change required for a single
> cluster system..
> 
>> Non-boot Cluster
>> ---------------------
>>
>> All CPUs here are removed and at the end policy->cpu contains the last
>> cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3..
>>
>> Now at resume we will add cpu2 first and so need to update policy->cpu
>> to 2..
> 
>> But for all other CPUs in this cluster we return early from
>> cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus
>> was fixed by call to ->init() for the first cpu of this cluster..
> 
> This was wrong, we need a valid policy->related_cpus field which is always
> valid and so we return early here too, but not for the first cpu of cluster.
> 
>> And so we never reach the line: policy->cpu = cpu;
>>
>> For the first cpu of non-boot cluster we need to call update_policy_cpu()
>> and not for others..
> 
> that's correct, thought I have one more idea.. :)
> 
>> But for the boot cluster if we can call ->init() somehow at resume time,
>> then things would be fairly similar in both cases..
> 
> Not required.. its all working already.. and so Stephen shouldn't need your
> patch for Tegra, but rather my patches that fix other cpufreq bugs..
> 
> Now coming back to the ideas I have...
> Same code will work if hotplug sequence is fixed a bit. Why aren't we doing
> exact opposite of suspend in resume?
> 
> We are removing CPUs (leaving the boot cpu) in ascending order and then
> adding them back in same order.. Why?
> 
> Why not remove CPUs in descending order and add in ascending order? Or
> remove in ascending order and add in descending order?
> 

I had the same thought when solving this bug.. We have had similar issues with
CPU hotplug notifiers too: why are they invoked in the same order during both
CPU down and up, instead of reversing the order? I even had a patchset to perform
reverse-invocation of notifiers.. http://lwn.net/Articles/508072/
... but people didn't find that very compelling to have.


> That way policy->cpu will be updated with the right cpu and your patch wouldn't
> be required..
> 
> I am not saying that this can't be hacked/fixed in cpufreq but suspend/resume
> may also be fixed and that looks logically more correct to me..
> 

It does to me too, but I think the reason nobody really bothered is because perhaps
not many other subsystems care about the order in which CPUs are torn down or
brought up; they just need the total number to match.. cpufreq is one exception
as we saw with this bug.


Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12  6:26                           ` Srivatsa S. Bhat
@ 2013-09-12  6:41                             ` Viresh Kumar
  2013-09-12  6:46                               ` Srivatsa S. Bhat
  2013-09-12 15:55                             ` Stephen Warren
  1 sibling, 1 reply; 38+ messages in thread
From: Viresh Kumar @ 2013-09-12  6:41 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 12 September 2013 11:56, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> I had the same thought when solving this bug.. We have had similar issues with
> CPU hotplug notifiers too: why are they invoked in the same order during both
> CPU down and up, instead of reversing the order? I even had a patchset to perform
> reverse-invocation of notifiers.. http://lwn.net/Articles/508072/
> ... but people didn't find that very compelling to have.
>

> It does to me too, but I think the reason nobody really bothered is because perhaps
> not many other subsystems care about the order in which CPUs are torn down or
> brought up; they just need the total number to match.. cpufreq is one exception
> as we saw with this bug.

Probably its time to re-spin that series and make CPUFreq as one of the users
of that patchset.. Resume should be just opposite of suspend and so
that patchset
would make sense even if not many people care about it :)

Over that there is one more problem that I see, don't know if it is really a big
issue..

After a suspend/resume value of policy->cpu may get changed... And so the
hierarchy of sysfs cpufreq files too.. Folder that had links to other
CPUs folder
can now be actual folders instead of links and vice versa..

Don't know if this can break something ??

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12  6:41                             ` Viresh Kumar
@ 2013-09-12  6:46                               ` Srivatsa S. Bhat
  2013-09-12  6:52                                 ` Viresh Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-12  6:46 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/12/2013 12:11 PM, Viresh Kumar wrote:
> On 12 September 2013 11:56, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> I had the same thought when solving this bug.. We have had similar issues with
>> CPU hotplug notifiers too: why are they invoked in the same order during both
>> CPU down and up, instead of reversing the order? I even had a patchset to perform
>> reverse-invocation of notifiers.. http://lwn.net/Articles/508072/
>> ... but people didn't find that very compelling to have.
>>
> 
>> It does to me too, but I think the reason nobody really bothered is because perhaps
>> not many other subsystems care about the order in which CPUs are torn down or
>> brought up; they just need the total number to match.. cpufreq is one exception
>> as we saw with this bug.
> 
> Probably its time to re-spin that series and make CPUFreq as one of the users
> of that patchset.. Resume should be just opposite of suspend and so
> that patchset
> would make sense even if not many people care about it :)
> 
> Over that there is one more problem that I see, don't know if it is really a big
> issue..
> 
> After a suspend/resume value of policy->cpu may get changed... And so the
> hierarchy of sysfs cpufreq files too.. Folder that had links to other
> CPUs folder
> can now be actual folders instead of links and vice versa..
> 
> Don't know if this can break something ??
> 

Interesting observation :-) But we just managed to retain sysfs file permissions
across suspend/resume with a lot of trouble and regressions. That's probably
good enough for some time to come ;-) We can retain folder/links when somebody
really finds a need to do that ;-)

Of course, if we change the suspend/resume sequence and that fixes this, that
would be like getting it for free, nobody would say no to it ;-)

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12  6:46                               ` Srivatsa S. Bhat
@ 2013-09-12  6:52                                 ` Viresh Kumar
  2013-09-12  7:14                                   ` Srivatsa S. Bhat
  0 siblings, 1 reply; 38+ messages in thread
From: Viresh Kumar @ 2013-09-12  6:52 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 12 September 2013 12:16, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Of course, if we change the suspend/resume sequence and that fixes this, that
> would be like getting it for free, nobody would say no to it ;-)

Not really :)

Policy with 4 CPUs, 0,1,2,3, policy->cpu currently set to 1, 2 or 3...

We will remove CPUs in order 3,2,1 and add back in 1,2,3... Or Vice Versa

policy->cpu after resume is 0 :)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12  6:52                                 ` Viresh Kumar
@ 2013-09-12  7:14                                   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-12  7:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/12/2013 12:22 PM, Viresh Kumar wrote:
> On 12 September 2013 12:16, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Of course, if we change the suspend/resume sequence and that fixes this, that
>> would be like getting it for free, nobody would say no to it ;-)
> 
> Not really :)
> 
> Policy with 4 CPUs, 0,1,2,3, policy->cpu currently set to 1, 2 or 3...
> 
> We will remove CPUs in order 3,2,1 and add back in 1,2,3... Or Vice Versa
> 
> policy->cpu after resume is 0 :)
> 

Ah, great counter-example! :-)

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12  6:26                           ` Srivatsa S. Bhat
  2013-09-12  6:41                             ` Viresh Kumar
@ 2013-09-12 15:55                             ` Stephen Warren
  2013-09-12 17:26                               ` Srivatsa S. Bhat
  1 sibling, 1 reply; 38+ messages in thread
From: Stephen Warren @ 2013-09-12 15:55 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/12/2013 12:26 AM, Srivatsa S. Bhat wrote:
> On 09/12/2013 11:22 AM, Viresh Kumar wrote:
...
>> Now coming back to the ideas I have...
>> Same code will work if hotplug sequence is fixed a bit. Why aren't we doing
>> exact opposite of suspend in resume?
>>
>> We are removing CPUs (leaving the boot cpu) in ascending order and then
>> adding them back in same order.. Why?
>>
>> Why not remove CPUs in descending order and add in ascending order? Or
>> remove in ascending order and add in descending order?
> 
> I had the same thought when solving this bug.. We have had similar issues with
> CPU hotplug notifiers too: why are they invoked in the same order during both
> CPU down and up, instead of reversing the order? I even had a patchset to perform
> reverse-invocation of notifiers.. http://lwn.net/Articles/508072/
> ... but people didn't find that very compelling to have.

I'm not sure if you're talking about the order the CPUs get plugged back
in after resume, or the order of the (pair of?) notifiers that gets
called for each individual CPU.

Sorry if this is blindingly obvious, but with CPU hotplug, I can
manually unplug/re-plug CPUs in any order I feel like, and cpufreq had
better work if I do that. Given that, I don't think there's any
particular need for suspend/resume to unplug/re-plug CPUs in any
particular order for correctness at least, although perhaps it'd be nice
cosmetically for some reason?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12 15:55                             ` Stephen Warren
@ 2013-09-12 17:26                               ` Srivatsa S. Bhat
  2013-09-13  4:26                                 ` Viresh Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-12 17:26 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/12/2013 09:25 PM, Stephen Warren wrote:
> On 09/12/2013 12:26 AM, Srivatsa S. Bhat wrote:
>> On 09/12/2013 11:22 AM, Viresh Kumar wrote:
> ...
>>> Now coming back to the ideas I have...
>>> Same code will work if hotplug sequence is fixed a bit. Why aren't we doing
>>> exact opposite of suspend in resume?
>>>
>>> We are removing CPUs (leaving the boot cpu) in ascending order and then
>>> adding them back in same order.. Why?
>>>
>>> Why not remove CPUs in descending order and add in ascending order? Or
>>> remove in ascending order and add in descending order?
>>
>> I had the same thought when solving this bug.. We have had similar issues with
>> CPU hotplug notifiers too: why are they invoked in the same order during both
>> CPU down and up, instead of reversing the order? I even had a patchset to perform
>> reverse-invocation of notifiers.. http://lwn.net/Articles/508072/
>> ... but people didn't find that very compelling to have.
> 
> I'm not sure if you're talking about the order the CPUs get plugged back
> in after resume, or the order of the (pair of?) notifiers that gets
> called for each individual CPU.

Well, initially we were discussing about the order the CPUs get plugged back
in after resume, then I gave an example of a similar ordering issue in the
way the registered CPU hotplug notifiers are invoked: during *both* CPU online
and offline, the chain of notifiers are invoked in the same order, rather than
in the opposite order. To work around this unnatural ordering, subsystems have
resorted to splitting up their callbacks into multiple callbacks and then
assigning appropriate priorities etc., to them, to get the ordering right.
We could have done away with all that complexity/confusion if the core CPU
hotplug code had set the ordering right. And that's what my patchset tried to
do. Anyway, nevermind, as of now, subsystems do work around this suitably, so
there is no known bug as such at the present. Just that we could have probably
done it a better way, that's all.

> 
> Sorry if this is blindingly obvious, but with CPU hotplug, I can
> manually unplug/re-plug CPUs in any order I feel like, and cpufreq had
> better work if I do that. Given that, I don't think there's any
> particular need for suspend/resume to unplug/re-plug CPUs in any
> particular order for correctness at least, although perhaps it'd be nice
> cosmetically for some reason?
> 

You're absolutely right! Regular CPU hotplug is more demanding than
suspend/resume in the context we are discussing, since any CPU can be
hotplugged at any time and put back in any order. So code like cpufreq should
be prepared to work with any ordering. So yes, its not very compelling to
change the suspend/resume CPU hotplug ordering, since cpufreq has to deal
with the regular (and harsher) situation anyway.

That said, one subtle but key distinction between regular CPU hotplug and
suspend/resume is that, the kernel guarantees that the state of the system
after resume will be exactly the same as it was before suspend (atleast to
the extent that is practically possible). On the other hand, no such
guarantees are made for regular CPU hotplug, since its almost as if the user
is mutating the system at runtime!. However, suspend/resume as a concept
itself provides the above mentioned guarantee, and in fact, the very fact
that CPU hotplug is used under the hood for suspend/resume is just an
implementation detail, and should not affect the guarantees.

That's the reason we often pay special attention to CPU hotplug handling in
the suspend/resume path, such as preserving sysfs file permissions etc..
On those lines, Viresh and me were just wondering if 'fixing' the suspend/
resume CPU hotplug sequence would yield any additional benefits to better
serve this guarantee. That was the context in which this discussion happened.

Regards,
Srivatsa S. Bhat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-12 17:26                               ` Srivatsa S. Bhat
@ 2013-09-13  4:26                                 ` Viresh Kumar
  0 siblings, 0 replies; 38+ messages in thread
From: Viresh Kumar @ 2013-09-13  4:26 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Stephen Warren, Rafael J. Wysocki, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 12 September 2013 22:56, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 09/12/2013 09:25 PM, Stephen Warren wrote:
> Anyway, nevermind, as of now, subsystems do work around this suitably, so
> there is no known bug as such at the present. Just that we could have probably
> done it a better way, that's all.

Yeah, there is no bug as of now due to the number of hacks adopted by different
framework.. I believe we can still have a cleanup series to take care
of this stuff..
That would be some improvement and would be better for future.. Otherwise
this kind of problems would keep coming again and again..

> You're absolutely right! Regular CPU hotplug is more demanding than
> suspend/resume in the context we are discussing, since any CPU can be
> hotplugged at any time and put back in any order. So code like cpufreq should
> be prepared to work with any ordering.

And that part is well implemented and tested as far as I know..

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 10:44                 ` Viresh Kumar
  2013-09-11 10:45                   ` Viresh Kumar
@ 2013-09-11 11:10                   ` Srivatsa S. Bhat
  2013-09-11 11:15                     ` Viresh Kumar
  1 sibling, 1 reply; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 11:10 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 04:14 PM, Viresh Kumar wrote:
> On 11 September 2013 15:51, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
> 
>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>>> during suspend/resume".
> 
> Sorry Stephen, I was away on vacations and came back yesterday only..
> And was badly stuck in something other CPUFreq bugs until now :)
> 
>> Sure, Rafael. Thanks for CC'ing me.
> 
> Thanks for jumping in and helping us out buddy!!!.
>

No problem :-) Besides, I am the one who broke it ;-(
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 11:10                   ` Srivatsa S. Bhat
@ 2013-09-11 11:15                     ` Viresh Kumar
  2013-09-11 11:17                       ` Srivatsa S. Bhat
  0 siblings, 1 reply; 38+ messages in thread
From: Viresh Kumar @ 2013-09-11 11:15 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 11 September 2013 16:40, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 09/11/2013 04:14 PM, Viresh Kumar wrote:
>> On 11 September 2013 15:51, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>
>>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>>>> during suspend/resume".
>>
>> Sorry Stephen, I was away on vacations and came back yesterday only..
>> And was badly stuck in something other CPUFreq bugs until now :)
>>
>>> Sure, Rafael. Thanks for CC'ing me.
>>
>> Thanks for jumping in and helping us out buddy!!!.
>>
>
> No problem :-) Besides, I am the one who broke it ;-(

I believe you missed some part of my mail ?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 11:15                     ` Viresh Kumar
@ 2013-09-11 11:17                       ` Srivatsa S. Bhat
  2013-09-11 11:41                         ` Viresh Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 11:17 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 04:45 PM, Viresh Kumar wrote:
> On 11 September 2013 16:40, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 09/11/2013 04:14 PM, Viresh Kumar wrote:
>>> On 11 September 2013 15:51, Srivatsa S. Bhat
>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>>
>>>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>>>>> during suspend/resume".
>>>
>>> Sorry Stephen, I was away on vacations and came back yesterday only..
>>> And was badly stuck in something other CPUFreq bugs until now :)
>>>
>>>> Sure, Rafael. Thanks for CC'ing me.
>>>
>>> Thanks for jumping in and helping us out buddy!!!.
>>>
>>
>> No problem :-) Besides, I am the one who broke it ;-(
> 
> I believe you missed some part of my mail ?
> 

See my next reply :-) I was composing it :-)

Man, you are *fast*! ;-)
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 11:17                       ` Srivatsa S. Bhat
@ 2013-09-11 11:41                         ` Viresh Kumar
  0 siblings, 0 replies; 38+ messages in thread
From: Viresh Kumar @ 2013-09-11 11:41 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Stephen Warren, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 11 September 2013 16:47, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> See my next reply :-) I was composing it :-)
>
> Man, you are *fast*! ;-)

haha...

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 10:21               ` Srivatsa S. Bhat
  2013-09-11 10:44                 ` Viresh Kumar
@ 2013-09-11 11:09                 ` Srivatsa S. Bhat
  2013-09-11 16:05                 ` Stephen Warren
  2 siblings, 0 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 11:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Warren, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 03:51 PM, Srivatsa S. Bhat wrote:
> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>> Viresh,
>>>>>>>>>
>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
>>>>>>>>>
>>>>>>>>> I can avoid this with the following patch, which adds a check which
>>>>>>>>> already exists in all-but-one other places that the same lookup is made:
>>>>>>>>
>>>>>>>> Which kernel did you test?
>>>>>>>
>>>>>>> next-20130909.
>>>>>>
>>>>>> Is it reproducible with the current mainline?
>>>>>
>>>>> This does not affect v3.11, but does affect current HEAD; 300893b "Merge
>>>>> tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs".
>>>>
>>>> What system does it break on?
>>>
>>> A dual-core ARM system (NVIDIA Tegra20 SoC, Harmony board).
>>>
>>>> Any chance to bisect cpufreq changes between 3.11 and the current HEAD?
>>>
>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>> during suspend/resume".
>>
>> Thanks!
>>
>> Srivatsa, any chance to look into this?
>>
> 
> Sure, Rafael. Thanks for CC'ing me.
> 
> Stephen, I went through the code and I think I found out what is going wrong.
> Can you please try the following patch?
> 
> Regards,
> Srivatsa S. Bhat
> 
> ----------------------------------------------------------------------------
> 
> 
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] cpufreq: Fix crash in cpufreq-stats during suspend/resume
> 

And here is a follow-up patch, on top of this one. I hit the bug while working
on this patch, but it doesn't occur in practice, since none of the existing
callers call update_policy_cpu() with new == old. (I had called it like that
by mistake while working on the fix and was surprised by the level of problems
it caused; hence thought of adding a check).

Regards,
Srivatsa S. Bhat

-----------------------------------------------------------------------------
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu

If update_policy_cpu() is invoked with the existing policy->cpu itself
as the new-cpu parameter, then a lot of things can go terribly wrong.

In its present form, update_policy_cpu() always assumes that the new-cpu
is different from policy->cpu and invokes other functions to perform their
respective updates. And those functions implement the actual update like
this:

per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
per_cpu(..., last_cpu) = NULL;

Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
references vanish into thin air! (memory leak). From there, it leads to more
problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
and hence tries to create a new sysfs-group; but sysfs already had created
the group earlier, so it complains that it cannot create a duplicate filename.
In short, the repercussions of a rather innocuous invocation of
update_policy_cpu() can turn out to be pretty nasty.

Ideally update_policy_cpu() should handle this situation (new == last)
gracefully, and not lead to such severe problems. So fix it by adding an
appropriate check.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/cpufreq/cpufreq.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 62bdb95..a61b7a1 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -949,6 +949,9 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 
 static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
 {
+	if (cpu == policy->cpu)
+		return;
+
 	policy->last_cpu = policy->cpu;
 	policy->cpu = cpu;
 


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 10:21               ` Srivatsa S. Bhat
  2013-09-11 10:44                 ` Viresh Kumar
  2013-09-11 11:09                 ` Srivatsa S. Bhat
@ 2013-09-11 16:05                 ` Stephen Warren
  2013-09-11 18:03                   ` Srivatsa S. Bhat
  2 siblings, 1 reply; 38+ messages in thread
From: Stephen Warren @ 2013-09-11 16:05 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote:
> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>> Viresh,
>>>>>>>>>
>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
...
> Stephen, I went through the code and I think I found out what is going wrong.
> Can you please try the following patch?

Unfortunately, I still see the exact same failure/backtrace with this
patch applied.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 16:05                 ` Stephen Warren
@ 2013-09-11 18:03                   ` Srivatsa S. Bhat
  2013-09-11 18:42                     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 18:03 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 09:35 PM, Stephen Warren wrote:
> On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote:
>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>>> Viresh,
>>>>>>>>>>
>>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
> ...
>> Stephen, I went through the code and I think I found out what is going wrong.
>> Can you please try the following patch?
> 
> Unfortunately, I still see the exact same failure/backtrace with this
> patch applied.
> 

Oh, is it? Can you please give me the map of the related cpus on your
system? (ie., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for
each CPU.)

I must be missing something...
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 18:03                   ` Srivatsa S. Bhat
@ 2013-09-11 18:42                     ` Srivatsa S. Bhat
  2013-09-11 19:03                       ` Stephen Warren
  2013-09-12  5:58                       ` Viresh Kumar
  0 siblings, 2 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 18:42 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 11:33 PM, Srivatsa S. Bhat wrote:
> On 09/11/2013 09:35 PM, Stephen Warren wrote:
>> On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote:
>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>>>> Viresh,
>>>>>>>>>>>
>>>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
>> ...
>>> Stephen, I went through the code and I think I found out what is going wrong.
>>> Can you please try the following patch?
>>
>> Unfortunately, I still see the exact same failure/backtrace with this
>> patch applied.
>>
> 
> Oh, is it? Can you please give me the map of the related cpus on your
> system? (ie., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for
> each CPU.)
> 
> I must be missing something...
>

OK, I took a second look at the code, and I suspect that applying the
second patch might help. So can you try by applying both the patches
please[1][2]?

Basically here is my hunch: say CPUs 2 and 3 are part of a policy and
3 is the policy->cpu. During suspend, CPU 2 will be taken offline first,
and we hit this code:

1199         if (cpu != policy->cpu && !frozen) {
1200                 sysfs_remove_link(&dev->kobj, "cpufreq");
1201         } else if (cpus > 1) {
1202 
1203                 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);
1204                 if (new_cpu >= 0) {
1205                         WARN_ON(lock_policy_rwsem_write(cpu));
1206                         update_policy_cpu(policy, new_cpu);
1207                         unlock_policy_rwsem_write(cpu);
1208 
1209                         if (!frozen) {
1210                                 pr_debug("%s: policy Kobject moved to cpu: %d "
1211                                          "from: %d\n",__func__, new_cpu, cpu);
1212                         }
1213                 }
1214         }

At this point, the first 'if' condition fails because frozen == true.
So it enters the else part. But, policy->cpu is actually 3, not 2,
and hence we invoke nominate_...() unnecessarily. That function returns
3 since that's the only CPU remaining in the mask, and so we call
update_policy_cpu() with new_cpu = 3, and old_cpu was also 3! And that
is the perfect recipe for disaster, with the current implementation of
update_policy_cpu().

And my second patch [2] tried to fix this exact problem, although I
didn't realize we actually had a case where we hit this in the current
code itself.

So please try by applying both the patches and let me know how it goes.
Thanks a lot for your testing efforts!

[1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
[2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 18:42                     ` Srivatsa S. Bhat
@ 2013-09-11 19:03                       ` Stephen Warren
  2013-09-11 19:46                         ` Srivatsa S. Bhat
  2013-09-12  6:00                         ` Viresh Kumar
  2013-09-12  5:58                       ` Viresh Kumar
  1 sibling, 2 replies; 38+ messages in thread
From: Stephen Warren @ 2013-09-11 19:03 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
...
> OK, I took a second look at the code, and I suspect that applying the
> second patch might help. So can you try by applying both the patches
> please[1][2]?
> 
...
> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2

Yes, with both of those patches applies, the problem is solved:-)

I was going to test the second patch originally, but it sounded like it
was more of a cleanup rather than a fix for my issue, so I didn't bother
when I found the problem wasn't solved by patch 1. Sorry!

For the record, I'm testing on a 2-CPU system, so I'm not sure whether
your explanation applies; it talks about CPUs 2 and 3 whereas I only
have CPUs 0 and 1, but perhaps your explanation applies equally to any
pair of CPUs?

For the record, here's the information you requested in the other email:

# cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
0 1
0 1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 19:03                       ` Stephen Warren
@ 2013-09-11 19:46                         ` Srivatsa S. Bhat
  2013-09-11 20:07                           ` Stephen Warren
  2013-09-12  6:04                           ` Viresh Kumar
  2013-09-12  6:00                         ` Viresh Kumar
  1 sibling, 2 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 19:46 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/12/2013 12:33 AM, Stephen Warren wrote:
> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
> ...
>> OK, I took a second look at the code, and I suspect that applying the
>> second patch might help. So can you try by applying both the patches
>> please[1][2]?
>>
> ...
>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
> 
> Yes, with both of those patches applies, the problem is solved:-)
> 
> I was going to test the second patch originally, but it sounded like it
> was more of a cleanup rather than a fix for my issue, so I didn't bother
> when I found the problem wasn't solved by patch 1. Sorry!
> 

Well, honestly, even I had intended the second patch as a cleanup and
hadn't asked you to test it ;-) Only when you reported that the first patch
failed to solve your problem, I realized that the second patch was
important too! :-) Thanks for testing!

> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
> your explanation applies; it talks about CPUs 2 and 3 whereas I only
> have CPUs 0 and 1, but perhaps your explanation applies equally to any
> pair of CPUs?
> 

Yes, it applies to any pair of CPUs, as long as the CPU first taken down
is not the policy->cpu. In your case, it applies like this:
IIUC, CPU0 is the boot cpu, and hence it wont be taken offline using hotplug.
So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu,
then it hits the very same bug that I described with the analogy of CPUs 2
and 3.

> For the record, here's the information you requested in the other email:
> 
> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
> 0 1
> 0 1
> 

Thanks! It would have been more useful to somehow know which was the
policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu
in your case. Anyway, nevermind, good to know that the problem got solved
by the 2 patches :-) And more importantly, we now fully understand the
problems that can lead to the NULL deref and the solutions, as outlined below:

Problem 1 : The last surviving policy->cpu during suspend might not
be the one which is onlined during resume. So policy->cpu updates can
get missed by the cpufreq-stats code. This is solved by patch 1.

Problem 2 : If a CPU other than the policy->cpu goes down first during
suspend, then we end up spuriously updating the policy->cpu field, making
update_policy_cpu() go crazy. This is solved by patch 2.

Ideally, I think we should fix the weird if/else condition, since *that*
is the real culprit; and retain patch 2 as a cleanup.

Something like this:

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] cpufreq: Restructure if/else block to avoid unintended behavior

In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
the sysfs link or nominate a new policy cpu, is governed by an if/else block
with a rather complex set of conditionals. Worse, they harbor a subtlety
which leads to certain unintended behavior.

The code looks like this:

        if (cpu != policy->cpu && !frozen) {
                sysfs_remove_link(&dev->kobj, "cpufreq");
        } else if (cpus > 1) {
		new_cpu = cpufreq_nominate_new_policy_cpu(...);
		...
		update_policy_cpu(..., new_cpu);
	}

The original intention was:
If the CPU going offline is not policy->cpu, just remove the link.
On the other hand, if the CPU going offline is the policy->cpu itself,
handover the policy->cpu job to some other surviving CPU in that policy.

But because the 'if' condition also includes the 'frozen' check, now there
are *two* possibilities by which we can enter the 'else' block:

1. cpu == policy->cpu (intended)
2. cpu != policy->cpu && frozen (unintended)

Due to the second (unintended) scenario, we end up spuriously nominating
a CPU as the policy->cpu, even when the existing policy->cpu is alive and
well. This can cause problems further down the line, especially when we end
up nominating the same policy->cpu as the new one (ie., old == new),
because it totally confuses update_policy_cpu().

To avoid this mess, restructure the if/else block to only do what was
originally intended, and thus prevent any unwelcome surprises.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/cpufreq/cpufreq.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 62bdb95..247842b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1193,8 +1193,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
 		cpumask_clear_cpu(cpu, policy->cpus);
 	unlock_policy_rwsem_write(cpu);

-	if (cpu != policy->cpu && !frozen) {
-		sysfs_remove_link(&dev->kobj, "cpufreq");
+	if (cpu != policy->cpu) {
+		if (!frozen)
+			sysfs_remove_link(&dev->kobj, "cpufreq");
 	} else if (cpus > 1) {

 		new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);

So can you see if patch 1 + this above fix solves your problem as well?
Then we can retain the original patch 2 as a cleanup, after these 2 patches.
This organization also makes the code look better and understandable.

Rafael, I'll post the 3 patches separately after knowing the results from
Stephen. You don't have to bother deciphering the patch ordering just yet ;-)

Regards,
Srivatsa S. Bhat

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 19:46                         ` Srivatsa S. Bhat
@ 2013-09-11 20:07                           ` Stephen Warren
  2013-09-11 20:05                             ` Srivatsa S. Bhat
  2013-09-12  6:04                           ` Viresh Kumar
  1 sibling, 1 reply; 38+ messages in thread
From: Stephen Warren @ 2013-09-11 20:07 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/11/2013 01:46 PM, Srivatsa S. Bhat wrote:
> On 09/12/2013 12:33 AM, Stephen Warren wrote:
>> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
>> ...
>>> OK, I took a second look at the code, and I suspect that applying the
>>> second patch might help. So can you try by applying both the patches
>>> please[1][2]?
>>>
>> ...
>>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
>>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
>>
>> Yes, with both of those patches applies, the problem is solved:-)
>>
>> I was going to test the second patch originally, but it sounded like it
>> was more of a cleanup rather than a fix for my issue, so I didn't bother
>> when I found the problem wasn't solved by patch 1. Sorry!
>>
> 
> Well, honestly, even I had intended the second patch as a cleanup and
> hadn't asked you to test it ;-) Only when you reported that the first patch
> failed to solve your problem, I realized that the second patch was
> important too! :-) Thanks for testing!
> 
>> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
>> your explanation applies; it talks about CPUs 2 and 3 whereas I only
>> have CPUs 0 and 1, but perhaps your explanation applies equally to any
>> pair of CPUs?
>>
> 
> Yes, it applies to any pair of CPUs, as long as the CPU first taken down
> is not the policy->cpu. In your case, it applies like this:
> IIUC, CPU0 is the boot cpu, and hence it wont be taken offline using hotplug.
> So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu,
> then it hits the very same bug that I described with the analogy of CPUs 2
> and 3.
> 
>> For the record, here's the information you requested in the other email:
>>
>> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
>> 0 1
>> 0 1
> 
> Thanks! It would have been more useful to somehow know which was the
> policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu
> in your case.

Yes, I believe CPU0 since,

> # ls -l /sys/devices/system/cpu/cpu1/cpufreq
> lrwxrwxrwx 1 root root 0 Jan  1 00:01 /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq

and cpu0/cpufreq/ has all the files in it.

...
> So can you see if patch 1 + this above fix solves your problem as well?
> Then we can retain the original patch 2 as a cleanup, after these 2 patches.
> This organization also makes the code look better and understandable.

Yes, both patch 1+3 and 1+3+2 work fine.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 20:07                           ` Stephen Warren
@ 2013-09-11 20:05                             ` Srivatsa S. Bhat
  0 siblings, 0 replies; 38+ messages in thread
From: Srivatsa S. Bhat @ 2013-09-11 20:05 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 09/12/2013 01:37 AM, Stephen Warren wrote:
> On 09/11/2013 01:46 PM, Srivatsa S. Bhat wrote:
>> On 09/12/2013 12:33 AM, Stephen Warren wrote:
>>> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
>>> ...
>>>> OK, I took a second look at the code, and I suspect that applying the
>>>> second patch might help. So can you try by applying both the patches
>>>> please[1][2]?
>>>>
>>> ...
>>>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
>>>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
>>>
>>> Yes, with both of those patches applies, the problem is solved:-)
>>>
>>> I was going to test the second patch originally, but it sounded like it
>>> was more of a cleanup rather than a fix for my issue, so I didn't bother
>>> when I found the problem wasn't solved by patch 1. Sorry!
>>>
>>
>> Well, honestly, even I had intended the second patch as a cleanup and
>> hadn't asked you to test it ;-) Only when you reported that the first patch
>> failed to solve your problem, I realized that the second patch was
>> important too! :-) Thanks for testing!
>>
>>> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
>>> your explanation applies; it talks about CPUs 2 and 3 whereas I only
>>> have CPUs 0 and 1, but perhaps your explanation applies equally to any
>>> pair of CPUs?
>>>
>>
>> Yes, it applies to any pair of CPUs, as long as the CPU first taken down
>> is not the policy->cpu. In your case, it applies like this:
>> IIUC, CPU0 is the boot cpu, and hence it wont be taken offline using hotplug.
>> So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu,
>> then it hits the very same bug that I described with the analogy of CPUs 2
>> and 3.
>>
>>> For the record, here's the information you requested in the other email:
>>>
>>> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
>>> 0 1
>>> 0 1
>>
>> Thanks! It would have been more useful to somehow know which was the
>> policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu
>> in your case.
> 
> Yes, I believe CPU0 since,
> 
>> # ls -l /sys/devices/system/cpu/cpu1/cpufreq
>> lrwxrwxrwx 1 root root 0 Jan  1 00:01 /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq
> 
> and cpu0/cpufreq/ has all the files in it.
> 
> ...

Ah, nice!

>> So can you see if patch 1 + this above fix solves your problem as well?
>> Then we can retain the original patch 2 as a cleanup, after these 2 patches.
>> This organization also makes the code look better and understandable.
> 
> Yes, both patch 1+3 and 1+3+2 work fine.
> 

Cool! Thanks a lot for all your testing efforts Stephen! :-)

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 19:46                         ` Srivatsa S. Bhat
  2013-09-11 20:07                           ` Stephen Warren
@ 2013-09-12  6:04                           ` Viresh Kumar
  1 sibling, 0 replies; 38+ messages in thread
From: Viresh Kumar @ 2013-09-12  6:04 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Stephen Warren, Rafael J. Wysocki, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 12 September 2013 01:16, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] cpufreq: Restructure if/else block to avoid unintended behavior
>
> In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
> the sysfs link or nominate a new policy cpu, is governed by an if/else block
> with a rather complex set of conditionals. Worse, they harbor a subtlety
> which leads to certain unintended behavior.
>
> The code looks like this:
>
>         if (cpu != policy->cpu && !frozen) {
>                 sysfs_remove_link(&dev->kobj, "cpufreq");
>         } else if (cpus > 1) {
>                 new_cpu = cpufreq_nominate_new_policy_cpu(...);
>                 ...
>                 update_policy_cpu(..., new_cpu);
>         }
>
> The original intention was:
> If the CPU going offline is not policy->cpu, just remove the link.
> On the other hand, if the CPU going offline is the policy->cpu itself,
> handover the policy->cpu job to some other surviving CPU in that policy.
>
> But because the 'if' condition also includes the 'frozen' check, now there
> are *two* possibilities by which we can enter the 'else' block:
>
> 1. cpu == policy->cpu (intended)
> 2. cpu != policy->cpu && frozen (unintended)
>
> Due to the second (unintended) scenario, we end up spuriously nominating
> a CPU as the policy->cpu, even when the existing policy->cpu is alive and
> well. This can cause problems further down the line, especially when we end
> up nominating the same policy->cpu as the new one (ie., old == new),
> because it totally confuses update_policy_cpu().
>
> To avoid this mess, restructure the if/else block to only do what was
> originally intended, and thus prevent any unwelcome surprises.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  drivers/cpufreq/cpufreq.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 62bdb95..247842b 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1193,8 +1193,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
>                 cpumask_clear_cpu(cpu, policy->cpus);
>         unlock_policy_rwsem_write(cpu);
>
> -       if (cpu != policy->cpu && !frozen) {
> -               sysfs_remove_link(&dev->kobj, "cpufreq");
> +       if (cpu != policy->cpu) {
> +               if (!frozen)
> +                       sysfs_remove_link(&dev->kobj, "cpufreq");
>         } else if (cpus > 1) {
>
>                 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);

Ahh, I wrote exactly the same crap.. Rafael please take Srivatsa's patch
here :)

> So can you see if patch 1 + this above fix solves your problem as well?
> Then we can retain the original patch 2 as a cleanup, after these 2 patches.

Why do we need 2 now? We should never hit that case I would say.. And If we
do, there is some other bug in our code which we have hidden :)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 19:03                       ` Stephen Warren
  2013-09-11 19:46                         ` Srivatsa S. Bhat
@ 2013-09-12  6:00                         ` Viresh Kumar
  1 sibling, 0 replies; 38+ messages in thread
From: Viresh Kumar @ 2013-09-12  6:00 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 12 September 2013 00:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
> your explanation applies; it talks about CPUs 2 and 3 whereas I only
> have CPUs 0 and 1, but perhaps your explanation applies equally to any
> pair of CPUs?

It does apply to your system as well.. We remove cpu 1 in suspend and so
try to nominate cpu 0 as policy->cpu, which it already is :)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: cpufreq_stats NULL deref on second system suspend
  2013-09-11 18:42                     ` Srivatsa S. Bhat
  2013-09-11 19:03                       ` Stephen Warren
@ 2013-09-12  5:58                       ` Viresh Kumar
  1 sibling, 0 replies; 38+ messages in thread
From: Viresh Kumar @ 2013-09-12  5:58 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Stephen Warren, Rafael J. Wysocki, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cpufreq

On 12 September 2013 00:12, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> OK, I took a second look at the code, and I suspect that applying the
> second patch might help. So can you try by applying both the patches
> please[1][2]?
>
> Basically here is my hunch: say CPUs 2 and 3 are part of a policy and
> 3 is the policy->cpu. During suspend, CPU 2 will be taken offline first,
> and we hit this code:
>
> 1199         if (cpu != policy->cpu && !frozen) {
> 1200                 sysfs_remove_link(&dev->kobj, "cpufreq");
> 1201         } else if (cpus > 1) {
> 1202
> 1203                 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);
> 1204                 if (new_cpu >= 0) {
> 1205                         WARN_ON(lock_policy_rwsem_write(cpu));
> 1206                         update_policy_cpu(policy, new_cpu);
> 1207                         unlock_policy_rwsem_write(cpu);
> 1208
> 1209                         if (!frozen) {
> 1210                                 pr_debug("%s: policy Kobject moved to cpu: %d "
> 1211                                          "from: %d\n",__func__, new_cpu, cpu);
> 1212                         }
> 1213                 }
> 1214         }
>
> At this point, the first 'if' condition fails because frozen == true.
> So it enters the else part. But, policy->cpu is actually 3, not 2,
> and hence we invoke nominate_...() unnecessarily. That function returns
> 3 since that's the only CPU remaining in the mask, and so we call
> update_policy_cpu() with new_cpu = 3, and old_cpu was also 3! And that
> is the perfect recipe for disaster, with the current implementation of
> update_policy_cpu().

The problem here is not the wrong implementation of update_policy_cpu()
but

1199         if (cpu != policy->cpu && !frozen) {

Though I have fixed it before looking into your replies :)

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2013-09-13  4:26 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-09 19:22 cpufreq_stats NULL deref on second system suspend Stephen Warren
2013-09-09 20:01 ` Rafael J. Wysocki
2013-09-09 20:01   ` Stephen Warren
2013-09-09 20:24     ` Rafael J. Wysocki
2013-09-09 21:29       ` Stephen Warren
2013-09-09 23:14         ` Rafael J. Wysocki
2013-09-10 20:53           ` Stephen Warren
2013-09-10 22:34             ` Rafael J. Wysocki
2013-09-11 10:21               ` Srivatsa S. Bhat
2013-09-11 10:44                 ` Viresh Kumar
2013-09-11 10:45                   ` Viresh Kumar
2013-09-11 11:14                     ` Srivatsa S. Bhat
2013-09-11 11:59                       ` Viresh Kumar
2013-09-11 13:56                         ` Srivatsa S. Bhat
2013-09-12  5:52                         ` Viresh Kumar
2013-09-12  6:26                           ` Srivatsa S. Bhat
2013-09-12  6:41                             ` Viresh Kumar
2013-09-12  6:46                               ` Srivatsa S. Bhat
2013-09-12  6:52                                 ` Viresh Kumar
2013-09-12  7:14                                   ` Srivatsa S. Bhat
2013-09-12 15:55                             ` Stephen Warren
2013-09-12 17:26                               ` Srivatsa S. Bhat
2013-09-13  4:26                                 ` Viresh Kumar
2013-09-11 11:10                   ` Srivatsa S. Bhat
2013-09-11 11:15                     ` Viresh Kumar
2013-09-11 11:17                       ` Srivatsa S. Bhat
2013-09-11 11:41                         ` Viresh Kumar
2013-09-11 11:09                 ` Srivatsa S. Bhat
2013-09-11 16:05                 ` Stephen Warren
2013-09-11 18:03                   ` Srivatsa S. Bhat
2013-09-11 18:42                     ` Srivatsa S. Bhat
2013-09-11 19:03                       ` Stephen Warren
2013-09-11 19:46                         ` Srivatsa S. Bhat
2013-09-11 20:07                           ` Stephen Warren
2013-09-11 20:05                             ` Srivatsa S. Bhat
2013-09-12  6:04                           ` Viresh Kumar
2013-09-12  6:00                         ` Viresh Kumar
2013-09-12  5:58                       ` Viresh Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).