From: Stephen Warren <swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
To: Daniel Lezcano <daniel.lezcano-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Cc: Joseph Lo <josephl-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
	"linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
	<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>
Subject: Re: [PATCH] ARM: tegra: cpuidle: use CPUIDLE_FLAG_TIMER_STOP flag
Date: Wed, 17 Jul 2013 16:01:41 -0600
Message-ID: <51E71445.4070306@wwwdotorg.org>
In-Reply-To: <51E7108B.5030504-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

On 07/17/2013 03:45 PM, Daniel Lezcano wrote:
> On 07/17/2013 10:31 PM, Stephen Warren wrote:
>> On 07/17/2013 04:15 AM, Joseph Lo wrote:
>>> On Wed, 2013-07-17 at 03:51 +0800, Stephen Warren wrote:
>>>> On 07/16/2013 05:17 AM, Joseph Lo wrote:
>>>>> On Tue, 2013-07-16 at 02:04 +0800, Stephen Warren wrote:
>>>>>> On 06/25/2013 03:23 AM, Joseph Lo wrote:
>>>>>>> Use the CPUIDLE_FLAG_TIMER_STOP flag and let the cpuidle framework
>>>>>>> handle the CLOCK_EVT_NOTIFY_BROADCAST_ENTER/EXIT when entering
>>>>>>> this state.
>> ... [discussion of issues with Joseph's patches applied]
>>>
>>> OK. I did more stress tests last night and today. I found it is caused by
>>> the patch "ARM: tegra: cpuidle: use CPUIDLE_FLAG_TIMER_STOP flag", which
>>> only impacts the Tegra20 platform. The hot plug regression seems to be due
>>> to this patch. After dropping this patch on top of v3.11-rc1, Tegra20
>>> is back to normal.
>>>
>>> And the hot plug and suspend stress tests also pass on Tegra30/114.
>>>
>>> Can the other two patch series for Tegra114, which support CPU idle power
>>> down mode and system suspend, still move forward without being blocked by
>>> this patch?
>>>
>>> It looks like the CPUIDLE_FLAG_TIMER_STOP flag still causes some other
>>> issue for hot plug on Tegra20; I will continue to check this. You can
>>> just drop this patch.
>>
>> OK, if I drop that patch, then everything on Tegra20 and Tegra30 seems
>> fine again.
>>
>> However, I've found some new and exciting issues on Tegra114!
>>
>> With unmodified v3.11-rc1, I can do the following without issue:
>>
>> * Unplug/replug CPUs, so that I had all combinations of CPU 1, 2, 3
>> plugged/unplugged (with CPU 0 always plugged).
>>
>> * Unplug/replug CPUs, so that I had all combinations of CPU 0, 1, 2, 3
>> plugged/unplugged (with the obvious exception of never having all CPUs
>> unplugged).
>>
>> However, if I try this with your Tegra114 cpuidle and suspend patches
>> applied, I see the following issues:
>>
>> 1) If I boot, unplug CPU 0, then replug CPU 0, the system immediately
>> hard-hangs.
>>
>> 2) If I run the hotplug test script, leaving CPU 0 always present, I
>> sometimes see:
>>
>>> root@localhost:~# for i in `seq 1 50`; do echo ITERATION $i; ./cpuonline.py; done
>>> ITERATION 1
>>> echo 0 > /sys/devices/system/cpu/cpu2/online
>>> [  458.910054] CPU2: shutdown
>>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>> [  461.004371] CPU1: shutdown
>>> echo 0 > /sys/devices/system/cpu/cpu3/online
>>> [  463.027341] CPU3: shutdown
>>> echo 1 > /sys/devices/system/cpu/cpu1/online
>>> [  465.061412] CPU1: Booted secondary processor
>>> echo 1 > /sys/devices/system/cpu/cpu2/online
>>> [  467.095313] CPU2: Booted secondary processor
>>> [  467.113243] ------------[ cut here ]------------
>>> [  467.117948] WARNING: CPU: 2 PID: 0 at kernel/time/tick-broadcast.c:667 tick_broadcast_oneshot_control+0x19c/0x1c4()
>>> [  467.128352] Modules linked in:
>>> [  467.131455] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.11.0-rc1-00022-g7487363-dirty #49
>>> [  467.139678] [<c0015620>] (unwind_backtrace+0x0/0xf8) from [<c001154c>] (show_stack+0x10/0x14)
>>> [  467.148228] [<c001154c>] (show_stack+0x10/0x14) from [<c05135a8>] (dump_stack+0x80/0xc4)
>>> [  467.156336] [<c05135a8>] (dump_stack+0x80/0xc4) from [<c0024590>] (warn_slowpath_common+0x64/0x88)
>>> [  467.165300] [<c0024590>] (warn_slowpath_common+0x64/0x88) from [<c00245d0>] (warn_slowpath_null+0x1c/0x24)
>>> [  467.174959] [<c00245d0>] (warn_slowpath_null+0x1c/0x24) from [<c00695e4>] (tick_broadcast_oneshot_control+0x19c/0x1c4)
>>> [  467.185659] [<c00695e4>] (tick_broadcast_oneshot_control+0x19c/0x1c4) from [<c0067cdc>] (clockevents_notify+0x1b0/0x1dc)
>>> [  467.196538] [<c0067cdc>] (clockevents_notify+0x1b0/0x1dc) from [<c034f348>] (cpuidle_idle_call+0x11c/0x168)
>>> [  467.206292] [<c034f348>] (cpuidle_idle_call+0x11c/0x168) from [<c000f134>] (arch_cpu_idle+0x8/0x38)
>>> [  467.215359] [<c000f134>] (arch_cpu_idle+0x8/0x38) from [<c0061038>] (cpu_startup_entry+0x60/0x134)
>>> [  467.224325] [<c0061038>] (cpu_startup_entry+0x60/0x134) from [<800083d8>] (0x800083d8)
>>> [  467.232227] ---[ end trace ea579be22a00e7fb ]---
>>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>> [  469.126682] CPU1: shutdown
>>
>> I have found no solution for (1) (although I didn't look hard!).
>>
>> (2) can be solved with the following (at least 50 iterations of my test
>> script worked with this patch applied):
> 
> Actually this warning results from a bug in the tick broadcast code
> and has been fixed by commit:
> 
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=timers/urgent&id=ea8deb8dfa6b0e8d1b3d1051585706739b46656c
> 
> This patch has been merged into the timers/urgent branch but still needs
> to be merged into timers/core.
> 
> The patch below does not fix the cause of the warning; it only prevents
> the tick warning from occurring. Applying the patch above will fix your
> problem.

Yes, I was imprecise when I said solve; I simply meant that making that
change prevented me from seeing that issue any more.

That patch is already in v3.11-rc1, and I was using that as a base when
I applied Joseph's patches. So, it doesn't solve this issue.
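The quoted report exercises every plugged/unplugged combination of CPUs 1, 2 and 3 through sysfs, but `cpuonline.py` itself is not shown in this thread. The following is a hypothetical sketch, not that script: it assumes a reflected binary Gray-code ordering (so each step plugs or unplugs exactly one CPU), the `/sys/devices/system/cpu/cpuN/online` files seen in the log above, and a 2-second pause between steps chosen only to roughly match the log's timestamps. The names `gray_code_toggles`, `hotplug_sequence` and `run` are illustrative.

```python
import sys
import time
from pathlib import Path

# sysfs directory holding the per-CPU "online" files used in the log above.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def gray_code_toggles(cpus):
    """Return the CPU to flip at each step of a reflected binary Gray code.

    Starting from "all CPUs in `cpus` online", flipping these CPUs in order
    visits every online/offline combination exactly once, changing only one
    CPU per step.
    """
    toggles = []
    for i in range(1, 2 ** len(cpus)):
        # The index of the lowest set bit of i selects which CPU changes state.
        bit = (i & -i).bit_length() - 1
        toggles.append(cpus[bit])
    return toggles


def hotplug_sequence(cpus):
    """Return (cpu, value) pairs to write to cpuN/online, ending with all online."""
    online = {cpu: 1 for cpu in cpus}
    seq = []
    for cpu in gray_code_toggles(cpus):
        online[cpu] ^= 1
        seq.append((cpu, online[cpu]))
    # The Gray code does not end at "all online", so replug whatever is still off.
    for cpu in cpus:
        if not online[cpu]:
            online[cpu] = 1
            seq.append((cpu, 1))
    return seq


def run(seq, dry_run=True, delay=2.0):
    """Print each step as an echo command; write to sysfs only if dry_run is False."""
    for cpu, value in seq:
        path = SYSFS_CPU / f"cpu{cpu}" / "online"
        print(f"echo {value} > {path}")
        if not dry_run:
            path.write_text(f"{value}\n")  # requires root
            time.sleep(delay)  # assumed settle time between transitions


if __name__ == "__main__":
    # CPU 0 stays online; pass --apply to actually hotplug CPUs 1-3.
    run(hotplug_sequence([1, 2, 3]), dry_run="--apply" not in sys.argv)
```

Run without arguments, it only prints the planned `echo` commands; `--apply` performs them. Adding CPU 0 to the list would also reach the all-unplugged state that the report explicitly excludes, so the sketch keeps CPU 0 online as in the first test case.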
