public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [3.6-rc1] sched: clock counts too quickly
@ 2012-08-03 14:18 Tetsuo Handa
  2012-08-04 11:24 ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2012-08-03 14:18 UTC (permalink / raw)
  To: linux-kernel

Hello.

Under unknown condition, clock starts counting up too quickly which is enough
to confuse hangcheck timer to print stall warning.

I don't know the condition but it seems that
"[sched_delayed] sched: RT throttling activated" message is printed
when time jump starts and the CPU usage is high (running my testing program).

Below are examples which happened within a few minutes despite the timestamp
field of the log.

Example 1:

[  516.710080] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[  516.721264] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[  517.118131] warning: process `ccs_file_test' used the deprecated sysctl system call with 3.5.56.
[ 1617.117747] [sched_delayed] sched: RT throttling activated
[ 1636.759901] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 1636.764755] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 1644.236465] BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
[ 1644.236503] Modules linked in: ipv6 binfmt_misc ccsecurity
[ 1644.236503] irq event stamp: 2190088
[ 1644.236503] hardirqs last  enabled at (2190087): [<c109ee94>] rcu_idle_exit+0x54/0xc0
[ 1644.236503] hardirqs last disabled at (2190088): [<c13ed3a0>] _raw_spin_lock_irq+0x10/0x80
[ 1644.236503] softirqs last  enabled at (2190084): [<c103a7b2>] __do_softirq+0xf2/0x160
[ 1644.236503] softirqs last disabled at (2190053): [<c1004d07>] do_softirq+0x87/0xd0
[ 1644.236503] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc1 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 1644.236503] EIP: 0060:[<c1009f71>] EFLAGS: 00000246 CPU: 0
[ 1644.236503] EIP is at default_idle+0x21/0x40
[ 1644.236503] EAX: 00000000 EBX: ffffffff ECX: 00000000 EDX: 00000000
[ 1644.236503] ESI: c15ae3b0 EDI: 01d8d000 EBP: c156df98 ESP: c156df98
[ 1644.236503]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1644.236503] CR0: 8005003b CR2: b75df9db CR3: 17185000 CR4: 000006d0
[ 1644.236503] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 1644.236503] DR6: ffff0ff0 DR7: 00000400

Example 2:

[  125.371118]  [<c13ee58c>] sysenter_do_call+0x12/0x32
[  138.869458] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[  138.881184] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[  139.222763] warning: process `ccs_file_test' used the deprecated sysctl system call with 3.5.56.
[ 1239.068731] [sched_delayed] sched: RT throttling activated
[ 2356.123229] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 2356.125715] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 2484.942962] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 2484.947182] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 3602.556243] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 3602.566267] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 3834.980066] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 3834.983705] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 3855.531019] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 3855.535266] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 3898.597722] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 3898.601656] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 6117.070638] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[ 6117.073990] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 6136.209395] BUG: soft lockup - CPU#0 stuck for 22s! [ccs_cond_test:8618]
[ 6136.209441] Modules linked in: ipv6 binfmt_misc ccsecurity
[ 6136.209441] irq event stamp: 0
[ 6136.209441] hardirqs last  enabled at (0): [<  (null)>]   (null)
[ 6136.209441] hardirqs last disabled at (0): [<c1030add>] copy_process+0x26d/0xf30
[ 6136.209441] softirqs last  enabled at (0): [<c1030add>] copy_process+0x26d/0xf30

Config is at http://I-love.SAKURA.ne.jp/tmp/config-3.6-rc1 .

Regards.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [3.6-rc1] sched: clock counts too quickly
  2012-08-03 14:18 [3.6-rc1] sched: clock counts too quickly Tetsuo Handa
@ 2012-08-04 11:24 ` Tetsuo Handa
  2012-08-04 12:10   ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2012-08-04 11:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, tglx

Hello.

Tetsuo Handa wrote:
> Under unknown condition, clock starts counting up too quickly which is enough
> to confuse hangcheck timer to print stall warning.
I wrote a test program.

---------- test program start ----------
#include <string.h>
#include <sys/timex.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        while (1) {
                struct timeval tv;
                struct timezone tz;
                struct timex buf;
                gettimeofday(&tv, &tz);
                settimeofday(&tv, &tz);
                memset(&buf, 0, sizeof(buf));
                buf.modes = 0x100;
                adjtimex(&buf);
        }
        return 0;
}
---------- test program end ----------

Running this program as root user on 3.6-rc1 causes clock jumping.

buf.modes = 0x100 is just a random choice because I don't know what arguments
are appropriate for adjtimex(). But I guess this is a regression because
I didn't trigger this problem until 3.5 and I can always trigger this problem
in 3.6-rc1. Maybe something is wrong with per-cpu clock management?

Example log 1:

[  109.311730] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem
[  109.322127] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
[ 1293.082319] [sched_delayed] sched: RT throttling activated
[ 7890.377400] BUG: soft lockup - CPU#0 stuck for 1024s! [a.out:3819]
[ 7890.377400] Modules linked in: ipv6 binfmt_misc
[ 8989.892529] irq event stamp: 86970
[ 8989.892529] hardirqs last  enabled at (86969): [<c1070bf5>] do_settimeofday+0x105/0x130
[ 8989.892529] hardirqs last disabled at (86970): [<c13ed980>] apic_timer_interrupt+0x28/0x34
[ 8989.892529] softirqs last  enabled at (84344): [<c103a7a2>] __do_softirq+0xf2/0x160
[ 8989.892529] softirqs last disabled at (84339): [<c1004d07>] do_softirq+0x87/0xd0
[ 8989.892529] Pid: 3819, comm: a.out Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 8989.892529] EIP: 0060:[<c1070bf9>] EFLAGS: 00000293 CPU: 0
[ 8989.892529] EIP is at do_settimeofday+0x109/0x130
[ 8989.892529] EAX: da2f2160 EBX: ddcaff90 ECX: 00000000 EDX: 00000001
[ 8989.892529] ESI: 167a5ee1 EDI: 00000000 EBP: ddcaff64 ESP: ddcaff44
[ 8989.892529]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 8989.892529] CR0: 8005003b CR2: bfeddfdc CR3: 1a10f000 CR4: 000006d0
[ 8989.892529] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 8989.892529] DR6: ffff0ff0 DR7: 00000400
[ 8989.892529] Process a.out (pid: 3819, ti=ddcae000 task=da2f2160 task.ti=ddcae000)
[ 8989.892529] Stack:
[ 8989.892529]  167a5ee1 00000000 00000293 afe30360 167a5ee1 ddcaff90 00000000 00000000
[ 8989.892529]  ddcaff80 c1039b8b bfa25ca4 bfa25c9c bfa25ca4 bfa25c9c ddcaff88 ddcaffac
[ 8989.892529]  c1039c1e fffffde4 00000000 501d1b72 00000bb8 501d1b72 00000003 bfa25ca4
[ 8989.892529] Call Trace:
[ 8989.892529]  [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0
[ 8989.892529]  [<c1039c1e>] sys_settimeofday+0x2e/0x90
[ 8989.892529]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[ 8989.892529] Code: 2f 78 c1 e8 8a e6 ff ff b8 90 2f 78 c1 83 05 8c 2f 78 c1 01 e8 d9 c9 37 00 f7 45 e8 00 02 00 00 74 20 e8 db a5 00 00 ff 75 e8 9d <e8> 72 61 fe ff 8b 5d f4 31 c0 8b 75 f8 8b 7d fc 89 ec 5d c3 8d
[ 8989.892529] Kernel panic - not syncing: softlockup: hung tasks
[ 8989.892529] Pid: 3819, comm: a.out Not tainted 3.6.0-rc1-dirty #36
[ 8989.892529] Call Trace:
[ 8989.892529]  [<c13e9c5c>] panic+0x87/0x196
[ 8989.892529]  [<c1098281>] watchdog_timer_fn+0x171/0x180
[ 8989.892529]  [<c10568b8>] hrtimer_run_queues+0xb8/0x270
[ 8989.892529]  [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30
[ 8989.892529]  [<c1040978>] run_local_timers+0x8/0x20
[ 8989.892529]  [<c10409bf>] update_process_times+0x2f/0x70
[ 8989.892529]  [<c1075625>] tick_periodic+0x25/0x80
[ 8989.892529]  [<c1075699>] tick_handle_periodic+0x19/0x80
[ 8989.892529]  [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90
[ 8989.892529]  [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18
[ 8989.892529]  [<c13ed987>] apic_timer_interrupt+0x2f/0x34
[ 8989.892529]  [<c1070bf9>] ? do_settimeofday+0x109/0x130
[ 8989.892529]  [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0
[ 8989.892529]  [<c1039c1e>] sys_settimeofday+0x2e/0x90
[ 8989.892529]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[ 8989.892529] ------------[ cut here ]------------
[ 8989.892529] WARNING: at kernel/lockdep.c:2592 trace_hardirqs_on_caller+0x15d/0x1c0()
[ 8989.892529] Hardware name: VMware Virtual Platform
[ 8989.892529] Modules linked in: ipv6 binfmt_misc
[ 8989.892529] Pid: 3819, comm: a.out Not tainted 3.6.0-rc1-dirty #36
[ 8989.892529] Call Trace:
[ 8989.892529]  [<c103215d>] warn_slowpath_common+0x6d/0xa0
[ 8989.892529]  [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0
[ 8989.892529]  [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0
[ 8989.892529]  [<c13e9d2c>] ? panic+0x157/0x196
[ 8989.892529]  [<c10321ad>] warn_slowpath_null+0x1d/0x20
[ 8989.892529]  [<c107b16d>] trace_hardirqs_on_caller+0x15d/0x1c0
[ 8989.892529]  [<c107b1db>] trace_hardirqs_on+0xb/0x10
[ 8989.892529]  [<c13e9d2c>] panic+0x157/0x196
[ 8989.892529]  [<c1098281>] watchdog_timer_fn+0x171/0x180
[ 8989.892529]  [<c10568b8>] hrtimer_run_queues+0xb8/0x270
[ 8989.892529]  [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30
[ 8989.892529]  [<c1040978>] run_local_timers+0x8/0x20
[ 8989.892529]  [<c10409bf>] update_process_times+0x2f/0x70
[ 8989.892529]  [<c1075625>] tick_periodic+0x25/0x80
[ 8989.892529]  [<c1075699>] tick_handle_periodic+0x19/0x80
[ 8989.892529]  [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90
[ 8989.892529]  [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18
[ 8989.892529]  [<c13ed987>] apic_timer_interrupt+0x2f/0x34
[ 8989.892529]  [<c1070bf9>] ? do_settimeofday+0x109/0x130
[ 8989.892529]  [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0
[ 8989.892529]  [<c1039c1e>] sys_settimeofday+0x2e/0x90
[ 8989.892529]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[ 8989.892529] ---[ end trace 2cc59ba688ae11e6 ]---

Example log 2:

[   32.966527] CPA self-test:
[   32.968959]  4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0
[   32.981918]  4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0
[   32.994237]  4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0
[   32.996998] ok.
[ 1144.112407] [sched_delayed] sched: RT throttling activated
[ 2247.683216] BUG: soft lockup - CPU#0 stuck for 1024s! [smbd:3530]
[ 2247.683216] Modules linked in: ipv6 binfmt_misc
[ 2247.683216] irq event stamp: 45100
[ 2247.683216] hardirqs last  enabled at (45099): [<c10b3f95>] get_page_from_freelist+0x355/0x580
[ 2247.683216] hardirqs last disabled at (45100): [<c13ed980>] apic_timer_interrupt+0x28/0x34
[ 2247.683216] softirqs last  enabled at (43784): [<c103a7a2>] __do_softirq+0xf2/0x160
[ 2247.683216] softirqs last disabled at (43771): [74781.658009] BUG: soft lockup - CPU#1 stuck for 66529s! [a.out:3581]
[74781.658022] Modules linked in: ipv6 binfmt_misc
[74781.658038] irq event stamp: 28028
[74781.658038] hardirqs last  enabled at (28027): [<c1070ac7>] timekeeping_inject_offset+0xc7/0xf0
[74781.658038] hardirqs last disabled at (28028): [<c13ed980>] apic_timer_interrupt+0x28/0x34
[74781.658038] softirqs last  enabled at (27372): [<c103a7a2>] __do_softirq+0xf2/0x160
[74781.658038] softirqs last disabled at (27369): [<c1004d07>] do_softirq+0x87/0xd0
[74781.658038] Pid: 3581, comm: a.out Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[74781.658038] EIP: 0060:[<c1070ac9>] EFLAGS: 00000293 CPU: 1
[74781.658038] EIP is at timekeeping_inject_offset+0xc9/0xf0
[74781.658038] EAX: da2ae760 EBX: 00000293 ECX: 00000000 EDX: 00000001
[74781.658038] ESI: 00000000 EDI: da339f20 EBP: da339edc ESP: da339ec4
[74781.658038]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[74781.658038] CR0: 8005003b CR2: b773a070 CR3: 1e0f5000 CR4: 000006d0
[74781.658038] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[74781.658038] DR6: ffff0ff0 DR7: 00000400
[74781.658038] Process a.out (pid: 3581, ti=da338000 task=da2ae760 task.ti=da338000)
[74781.658038] Stack:
[74781.658038]  078ac4bb 00000000 afe2ff10 078ac4bb da339f20 bf8f2088 da339f18 c1071299
[74781.658038]  da339f20 da339f04 c10c6d00 00000001 00000002 00000000 00000000 00000080
[74781.658038]  da339f18 c11e38a4 da339f20 bf8f2088 bf8f2000 da339fac c103a05f 00000100
[74781.658038] Call Trace:
[74781.658038]  [<c1071299>] do_adjtimex+0x509/0x5e0
[74781.658038]  [<c10c6d00>] ? might_fault+0x90/0xa0
[74781.658038]  [<c11e38a4>] ? _copy_from_user+0x34/0x80
[74781.658038]  [<c103a05f>] sys_adjtimex+0x4f/0x80
[74781.658038]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[74781.658038] Code: 64 2f 78 c1 b8 01 00 00 00 e8 b4 e7 ff ff b8 90 2f 78 c1 83 05 8c 2f 78 c1 01 e8 03 cb 37 00 f6 c7 02 74 16 e8 09 a7 00 00 53 9d <e8> a2 62 fe ff 31 c0 83 c4 10 5b 5e 5d c3 90 53 9d e8 91 6d 00
[74781.658038] Kernel panic - not syncing: softlockup: hung tasks
[74781.658038] Pid: 3581, comm: a.out Not tainted 3.6.0-rc1-dirty #36
[74781.658038] Call Trace:
[74781.658038]  [<c13e9c5c>] panic+0x87/0x196
[74781.658038]  [<c1098281>] watchdog_timer_fn+0x171/0x180
[74781.658038]  [<c10568b8>] hrtimer_run_queues+0xb8/0x270
[74781.658038]  [<c1062552>] ? sched_clock_local+0xb2/0x190
[74781.658038]  [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30
[74781.658038]  [<c1040978>] run_local_timers+0x8/0x20
[74781.658038]  [<c10409bf>] update_process_times+0x2f/0x70
[74781.658038]  [<c1075625>] tick_periodic+0x25/0x80
[74781.658038]  [<c1075699>] tick_handle_periodic+0x19/0x80
[74781.658038]  [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90
[74781.658038]  [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18
[74781.658038]  [<c13ed987>] apic_timer_interrupt+0x2f/0x34
[74781.658038]  [<c1070ac9>] ? timekeeping_inject_offset+0xc9/0xf0
[74781.658038]  [<c1071299>] do_adjtimex+0x509/0x5e0
[74781.658038]  [<c10c6d00>] ? might_fault+0x90/0xa0
[74781.658038]  [<c11e38a4>] ? _copy_from_user+0x34/0x80
[74781.658038]  [<c103a05f>] sys_adjtimex+0x4f/0x80
[74781.658038]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[ 2247.683216] [<c1004d07>] do_softirq+0x87/0xd0
[ 2247.683216] Pid: 3530, comm: smbd Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 2247.683216] EIP: 0060:[<c102cb93>] EFLAGS: 00000202 CPU: 0
[ 2247.683216] EIP is at kernel_map_pages+0xa3/0x100
[ 2247.683216] EAX: 000006d0 EBX: dfb11b9c ECX: 00000650 EDX: 00000202
[ 2247.683216] ESI: 00000001 EDI: d756fd20 EBP: d756fd30 ESP: d756fcf8
[ 2247.683216]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 2247.683216] CR0: 8005003b CR2: b70e7320 CR3: 175dc000 CR4: 000006d0
[ 2247.683216] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 2247.683216] DR6: ffff0ff0 DR7: 00000400
[ 2247.683216] Process smbd (pid: 3530, ti=d756e000 task=de2b0520 task.ti=d756e000)
[ 2247.683216] Stack:
[ 2247.683216]  00000001 d756fd20 00000003 00000000 00000001 00000001 00019b6f 00000000
[ 2247.683216]  00000000 00000000 d9b70000 00000001 00000001 00000000 d756fd9c c10b3e58
[ 2247.683216]  00000001 00000041 ffffffff 00000001 001974cc 00000000 ffffffff 00000000
[ 2247.683216] Call Trace:
[ 2247.683216]  [<c10b3e58>] get_page_from_freelist+0x218/0x580
[ 2247.683216]  [<c10b42b6>] __alloc_pages_nodemask+0xf6/0x650
[ 2247.683216]  [<c10db7e2>] cache_alloc_refill+0x3d2/0x800
[ 2247.683216]  [<c10d0340>] ? anon_vma_fork+0x50/0xd0
[ 2247.683216]  [<c10db3ca>] kmem_cache_alloc+0x12a/0x170
[ 2247.683216]  [<c10d0340>] anon_vma_fork+0x50/0xd0
[ 2247.683216]  [<c1030572>] dup_mm+0x1a2/0x3c0
[ 2247.683216]  [<c1030f97>] copy_process+0x727/0xf30
[ 2247.683216]  [<c1031893>] do_fork+0x53/0x280
[ 2247.683216]  [<c10c6cb6>] ? might_fault+0x46/0xa0
[ 2247.683216]  [<c10c6d00>] ? might_fault+0x90/0xa0
[ 2247.683216]  [<c10c6cb6>] ? might_fault+0x46/0xa0
[ 2247.683216]  [<c11e3923>] ? copy_to_user+0x33/0x70
[ 2247.683216]  [<c100a17f>] sys_clone+0x2f/0x40
[ 2247.683216]  [<c13edf0d>] ptregs_clone+0x15/0x28
[ 2247.683216]  [<c13eddcc>] ? sysenter_do_call+0x12/0x32
[ 2247.683216] Code: 20 75 15 0f 20 d8 0f 22 d8 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 66 90 9c 5a fa 0f 20 e0 89 c1 80 e1 7f 0f 22 e1 0f 22 e0 52 9d <8b> 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 83 3d 9c 80 ca c1 02 74

Example log 3:

[   33.144462] CPA self-test:
[   33.147052]  4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0
[   33.159948]  4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0
[   33.170339]  4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0
[   33.173502] ok.
[ 9935.135720] [sched_delayed] sched: RT throttling activated
[11034.133140] BUG: soft lockup - CPU#0 stuck for 1024s! [a.out:3585]
[11034.133140] Modules linked in: ipv6 binfmt_misc
[11034.133140] irq event stamp: 14150
[11034.133140] hardirqs last  enabled at (14149): [<c1070bf5>] do_settimeofday+0x105/0x130
[11034.133140] hardirqs last disabled at (14150): [<c13ed980>] apic_timer_interrupt+0x28/0x34
[11034.133140] softirqs last  enabled at (12282): [<c103a7a2>] __do_softirq+0xf2/0x160
[11034.133140] softirqs last disabled at (12267): [<c1004d07>] do_softirq+0x87/0xd0
[11034.133140] Pid: 3585, comm: a.out Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[11034.133140] EIP: 0060:[<c1070bf9>] EFLAGS: 00000293 CPU: 0
[11034.133140] EIP is at do_settimeofday+0x109/0x130
[11034.133140] EAX: d9a52260 EBX: ddca9f90 ECX: 00000000 EDX: 00000001
[11034.133140] ESI: 2564b881 EDI: 00000000 EBP: ddca9f64 ESP: ddca9f44
[11034.133140]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[11034.133140] CR0: 8005003b CR2: b77b2000 CR3: 1e4f0000 CR4: 000006d0
[11034.133140] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[11034.133140] DR6: ffff0ff0 DR7: 00000400
[11034.133140] Process a.out (pid: 3585, ti=ddca8000 task=d9a52260 task.ti=ddca8000)
[11034.133140] Stack:
[11034.133140]  2564b881 00000000 00000293 afe2fc28 2564b881 ddca9f90 00000000 00000000
[11034.133140]  ddca9f80 c1039b8b bf9ceaa8 bf9ceaa0 bf9ceaa8 bf9ceaa0 ddca9f88 ddca9fac
[11034.133140]  c1039c1e fffffde4 00000000 501d2ef1 00009088 501d2ef1 00000025 bf9ceaa8
[11034.133140] Call Trace:
[11034.133140]  [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0
[11034.133140]  [<c1039c1e>] sys_settimeofday+0x2e/0x90
[11034.133140]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[11034.133140] Code: 2f 78 c1 e8 8a e6 ff ff b8 90 2f 78 c1 83 05 8c 2f 78 c1 01 e8 d9 c9 37 00 f7 45 e8 00 02 00 00 74 20 e8 db a5 00 00 ff 75 e8 9d <e8> 72 61 fe ff 8b 5d f4 31 c0 8b 75 f8 8b 7d fc 89 ec 5d c3 8d
[11034.133140] Kernel panic - not syncing: softlockup: hung tasks
[11034.133140] Pid: 3585, comm: a.out Not tainted 3.6.0-rc1-dirty #36
[11034.133140] Call Trace:
[11034.133140]  [<c13e9c5c>] panic+0x87/0x196
[11034.133140]  [<c1098281>] watchdog_timer_fn+0x171/0x180
[11034.133140]  [<c10568b8>] hrtimer_run_queues+0xb8/0x270
[11034.133140]  [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30
[12133.210650]  [<c1040978>] run_local_timers+0x8/0x20
[12133.210650]  [<c10409bf>] update_process_times+0x2f/0x70
[12133.210650]  [<c1075625>] tick_periodic+0x25/0x80
[12133.210650]  [<c1075699>] tick_handle_periodic+0x19/0x80
[12133.210650]  [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90
[12133.210650]  [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18
[12133.210650]  [<c13ed987>] apic_timer_interrupt+0x2f/0x34
[12133.210650]  [<c1070bf9>] ? do_settimeofday+0x109/0x130
[12133.210650]  [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0
[12133.210650]  [<c1039c1e>] sys_settimeofday+0x2e/0x90
[12133.210650]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[12133.210650] ------------[ cut here ]------------
[12133.210650] WARNING: at kernel/lockdep.c:2592 trace_hardirqs_on_caller+0x15d/0x1c0()
[12133.210650] Hardware name: VMware Virtual Platform
[12133.210650] Modules linked in: ipv6 binfmt_misc
[12133.210650] Pid: 3585, comm: a.out Not tainted 3.6.0-rc1-dirty #36
[12133.210650] Call Trace:
[12133.210650]  [<c103215d>] warn_slowpath_common+0x6d/0xa0
[12133.210650]  [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0
[12133.210650]  [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0
[12133.210650]  [<c13e9d2c>] ? panic+0x157/0x196
[12133.210650]  [<c10321ad>] warn_slowpath_null+0x1d/0x20
[12133.210650]  [<c107b16d>] trace_hardirqs_on_caller+0x15d/0x1c0
[12133.210650]  [<c107b1db>] trace_hardirqs_on+0xb/0x10
[12133.210650]  [<c13e9d2c>] panic+0x157/0x196
[12133.210650]  [<c1098281>] watchdog_timer_fn+0x171/0x180
[12133.210650]  [<c10568b8>] hrtimer_run_queues+0xb8/0x270
[12133.210650]  [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30
[12133.210650]  [<c1040978>] run_local_timers+0x8/0x20
[12133.210650]  [<c10409bf>] update_process_times+0x2f/0x70
[12133.210650]  [<c1075625>] tick_periodic+0x25/0x80
[12133.210650]  [<c1075699>] tick_handle_periodic+0x19/0x80
[12133.210650]  [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90
[12133.210650]  [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18
[12133.210650]  [<c13ed987>] apic_timer_interrupt+0x2f/0x34
[12133.210650]  [<c1070bf9>] ? do_settimeofday+0x109/0x130
[12133.210650]  [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0
[12133.210650]  [<c1039c1e>] sys_settimeofday+0x2e/0x90
[12133.210650]  [<c13eddcc>] sysenter_do_call+0x12/0x32
[12133.210650] ---[ end trace 70fa0f99035de9f7 ]---

Regards.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [3.6-rc1] sched: clock counts too quickly
  2012-08-04 11:24 ` Tetsuo Handa
@ 2012-08-04 12:10   ` Tetsuo Handa
  2012-08-04 13:46     ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2012-08-04 12:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, tglx

Tetsuo Handa wrote:
> > Under unknown condition, clock starts counting up too quickly which is enough
> > to confuse hangcheck timer to print stall warning.
> I wrote a test program.

I suspected that this is a problem within adjtimex() because strace reported clock
jump at adjtimex(). But I've just hit this problem with settimeofday() alone.

---------- test program 2 start ----------
#include <sys/timex.h>

int main(int argc, char *argv[])
{
        while (1) {
                struct timeval tv;
                struct timezone tz;
                gettimeofday(&tv, &tz);
                settimeofday(&tv, &tz);
        }
        return 0;
}
---------- test program 2 end ----------

Regards.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [3.6-rc1] sched: clock counts too quickly
  2012-08-04 12:10   ` Tetsuo Handa
@ 2012-08-04 13:46     ` Tetsuo Handa
  2012-08-04 19:21       ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2012-08-04 13:46 UTC (permalink / raw)
  To: john.stultz, mingo, tglx
  Cc: a.p.zijlstra, richardcochran, prarit, linux-kernel

Hello.

Bisected to 2a8c0883 "time: Move xtime_nsec adjustment underflow handling
timekeeping_adjust".

Would you check?

Regards.

Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > > Under unknown condition, clock starts counting up too quickly which is enough
> > > to confuse hangcheck timer to print stall warning.
> > I wrote a test program.
> 
> I suspected that this is a problem within adjtimex() because strace reported clock
> jump at adjtimex(). But I've just hit this problem with settimeofday() alone.
> 
> ---------- test program 2 start ----------
> #include <sys/timex.h>
> 
> int main(int argc, char *argv[])
> {
>         while (1) {
>                 struct timeval tv;
>                 struct timezone tz;
>                 gettimeofday(&tv, &tz);
>                 settimeofday(&tv, &tz);
>         }
>         return 0;
> }
> ---------- test program 2 end ----------
> 
> Regards.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust()
  2012-08-04 13:46     ` Tetsuo Handa
@ 2012-08-04 19:21       ` Ingo Molnar
  2012-08-05  0:50         ` Tetsuo Handa
                           ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ingo Molnar @ 2012-08-04 19:21 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: john.stultz, tglx, a.p.zijlstra, richardcochran, prarit,
	linux-kernel


* Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> Hello.
> 
> Bisected to 2a8c0883 "time: Move xtime_nsec adjustment 
> underflow handling timekeeping_adjust".
> 
> Would you check?

probably caused by this stray return in timekeeping_adjust():

        } else
                return;

which should really be a proper goto - like the (totally 
untested!) patch below.

Does this fix the bug for you?

Thanks,

	Ingo

-------------

Subject: time: Fix adjustment cleanup bug in timekeeping_adjust()

Also make the flow more readable by properly balancing curly 
braces.

Signed-off-by: Ingo Molnar <mingo@kernel.org>


diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 2988bc8..e16af19 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -923,20 +923,22 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 		if (likely(error <= interval))
 			adj = 1;
 		else
-			adj = timekeeping_bigadjust(tk, error, &interval,
-							&offset);
-	} else if (error < -interval) {
-		/* See comment above, this is just switched for the negative */
-		error >>= 2;
-		if (likely(error >= -interval)) {
-			adj = -1;
-			interval = -interval;
-			offset = -offset;
-		} else
-			adj = timekeeping_bigadjust(tk, error, &interval,
-							&offset);
-	} else
-		return;
+			adj = timekeeping_bigadjust(tk, error, &interval, &offset);
+	} else {
+		if (error < -interval) {
+			/* See comment above, this is just switched for the negative */
+			error >>= 2;
+			if (likely(error >= -interval)) {
+				adj = -1;
+				interval = -interval;
+				offset = -offset;
+			} else {
+				adj = timekeeping_bigadjust(tk, error, &interval, &offset);
+			}
+		} else {
+			goto out_adjust;
+		}
+	}
 
 	if (unlikely(tk->clock->maxadj &&
 		(tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
@@ -999,6 +1001,7 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	tk->xtime_nsec -= offset;
 	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 
+out_adjust:
 	/*
 	 * It may be possible that when we entered this function, xtime_nsec
 	 * was very small.  Further, if we're slightly speeding the clocksource

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust()
  2012-08-04 19:21       ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar
@ 2012-08-05  0:50         ` Tetsuo Handa
  2012-08-05 17:00         ` [tip:timers/urgent] " tip-bot for Ingo Molnar
  2012-08-06 17:21         ` [PATCH] " John Stultz
  2 siblings, 0 replies; 8+ messages in thread
From: Tetsuo Handa @ 2012-08-05  0:50 UTC (permalink / raw)
  To: mingo; +Cc: john.stultz, tglx, a.p.zijlstra, richardcochran, prarit,
	linux-kernel

Ingo Molnar wrote:
> > Bisected to 2a8c0883 "time: Move xtime_nsec adjustment 
> > underflow handling timekeeping_adjust".
> > 
> > Would you check?
> 
> probably caused by this stray return in timekeeping_adjust():
> 
>         } else
>                 return;
> 
> which should really be a proper goto - like the (totally 
> untested!) patch below.
> 
> Does this fix the bug for you?
> 
> Thanks,
> 
> 	Ingo
> 
Yes, this fixed the bug.

Thank you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:timers/urgent] time: Fix adjustment cleanup bug in timekeeping_adjust()
  2012-08-04 19:21       ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar
  2012-08-05  0:50         ` Tetsuo Handa
@ 2012-08-05 17:00         ` tip-bot for Ingo Molnar
  2012-08-06 17:21         ` [PATCH] " John Stultz
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Ingo Molnar @ 2012-08-05 17:00 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx

Commit-ID:  1d17d17484d40f2d5b35c79518597a2b25296996
Gitweb:     http://git.kernel.org/tip/1d17d17484d40f2d5b35c79518597a2b25296996
Author:     Ingo Molnar <mingo@kernel.org>
AuthorDate: Sat, 4 Aug 2012 21:21:14 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 5 Aug 2012 12:37:14 +0200

time: Fix adjustment cleanup bug in timekeeping_adjust()

Tetsuo Handa reported that sporadically the system clock starts
counting up too quickly which is enough to confuse the hangcheck
timer to print a bogus stall warning.

Commit 2a8c0883 "time: Move xtime_nsec adjustment underflow handling
timekeeping_adjust" overlooked this exit path:

        } else
                return;

which should really be a proper exit sequence, fixing the bug as a
side effect.

Also make the flow more readable by properly balancing curly
braces.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: john.stultz@linaro.org
Cc: a.p.zijlstra@chello.nl
Cc: richardcochran@gmail.com
Cc: prarit@redhat.com
Link: http://lkml.kernel.org/r/20120804192114.GA28347@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/time/timekeeping.c |   31 +++++++++++++++++--------------
 1 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 2988bc8..e16af19 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -923,20 +923,22 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 		if (likely(error <= interval))
 			adj = 1;
 		else
-			adj = timekeeping_bigadjust(tk, error, &interval,
-							&offset);
-	} else if (error < -interval) {
-		/* See comment above, this is just switched for the negative */
-		error >>= 2;
-		if (likely(error >= -interval)) {
-			adj = -1;
-			interval = -interval;
-			offset = -offset;
-		} else
-			adj = timekeeping_bigadjust(tk, error, &interval,
-							&offset);
-	} else
-		return;
+			adj = timekeeping_bigadjust(tk, error, &interval, &offset);
+	} else {
+		if (error < -interval) {
+			/* See comment above, this is just switched for the negative */
+			error >>= 2;
+			if (likely(error >= -interval)) {
+				adj = -1;
+				interval = -interval;
+				offset = -offset;
+			} else {
+				adj = timekeeping_bigadjust(tk, error, &interval, &offset);
+			}
+		} else {
+			goto out_adjust;
+		}
+	}
 
 	if (unlikely(tk->clock->maxadj &&
 		(tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
@@ -999,6 +1001,7 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	tk->xtime_nsec -= offset;
 	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 
+out_adjust:
 	/*
 	 * It may be possible that when we entered this function, xtime_nsec
 	 * was very small.  Further, if we're slightly speeding the clocksource

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust()
  2012-08-04 19:21       ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar
  2012-08-05  0:50         ` Tetsuo Handa
  2012-08-05 17:00         ` [tip:timers/urgent] " tip-bot for Ingo Molnar
@ 2012-08-06 17:21         ` John Stultz
  2 siblings, 0 replies; 8+ messages in thread
From: John Stultz @ 2012-08-06 17:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tetsuo Handa, tglx, a.p.zijlstra, richardcochran, prarit,
	linux-kernel

On 08/04/2012 12:21 PM, Ingo Molnar wrote:
> * Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
>> Hello.
>>
>> Bisected to 2a8c0883 "time: Move xtime_nsec adjustment
>> underflow handling timekeeping_adjust".
>>
>> Would you check?
> probably caused by this stray return in timekeeping_adjust():
>
>          } else
>                  return;
>
> which should really be a proper goto - like the (totally
> untested!) patch below.

Ooof.   Your analysis looks correct, thanks for fixing this, Ingo!
And thanks for catching the issue Tetsuo!

thanks
-john


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-08-06 17:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-03 14:18 [3.6-rc1] sched: clock counts too quickly Tetsuo Handa
2012-08-04 11:24 ` Tetsuo Handa
2012-08-04 12:10   ` Tetsuo Handa
2012-08-04 13:46     ` Tetsuo Handa
2012-08-04 19:21       ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar
2012-08-05  0:50         ` Tetsuo Handa
2012-08-05 17:00         ` [tip:timers/urgent] " tip-bot for Ingo Molnar
2012-08-06 17:21         ` [PATCH] " John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox