* [3.6-rc1] sched: clock counts too quickly @ 2012-08-03 14:18 Tetsuo Handa 2012-08-04 11:24 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2012-08-03 14:18 UTC (permalink / raw) To: linux-kernel Hello. Under unknown condition, clock starts counting up too quickly which is enough to confuse hangcheck timer to print stall warning. I don't know the condition but it seems that "[sched_delayed] sched: RT throttling activated" message is printed when time jump starts and the CPU usage is high (running my testing program). Below are examples which happened within a few minutes despite the timestamp field of the log. Example 1: [ 516.710080] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 516.721264] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 517.118131] warning: process `ccs_file_test' used the deprecated sysctl system call with 3.5.56. [ 1617.117747] [sched_delayed] sched: RT throttling activated [ 1636.759901] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 1636.764755] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 1644.236465] BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0] [ 1644.236503] Modules linked in: ipv6 binfmt_misc ccsecurity [ 1644.236503] irq event stamp: 2190088 [ 1644.236503] hardirqs last enabled at (2190087): [<c109ee94>] rcu_idle_exit+0x54/0xc0 [ 1644.236503] hardirqs last disabled at (2190088): [<c13ed3a0>] _raw_spin_lock_irq+0x10/0x80 [ 1644.236503] softirqs last enabled at (2190084): [<c103a7b2>] __do_softirq+0xf2/0x160 [ 1644.236503] softirqs last disabled at (2190053): [<c1004d07>] do_softirq+0x87/0xd0 [ 1644.236503] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc1 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform [ 1644.236503] EIP: 0060:[<c1009f71>] EFLAGS: 00000246 CPU: 0 [ 1644.236503] EIP is at default_idle+0x21/0x40 [ 1644.236503] EAX: 00000000 EBX: ffffffff ECX: 00000000 EDX: 00000000 [ 1644.236503] ESI: c15ae3b0 EDI: 01d8d000 EBP: c156df98 ESP: c156df98 [ 1644.236503] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 1644.236503] CR0: 8005003b CR2: b75df9db CR3: 17185000 CR4: 000006d0 [ 1644.236503] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 1644.236503] DR6: ffff0ff0 DR7: 00000400 Example 2: [ 125.371118] [<c13ee58c>] sysenter_do_call+0x12/0x32 [ 138.869458] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 138.881184] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 139.222763] warning: process `ccs_file_test' used the deprecated sysctl system call with 3.5.56. [ 1239.068731] [sched_delayed] sched: RT throttling activated [ 2356.123229] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 2356.125715] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 2484.942962] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 2484.947182] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 3602.556243] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 3602.566267] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 3834.980066] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 3834.983705] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 3855.531019] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 3855.535266] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 3898.597722] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 3898.601656] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 6117.070638] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 6117.073990] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 6136.209395] BUG: soft lockup - CPU#0 stuck for 22s! [ccs_cond_test:8618] [ 6136.209441] Modules linked in: ipv6 binfmt_misc ccsecurity [ 6136.209441] irq event stamp: 0 [ 6136.209441] hardirqs last enabled at (0): [< (null)>] (null) [ 6136.209441] hardirqs last disabled at (0): [<c1030add>] copy_process+0x26d/0xf30 [ 6136.209441] softirqs last enabled at (0): [<c1030add>] copy_process+0x26d/0xf30 Config is at http://I-love.SAKURA.ne.jp/tmp/config-3.6-rc1 . Regards. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [3.6-rc1] sched: clock counts too quickly 2012-08-03 14:18 [3.6-rc1] sched: clock counts too quickly Tetsuo Handa @ 2012-08-04 11:24 ` Tetsuo Handa 2012-08-04 12:10 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2012-08-04 11:24 UTC (permalink / raw) To: linux-kernel; +Cc: mingo, peterz, tglx Hello. Tetsuo Handa wrote: > Under unknown condition, clock starts counting up too quickly which is enough > to confuse hangcheck timer to print stall warning. I wrote a test program. ---------- test program start ---------- #include <string.h> #include <sys/timex.h> #include <unistd.h> int main(int argc, char *argv[]) { while (1) { struct timeval tv; struct timezone tz; struct timex buf; gettimeofday(&tv, &tz); settimeofday(&tv, &tz); memset(&buf, 0, sizeof(buf)); buf.modes = 0x100; adjtimex(&buf); } return 0; } ---------- test program end ---------- Running this program as root user on 3.6-rc1 causes clock jumping. buf.modes = 0x100 is just a random choice because I don't know what arguments are appropriate for adjtimex(). But I guess this is a regression because I didn't trigger this problem until 3.5 and I can always trigger this problem in 3.6-rc1. Maybe something is wrong with per-cpu clock management? Example log 1: [ 109.311730] EXT4-fs (ram0): mounting ext2 file system using the ext4 subsystem [ 109.322127] EXT4-fs (ram0): mounted filesystem without journal. Opts: (null) [ 1293.082319] [sched_delayed] sched: RT throttling activated [ 7890.377400] BUG: soft lockup - CPU#0 stuck for 1024s! [a.out:3819] [ 7890.377400] Modules linked in: ipv6 binfmt_misc [ 8989.892529] irq event stamp: 86970 [ 8989.892529] hardirqs last enabled at (86969): [<c1070bf5>] do_settimeofday+0x105/0x130 [ 8989.892529] hardirqs last disabled at (86970): [<c13ed980>] apic_timer_interrupt+0x28/0x34 [ 8989.892529] softirqs last enabled at (84344): [<c103a7a2>] __do_softirq+0xf2/0x160 [ 8989.892529] softirqs last disabled at (84339): [<c1004d07>] do_softirq+0x87/0xd0 [ 8989.892529] Pid: 3819, comm: a.out Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform [ 8989.892529] EIP: 0060:[<c1070bf9>] EFLAGS: 00000293 CPU: 0 [ 8989.892529] EIP is at do_settimeofday+0x109/0x130 [ 8989.892529] EAX: da2f2160 EBX: ddcaff90 ECX: 00000000 EDX: 00000001 [ 8989.892529] ESI: 167a5ee1 EDI: 00000000 EBP: ddcaff64 ESP: ddcaff44 [ 8989.892529] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 8989.892529] CR0: 8005003b CR2: bfeddfdc CR3: 1a10f000 CR4: 000006d0 [ 8989.892529] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 8989.892529] DR6: ffff0ff0 DR7: 00000400 [ 8989.892529] Process a.out (pid: 3819, ti=ddcae000 task=da2f2160 task.ti=ddcae000) [ 8989.892529] Stack: [ 8989.892529] 167a5ee1 00000000 00000293 afe30360 167a5ee1 ddcaff90 00000000 00000000 [ 8989.892529] ddcaff80 c1039b8b bfa25ca4 bfa25c9c bfa25ca4 bfa25c9c ddcaff88 ddcaffac [ 8989.892529] c1039c1e fffffde4 00000000 501d1b72 00000bb8 501d1b72 00000003 bfa25ca4 [ 8989.892529] Call Trace: [ 8989.892529] [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0 [ 8989.892529] [<c1039c1e>] sys_settimeofday+0x2e/0x90 [ 8989.892529] [<c13eddcc>] sysenter_do_call+0x12/0x32 [ 8989.892529] Code: 2f 78 c1 e8 8a e6 ff ff b8 90 2f 78 c1 83 05 8c 2f 78 c1 01 e8 d9 c9 37 00 f7 45 e8 00 02 00 00 74 20 e8 db a5 00 00 ff 75 e8 9d <e8> 72 61 fe ff 8b 5d f4 31 c0 8b 75 f8 8b 7d fc 89 ec 5d c3 8d [ 8989.892529] Kernel panic - not syncing: softlockup: hung tasks [ 8989.892529] Pid: 3819, comm: a.out Not tainted 3.6.0-rc1-dirty #36 [ 8989.892529] Call Trace: [ 8989.892529] [<c13e9c5c>] panic+0x87/0x196 [ 8989.892529] [<c1098281>] watchdog_timer_fn+0x171/0x180 [ 8989.892529] [<c10568b8>] hrtimer_run_queues+0xb8/0x270 [ 8989.892529] [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30 [ 8989.892529] [<c1040978>] run_local_timers+0x8/0x20 [ 8989.892529] [<c10409bf>] update_process_times+0x2f/0x70 [ 8989.892529] [<c1075625>] tick_periodic+0x25/0x80 [ 8989.892529] [<c1075699>] tick_handle_periodic+0x19/0x80 [ 8989.892529] [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90 [ 8989.892529] [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18 [ 8989.892529] [<c13ed987>] apic_timer_interrupt+0x2f/0x34 [ 8989.892529] [<c1070bf9>] ? do_settimeofday+0x109/0x130 [ 8989.892529] [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0 [ 8989.892529] [<c1039c1e>] sys_settimeofday+0x2e/0x90 [ 8989.892529] [<c13eddcc>] sysenter_do_call+0x12/0x32 [ 8989.892529] ------------[ cut here ]------------ [ 8989.892529] WARNING: at kernel/lockdep.c:2592 trace_hardirqs_on_caller+0x15d/0x1c0() [ 8989.892529] Hardware name: VMware Virtual Platform [ 8989.892529] Modules linked in: ipv6 binfmt_misc [ 8989.892529] Pid: 3819, comm: a.out Not tainted 3.6.0-rc1-dirty #36 [ 8989.892529] Call Trace: [ 8989.892529] [<c103215d>] warn_slowpath_common+0x6d/0xa0 [ 8989.892529] [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0 [ 8989.892529] [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0 [ 8989.892529] [<c13e9d2c>] ? panic+0x157/0x196 [ 8989.892529] [<c10321ad>] warn_slowpath_null+0x1d/0x20 [ 8989.892529] [<c107b16d>] trace_hardirqs_on_caller+0x15d/0x1c0 [ 8989.892529] [<c107b1db>] trace_hardirqs_on+0xb/0x10 [ 8989.892529] [<c13e9d2c>] panic+0x157/0x196 [ 8989.892529] [<c1098281>] watchdog_timer_fn+0x171/0x180 [ 8989.892529] [<c10568b8>] hrtimer_run_queues+0xb8/0x270 [ 8989.892529] [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30 [ 8989.892529] [<c1040978>] run_local_timers+0x8/0x20 [ 8989.892529] [<c10409bf>] update_process_times+0x2f/0x70 [ 8989.892529] [<c1075625>] tick_periodic+0x25/0x80 [ 8989.892529] [<c1075699>] tick_handle_periodic+0x19/0x80 [ 8989.892529] [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90 [ 8989.892529] [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18 [ 8989.892529] [<c13ed987>] apic_timer_interrupt+0x2f/0x34 [ 8989.892529] [<c1070bf9>] ? do_settimeofday+0x109/0x130 [ 8989.892529] [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0 [ 8989.892529] [<c1039c1e>] sys_settimeofday+0x2e/0x90 [ 8989.892529] [<c13eddcc>] sysenter_do_call+0x12/0x32 [ 8989.892529] ---[ end trace 2cc59ba688ae11e6 ]--- Example log 2: [ 32.966527] CPA self-test: [ 32.968959] 4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0 [ 32.981918] 4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0 [ 32.994237] 4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0 [ 32.996998] ok. [ 1144.112407] [sched_delayed] sched: RT throttling activated [ 2247.683216] BUG: soft lockup - CPU#0 stuck for 1024s! [smbd:3530] [ 2247.683216] Modules linked in: ipv6 binfmt_misc [ 2247.683216] irq event stamp: 45100 [ 2247.683216] hardirqs last enabled at (45099): [<c10b3f95>] get_page_from_freelist+0x355/0x580 [ 2247.683216] hardirqs last disabled at (45100): [<c13ed980>] apic_timer_interrupt+0x28/0x34 [ 2247.683216] softirqs last enabled at (43784): [<c103a7a2>] __do_softirq+0xf2/0x160 [ 2247.683216] softirqs last disabled at (43771): [74781.658009] BUG: soft lockup - CPU#1 stuck for 66529s! [a.out:3581] [74781.658022] Modules linked in: ipv6 binfmt_misc [74781.658038] irq event stamp: 28028 [74781.658038] hardirqs last enabled at (28027): [<c1070ac7>] timekeeping_inject_offset+0xc7/0xf0 [74781.658038] hardirqs last disabled at (28028): [<c13ed980>] apic_timer_interrupt+0x28/0x34 [74781.658038] softirqs last enabled at (27372): [<c103a7a2>] __do_softirq+0xf2/0x160 [74781.658038] softirqs last disabled at (27369): [<c1004d07>] do_softirq+0x87/0xd0 [74781.658038] Pid: 3581, comm: a.out Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform [74781.658038] EIP: 0060:[<c1070ac9>] EFLAGS: 00000293 CPU: 1 [74781.658038] EIP is at timekeeping_inject_offset+0xc9/0xf0 [74781.658038] EAX: da2ae760 EBX: 00000293 ECX: 00000000 EDX: 00000001 [74781.658038] ESI: 00000000 EDI: da339f20 EBP: da339edc ESP: da339ec4 [74781.658038] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [74781.658038] CR0: 8005003b CR2: b773a070 CR3: 1e0f5000 CR4: 000006d0 [74781.658038] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [74781.658038] DR6: ffff0ff0 DR7: 00000400 [74781.658038] Process a.out (pid: 3581, ti=da338000 task=da2ae760 task.ti=da338000) [74781.658038] Stack: [74781.658038] 078ac4bb 00000000 afe2ff10 078ac4bb da339f20 bf8f2088 da339f18 c1071299 [74781.658038] da339f20 da339f04 c10c6d00 00000001 00000002 00000000 00000000 00000080 [74781.658038] da339f18 c11e38a4 da339f20 bf8f2088 bf8f2000 da339fac c103a05f 00000100 [74781.658038] Call Trace: [74781.658038] [<c1071299>] do_adjtimex+0x509/0x5e0 [74781.658038] [<c10c6d00>] ? might_fault+0x90/0xa0 [74781.658038] [<c11e38a4>] ? _copy_from_user+0x34/0x80 [74781.658038] [<c103a05f>] sys_adjtimex+0x4f/0x80 [74781.658038] [<c13eddcc>] sysenter_do_call+0x12/0x32 [74781.658038] Code: 64 2f 78 c1 b8 01 00 00 00 e8 b4 e7 ff ff b8 90 2f 78 c1 83 05 8c 2f 78 c1 01 e8 03 cb 37 00 f6 c7 02 74 16 e8 09 a7 00 00 53 9d <e8> a2 62 fe ff 31 c0 83 c4 10 5b 5e 5d c3 90 53 9d e8 91 6d 00 [74781.658038] Kernel panic - not syncing: softlockup: hung tasks [74781.658038] Pid: 3581, comm: a.out Not tainted 3.6.0-rc1-dirty #36 [74781.658038] Call Trace: [74781.658038] [<c13e9c5c>] panic+0x87/0x196 [74781.658038] [<c1098281>] watchdog_timer_fn+0x171/0x180 [74781.658038] [<c10568b8>] hrtimer_run_queues+0xb8/0x270 [74781.658038] [<c1062552>] ? sched_clock_local+0xb2/0x190 [74781.658038] [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30 [74781.658038] [<c1040978>] run_local_timers+0x8/0x20 [74781.658038] [<c10409bf>] update_process_times+0x2f/0x70 [74781.658038] [<c1075625>] tick_periodic+0x25/0x80 [74781.658038] [<c1075699>] tick_handle_periodic+0x19/0x80 [74781.658038] [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90 [74781.658038] [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18 [74781.658038] [<c13ed987>] apic_timer_interrupt+0x2f/0x34 [74781.658038] [<c1070ac9>] ? timekeeping_inject_offset+0xc9/0xf0 [74781.658038] [<c1071299>] do_adjtimex+0x509/0x5e0 [74781.658038] [<c10c6d00>] ? might_fault+0x90/0xa0 [74781.658038] [<c11e38a4>] ? _copy_from_user+0x34/0x80 [74781.658038] [<c103a05f>] sys_adjtimex+0x4f/0x80 [74781.658038] [<c13eddcc>] sysenter_do_call+0x12/0x32 [ 2247.683216] [<c1004d07>] do_softirq+0x87/0xd0 [ 2247.683216] Pid: 3530, comm: smbd Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform [ 2247.683216] EIP: 0060:[<c102cb93>] EFLAGS: 00000202 CPU: 0 [ 2247.683216] EIP is at kernel_map_pages+0xa3/0x100 [ 2247.683216] EAX: 000006d0 EBX: dfb11b9c ECX: 00000650 EDX: 00000202 [ 2247.683216] ESI: 00000001 EDI: d756fd20 EBP: d756fd30 ESP: d756fcf8 [ 2247.683216] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 2247.683216] CR0: 8005003b CR2: b70e7320 CR3: 175dc000 CR4: 000006d0 [ 2247.683216] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 2247.683216] DR6: ffff0ff0 DR7: 00000400 [ 2247.683216] Process smbd (pid: 3530, ti=d756e000 task=de2b0520 task.ti=d756e000) [ 2247.683216] Stack: [ 2247.683216] 00000001 d756fd20 00000003 00000000 00000001 00000001 00019b6f 00000000 [ 2247.683216] 00000000 00000000 d9b70000 00000001 00000001 00000000 d756fd9c c10b3e58 [ 2247.683216] 00000001 00000041 ffffffff 00000001 001974cc 00000000 ffffffff 00000000 [ 2247.683216] Call Trace: [ 2247.683216] [<c10b3e58>] get_page_from_freelist+0x218/0x580 [ 2247.683216] [<c10b42b6>] __alloc_pages_nodemask+0xf6/0x650 [ 2247.683216] [<c10db7e2>] cache_alloc_refill+0x3d2/0x800 [ 2247.683216] [<c10d0340>] ? anon_vma_fork+0x50/0xd0 [ 2247.683216] [<c10db3ca>] kmem_cache_alloc+0x12a/0x170 [ 2247.683216] [<c10d0340>] anon_vma_fork+0x50/0xd0 [ 2247.683216] [<c1030572>] dup_mm+0x1a2/0x3c0 [ 2247.683216] [<c1030f97>] copy_process+0x727/0xf30 [ 2247.683216] [<c1031893>] do_fork+0x53/0x280 [ 2247.683216] [<c10c6cb6>] ? might_fault+0x46/0xa0 [ 2247.683216] [<c10c6d00>] ? might_fault+0x90/0xa0 [ 2247.683216] [<c10c6cb6>] ? might_fault+0x46/0xa0 [ 2247.683216] [<c11e3923>] ? copy_to_user+0x33/0x70 [ 2247.683216] [<c100a17f>] sys_clone+0x2f/0x40 [ 2247.683216] [<c13edf0d>] ptregs_clone+0x15/0x28 [ 2247.683216] [<c13eddcc>] ? sysenter_do_call+0x12/0x32 [ 2247.683216] Code: 20 75 15 0f 20 d8 0f 22 d8 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 66 90 9c 5a fa 0f 20 e0 89 c1 80 e1 7f 0f 22 e1 0f 22 e0 52 9d <8b> 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 83 3d 9c 80 ca c1 02 74 Example log 3: [ 33.144462] CPA self-test: [ 33.147052] 4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0 [ 33.159948] 4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0 [ 33.170339] 4k 131072 large 0 gb 0 x 131072[c0000000-dffff000] miss 0 [ 33.173502] ok. [ 9935.135720] [sched_delayed] sched: RT throttling activated [11034.133140] BUG: soft lockup - CPU#0 stuck for 1024s! [a.out:3585] [11034.133140] Modules linked in: ipv6 binfmt_misc [11034.133140] irq event stamp: 14150 [11034.133140] hardirqs last enabled at (14149): [<c1070bf5>] do_settimeofday+0x105/0x130 [11034.133140] hardirqs last disabled at (14150): [<c13ed980>] apic_timer_interrupt+0x28/0x34 [11034.133140] softirqs last enabled at (12282): [<c103a7a2>] __do_softirq+0xf2/0x160 [11034.133140] softirqs last disabled at (12267): [<c1004d07>] do_softirq+0x87/0xd0 [11034.133140] Pid: 3585, comm: a.out Not tainted 3.6.0-rc1-dirty #36 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform [11034.133140] EIP: 0060:[<c1070bf9>] EFLAGS: 00000293 CPU: 0 [11034.133140] EIP is at do_settimeofday+0x109/0x130 [11034.133140] EAX: d9a52260 EBX: ddca9f90 ECX: 00000000 EDX: 00000001 [11034.133140] ESI: 2564b881 EDI: 00000000 EBP: ddca9f64 ESP: ddca9f44 [11034.133140] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [11034.133140] CR0: 8005003b CR2: b77b2000 CR3: 1e4f0000 CR4: 000006d0 [11034.133140] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [11034.133140] DR6: ffff0ff0 DR7: 00000400 [11034.133140] Process a.out (pid: 3585, ti=ddca8000 task=d9a52260 task.ti=ddca8000) [11034.133140] Stack: [11034.133140] 2564b881 00000000 00000293 afe2fc28 2564b881 ddca9f90 00000000 00000000 [11034.133140] ddca9f80 c1039b8b bf9ceaa8 bf9ceaa0 bf9ceaa8 bf9ceaa0 ddca9f88 ddca9fac [11034.133140] c1039c1e fffffde4 00000000 501d2ef1 00009088 501d2ef1 00000025 bf9ceaa8 [11034.133140] Call Trace: [11034.133140] [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0 [11034.133140] [<c1039c1e>] sys_settimeofday+0x2e/0x90 [11034.133140] [<c13eddcc>] sysenter_do_call+0x12/0x32 [11034.133140] Code: 2f 78 c1 e8 8a e6 ff ff b8 90 2f 78 c1 83 05 8c 2f 78 c1 01 e8 d9 c9 37 00 f7 45 e8 00 02 00 00 74 20 e8 db a5 00 00 ff 75 e8 9d <e8> 72 61 fe ff 8b 5d f4 31 c0 8b 75 f8 8b 7d fc 89 ec 5d c3 8d [11034.133140] Kernel panic - not syncing: softlockup: hung tasks [11034.133140] Pid: 3585, comm: a.out Not tainted 3.6.0-rc1-dirty #36 [11034.133140] Call Trace: [11034.133140] [<c13e9c5c>] panic+0x87/0x196 [11034.133140] [<c1098281>] watchdog_timer_fn+0x171/0x180 [11034.133140] [<c10568b8>] hrtimer_run_queues+0xb8/0x270 [11034.133140] [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30 [12133.210650] [<c1040978>] run_local_timers+0x8/0x20 [12133.210650] [<c10409bf>] update_process_times+0x2f/0x70 [12133.210650] [<c1075625>] tick_periodic+0x25/0x80 [12133.210650] [<c1075699>] tick_handle_periodic+0x19/0x80 [12133.210650] [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90 [12133.210650] [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18 [12133.210650] [<c13ed987>] apic_timer_interrupt+0x2f/0x34 [12133.210650] [<c1070bf9>] ? do_settimeofday+0x109/0x130 [12133.210650] [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0 [12133.210650] [<c1039c1e>] sys_settimeofday+0x2e/0x90 [12133.210650] [<c13eddcc>] sysenter_do_call+0x12/0x32 [12133.210650] ------------[ cut here ]------------ [12133.210650] WARNING: at kernel/lockdep.c:2592 trace_hardirqs_on_caller+0x15d/0x1c0() [12133.210650] Hardware name: VMware Virtual Platform [12133.210650] Modules linked in: ipv6 binfmt_misc [12133.210650] Pid: 3585, comm: a.out Not tainted 3.6.0-rc1-dirty #36 [12133.210650] Call Trace: [12133.210650] [<c103215d>] warn_slowpath_common+0x6d/0xa0 [12133.210650] [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0 [12133.210650] [<c107b16d>] ? trace_hardirqs_on_caller+0x15d/0x1c0 [12133.210650] [<c13e9d2c>] ? panic+0x157/0x196 [12133.210650] [<c10321ad>] warn_slowpath_null+0x1d/0x20 [12133.210650] [<c107b16d>] trace_hardirqs_on_caller+0x15d/0x1c0 [12133.210650] [<c107b1db>] trace_hardirqs_on+0xb/0x10 [12133.210650] [<c13e9d2c>] panic+0x157/0x196 [12133.210650] [<c1098281>] watchdog_timer_fn+0x171/0x180 [12133.210650] [<c10568b8>] hrtimer_run_queues+0xb8/0x270 [12133.210650] [<c1098110>] ? touch_softlockup_watchdog_sync+0x30/0x30 [12133.210650] [<c1040978>] run_local_timers+0x8/0x20 [12133.210650] [<c10409bf>] update_process_times+0x2f/0x70 [12133.210650] [<c1075625>] tick_periodic+0x25/0x80 [12133.210650] [<c1075699>] tick_handle_periodic+0x19/0x80 [12133.210650] [<c1022bcf>] smp_apic_timer_interrupt+0x4f/0x90 [12133.210650] [<c11e35a4>] ? trace_hardirqs_off_thunk+0xc/0x18 [12133.210650] [<c13ed987>] apic_timer_interrupt+0x2f/0x34 [12133.210650] [<c1070bf9>] ? do_settimeofday+0x109/0x130 [12133.210650] [<c1039b8b>] do_sys_settimeofday+0x7b/0xe0 [12133.210650] [<c1039c1e>] sys_settimeofday+0x2e/0x90 [12133.210650] [<c13eddcc>] sysenter_do_call+0x12/0x32 [12133.210650] ---[ end trace 70fa0f99035de9f7 ]--- Regards. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [3.6-rc1] sched: clock counts too quickly 2012-08-04 11:24 ` Tetsuo Handa @ 2012-08-04 12:10 ` Tetsuo Handa 2012-08-04 13:46 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2012-08-04 12:10 UTC (permalink / raw) To: linux-kernel; +Cc: mingo, peterz, tglx Tetsuo Handa wrote: > > Under unknown condition, clock starts counting up too quickly which is enough > > to confuse hangcheck timer to print stall warning. > I wrote a test program. I suspected that this is a problem within adjtimex() because strace reported clock jump at adjtimex(). But I've just hit this problem with settimeofday() alone. ---------- test program 2 start ---------- #include <sys/timex.h> int main(int argc, char *argv[]) { while (1) { struct timeval tv; struct timezone tz; gettimeofday(&tv, &tz); settimeofday(&tv, &tz); } return 0; } ---------- test program 2 end ---------- Regards. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [3.6-rc1] sched: clock counts too quickly 2012-08-04 12:10 ` Tetsuo Handa @ 2012-08-04 13:46 ` Tetsuo Handa 2012-08-04 19:21 ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2012-08-04 13:46 UTC (permalink / raw) To: john.stultz, mingo, tglx Cc: a.p.zijlstra, richardcochran, prarit, linux-kernel Hello. Bisected to 2a8c0883 "time: Move xtime_nsec adjustment underflow handling timekeeping_adjust". Would you check? Regards. Tetsuo Handa wrote: > Tetsuo Handa wrote: > > > Under unknown condition, clock starts counting up too quickly which is enough > > > to confuse hangcheck timer to print stall warning. > > I wrote a test program. > > I suspected that this is a problem within adjtimex() because strace reported clock > jump at adjtimex(). But I've just hit this problem with settimeofday() alone. > > ---------- test program 2 start ---------- > #include <sys/timex.h> > > int main(int argc, char *argv[]) > { > while (1) { > struct timeval tv; > struct timezone tz; > gettimeofday(&tv, &tz); > settimeofday(&tv, &tz); > } > return 0; > } > ---------- test program 2 end ---------- > > Regards. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() 2012-08-04 13:46 ` Tetsuo Handa @ 2012-08-04 19:21 ` Ingo Molnar 2012-08-05 0:50 ` Tetsuo Handa ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Ingo Molnar @ 2012-08-04 19:21 UTC (permalink / raw) To: Tetsuo Handa Cc: john.stultz, tglx, a.p.zijlstra, richardcochran, prarit, linux-kernel * Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > Hello. > > Bisected to 2a8c0883 "time: Move xtime_nsec adjustment > underflow handling timekeeping_adjust". > > Would you check? probably caused by this stray return in timekeeping_adjust(): } else return; which should really be a proper goto - like the (totally untested!) patch below. Does this fix the bug for you? Thanks, Ingo ------------- Subject: time: Fix adjustment cleanup bug in timekeeping_adjust() Also make the flow more readable by properly balancing curly braces. Signed-off-by: Ingo Molnar <mingo@kernel.org> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 2988bc8..e16af19 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -923,20 +923,22 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset) if (likely(error <= interval)) adj = 1; else - adj = timekeeping_bigadjust(tk, error, &interval, - &offset); - } else if (error < -interval) { - /* See comment above, this is just switched for the negative */ - error >>= 2; - if (likely(error >= -interval)) { - adj = -1; - interval = -interval; - offset = -offset; - } else - adj = timekeeping_bigadjust(tk, error, &interval, - &offset); - } else - return; + adj = timekeeping_bigadjust(tk, error, &interval, &offset); + } else { + if (error < -interval) { + /* See comment above, this is just switched for the negative */ + error >>= 2; + if (likely(error >= -interval)) { + adj = -1; + interval = -interval; + offset = -offset; + } else { + adj = timekeeping_bigadjust(tk, error, &interval, &offset); + } + } else { + goto out_adjust; + } + } if (unlikely(tk->clock->maxadj && (tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) { @@ -999,6 +1001,7 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset) tk->xtime_nsec -= offset; tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; +out_adjust: /* * It may be possible that when we entered this function, xtime_nsec * was very small. Further, if we're slightly speeding the clocksource ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() 2012-08-04 19:21 ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar @ 2012-08-05 0:50 ` Tetsuo Handa 2012-08-05 17:00 ` [tip:timers/urgent] " tip-bot for Ingo Molnar 2012-08-06 17:21 ` [PATCH] " John Stultz 2 siblings, 0 replies; 8+ messages in thread From: Tetsuo Handa @ 2012-08-05 0:50 UTC (permalink / raw) To: mingo; +Cc: john.stultz, tglx, a.p.zijlstra, richardcochran, prarit, linux-kernel Ingo Molnar wrote: > > Bisected to 2a8c0883 "time: Move xtime_nsec adjustment > > underflow handling timekeeping_adjust". > > > > Would you check? > > probably caused by this stray return in timekeeping_adjust(): > > } else > return; > > which should really be a proper goto - like the (totally > untested!) patch below. > > Does this fix the bug for you? > > Thanks, > > Ingo > Yes, this fixed the bug. Thank you. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tip:timers/urgent] time: Fix adjustment cleanup bug in timekeeping_adjust() 2012-08-04 19:21 ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar 2012-08-05 0:50 ` Tetsuo Handa @ 2012-08-05 17:00 ` tip-bot for Ingo Molnar 2012-08-06 17:21 ` [PATCH] " John Stultz 2 siblings, 0 replies; 8+ messages in thread From: tip-bot for Ingo Molnar @ 2012-08-05 17:00 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx Commit-ID: 1d17d17484d40f2d5b35c79518597a2b25296996 Gitweb: http://git.kernel.org/tip/1d17d17484d40f2d5b35c79518597a2b25296996 Author: Ingo Molnar <mingo@kernel.org> AuthorDate: Sat, 4 Aug 2012 21:21:14 +0200 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Sun, 5 Aug 2012 12:37:14 +0200 time: Fix adjustment cleanup bug in timekeeping_adjust() Tetsuo Handa reported that sporadically the system clock starts counting up too quickly which is enough to confuse the hangcheck timer to print a bogus stall warning. Commit 2a8c0883 "time: Move xtime_nsec adjustment underflow handling timekeeping_adjust" overlooked this exit path: } else return; which should really be a proper exit sequence, fixing the bug as a side effect. Also make the flow more readable by properly balancing curly braces. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: john.stultz@linaro.org Cc: a.p.zijlstra@chello.nl Cc: richardcochran@gmail.com Cc: prarit@redhat.com Link: http://lkml.kernel.org/r/20120804192114.GA28347@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/time/timekeeping.c | 31 +++++++++++++++++-------------- 1 files changed, 17 insertions(+), 14 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 2988bc8..e16af19 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -923,20 +923,22 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset) if (likely(error <= interval)) adj = 1; else - adj = timekeeping_bigadjust(tk, error, &interval, - &offset); - } else if (error < -interval) { - /* See comment above, this is just switched for the negative */ - error >>= 2; - if (likely(error >= -interval)) { - adj = -1; - interval = -interval; - offset = -offset; - } else - adj = timekeeping_bigadjust(tk, error, &interval, - &offset); - } else - return; + adj = timekeeping_bigadjust(tk, error, &interval, &offset); + } else { + if (error < -interval) { + /* See comment above, this is just switched for the negative */ + error >>= 2; + if (likely(error >= -interval)) { + adj = -1; + interval = -interval; + offset = -offset; + } else { + adj = timekeeping_bigadjust(tk, error, &interval, &offset); + } + } else { + goto out_adjust; + } + } if (unlikely(tk->clock->maxadj && (tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) { @@ -999,6 +1001,7 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset) tk->xtime_nsec -= offset; tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; +out_adjust: /* * It may be possible that when we entered this function, xtime_nsec * was very small. Further, if we're slightly speeding the clocksource ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() 2012-08-04 19:21 ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar 2012-08-05 0:50 ` Tetsuo Handa 2012-08-05 17:00 ` [tip:timers/urgent] " tip-bot for Ingo Molnar @ 2012-08-06 17:21 ` John Stultz 2 siblings, 0 replies; 8+ messages in thread From: John Stultz @ 2012-08-06 17:21 UTC (permalink / raw) To: Ingo Molnar Cc: Tetsuo Handa, tglx, a.p.zijlstra, richardcochran, prarit, linux-kernel On 08/04/2012 12:21 PM, Ingo Molnar wrote: > * Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > >> Hello. >> >> Bisected to 2a8c0883 "time: Move xtime_nsec adjustment >> underflow handling timekeeping_adjust". >> >> Would you check? > probably caused by this stray return in timekeeping_adjust(): > > } else > return; > > which should really be a proper goto - like the (totally > untested!) patch below. Ooof. Your analysis looks correct, thanks for fixing this, Ingo! And thanks for catching the issue Tetsuo! thanks -john ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-08-06 17:21 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-08-03 14:18 [3.6-rc1] sched: clock counts too quickly Tetsuo Handa 2012-08-04 11:24 ` Tetsuo Handa 2012-08-04 12:10 ` Tetsuo Handa 2012-08-04 13:46 ` Tetsuo Handa 2012-08-04 19:21 ` [PATCH] time: Fix adjustment cleanup bug in timekeeping_adjust() Ingo Molnar 2012-08-05 0:50 ` Tetsuo Handa 2012-08-05 17:00 ` [tip:timers/urgent] " tip-bot for Ingo Molnar 2012-08-06 17:21 ` [PATCH] " John Stultz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox