Date: Fri, 7 Oct 2011 00:08:42 -0700
From: Simon Kirby
To: Linus Torvalds, Peter Zijlstra
Cc: Linux Kernel Mailing List
Subject: Re: Linux 3.1-rc9
Message-ID: <20111007070842.GA27555@hostway.ca>

On Tue, Oct 04, 2011 at 06:40:14PM -0700, Linus Torvalds wrote:

> Peter Zijlstra (1):
>       posix-cpu-timers: Cure SMP wobbles

Hello!

I upgraded a few boxes from 3.1-rc6+fixes to 3.1-rc9 (actually 538d2882),
and now they're hard locking every 15 minutes. Below is a serial console
capture of the lockup.

I suspect this is from d670ec13. I'll confirm that they stop crashing
with that commit reverted...

[ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[ 1717.560007] Pid: 18034, comm: php Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560007] Call Trace:
[ 1717.560007] [] panic+0xba/0x1fb
[ 1717.560007] [] ? native_sched_clock+0x20/0x80
[ 1717.560007] [] ? sched_clock+0x9/0x10
[ 1717.560007] [] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560007] [] __perf_event_overflow+0xa2/0x1f0
[ 1717.560007] [] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560007] [] perf_event_overflow+0x14/0x20
[ 1717.560007] [] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560007] [] perf_event_nmi_handler+0x36/0xb0
[ 1717.560007] [] notifier_call_chain+0x3f/0x80
[ 1717.560007] [] atomic_notifier_call_chain+0x15/0x20
[ 1717.560007] [] notify_die+0x2e/0x30
[ 1717.560007] [] do_nmi+0xa2/0x250
[ 1717.560007] [] nmi+0x20/0x30
[ 1717.560007] [] ? __write_lock_failed+0xd/0x20
[ 1717.560007] <> [] _raw_write_lock_irq+0x19/0x20
[ 1717.560007] [] copy_process+0xb23/0x1270
[ 1717.560007] [] do_fork+0xb2/0x2f0
[ 1717.560007] [] sys_clone+0x23/0x30
[ 1717.560007] [] stub_clone+0x13/0x20
[ 1717.560007] [] ? system_call_fastpath+0x16/0x1b
[ 1717.560005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
[ 1717.560005] Pid: 18038, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560005] Call Trace:
[ 1717.560005] [] panic+0xba/0x1fb
[ 1717.560005] [] ? native_sched_clock+0x20/0x80
[ 1717.560005] [] ? sched_clock+0x9/0x10
[ 1717.560005] [] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560005] [] __perf_event_overflow+0xa2/0x1f0
[ 1717.560005] [] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560005] [] perf_event_overflow+0x14/0x20
[ 1717.560005] [] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560005] [] perf_event_nmi_handler+0x36/0xb0
[ 1717.560005] [] notifier_call_chain+0x3f/0x80
[ 1717.560005] [] atomic_notifier_call_chain+0x15/0x20
[ 1717.560005] [] notify_die+0x2e/0x30
[ 1717.560005] [] do_nmi+0xa2/0x250
[ 1717.560005] [] nmi+0x20/0x30
[ 1717.560005] [] ? _raw_spin_lock+0x14/0x20
[ 1717.560005] <> [] task_rq_lock+0x55/0xa0
[ 1717.560005] [] task_sched_runtime+0x24/0x90
[ 1717.560005] [] thread_group_cputime+0x74/0xb0
[ 1717.560005] [] thread_group_cputimer+0xa6/0xf0
[ 1717.560005] [] cpu_timer_sample_group+0x28/0x90
[ 1717.560005] [] set_process_cpu_timer+0x33/0x110
[ 1717.560005] [] update_rlimit_cpu+0x3a/0x60
[ 1717.560005] [] do_prlimit+0xfe/0x1f0
[ 1717.560005] [] sys_setrlimit+0x46/0x60
[ 1717.560005] [] system_call_fastpath+0x16/0x1b
[ 1717.564005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[ 1717.564005] Pid: 8, comm: migration/1 Not tainted 3.1.0-rc9-hw+ #45
[ 1717.564005] Call Trace:
[ 1717.564005] [] panic+0xba/0x1fb
[ 1717.564005] [] ? native_sched_clock+0x20/0x80
[ 1717.564005] [] ? sched_clock+0x9/0x10
[ 1717.564005] [] watchdog_overflow_callback+0xb1/0xc0
[ 1717.564005] [] __perf_event_overflow+0xa2/0x1f0
[ 1717.564005] [] ? perf_event_update_userpage+0x11/0xc0
[ 1717.564005] [] perf_event_overflow+0x14/0x20
[ 1717.564005] [] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.564005] [] perf_event_nmi_handler+0x36/0xb0
[ 1717.564005] [] notifier_call_chain+0x3f/0x80
[ 1717.564005] [] atomic_notifier_call_chain+0x15/0x20
[ 1717.564005] [] notify_die+0x2e/0x30
[ 1717.564005] [] do_nmi+0xa2/0x250
[ 1717.564005] [] nmi+0x20/0x30
[ 1717.564005] [] ? _raw_spin_lock+0x10/0x20
[ 1717.564005] <> [] double_rq_lock+0x4d/0x60
[ 1717.564005] [] __migrate_task+0x78/0x120
[ 1717.564005] [] ? __migrate_task+0x120/0x120
[ 1717.564005] [] migration_cpu_stop+0x1e/0x30
[ 1717.564005] [] cpu_stopper_thread+0xcc/0x190
[ 1717.564005] [] ? default_wake_function+0xd/0x10
[ 1717.564005] [] ? __wake_up_common+0x5a/0x90
[ 1717.564005] [] ? cgroup_release_agent+0x1d0/0x1d0
[ 1717.564005] [] ? cgroup_release_agent+0x1d0/0x1d0
[ 1717.564005] [] kthread+0x96/0xb0
[ 1717.564005] [] kernel_thread_helper+0x4/0x10
[ 1717.564005] [] ? kthread_worker_fn+0x190/0x190
[ 1717.564005] [] ? gs_change+0x13/0x13
[ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[ 1717.560007] Pid: 15190, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560007] Call Trace:
[ 1717.560007] [] panic+0xba/0x1fb
[ 1717.560007] [] ? native_sched_clock+0x20/0x80
[ 1717.560007] [] ? sched_clock+0x9/0x10
[ 1717.560007] [] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560007] [] __perf_event_overflow+0xa2/0x1f0
[ 1717.560007] [] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560007] [] perf_event_overflow+0x14/0x20
[ 1717.560007] [] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560007] [] perf_event_nmi_handler+0x36/0xb0
[ 1717.560007] [] notifier_call_chain+0x3f/0x80
[ 1717.560007] [] atomic_notifier_call_chain+0x15/0x20
[ 1717.560007] [] notify_die+0x2e/0x30
[ 1717.560007] [] do_nmi+0xa2/0x250
[ 1717.560007] [] nmi+0x20/0x30
[ 1717.560007] [] ? _raw_spin_lock+0x14/0x20
[ 1717.560007] <> [] update_curr+0x174/0x1a0
[ 1717.560007] [] enqueue_task_fair+0x5c/0x520
[ 1717.560007] [] enqueue_task+0x61/0x70
[ 1717.560007] [] activate_task+0x29/0x40
[ 1717.560007] [] wake_up_new_task+0xb9/0x160
[ 1717.560007] [] do_fork+0x146/0x2f0
[ 1717.560007] [] ? fd_install+0x30/0x60
[ 1717.560007] [] sys_clone+0x23/0x30
[ 1717.560007] [] stub_clone+0x13/0x20
[ 1717.560007] [] ? system_call_fastpath+0x16/0x1b

Config: http://0x.ca/sim/ref/3.1-rc9/config

Simon-