From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757879Ab1JGRsx (ORCPT ); Fri, 7 Oct 2011 13:48:53 -0400
Received: from peace.netnation.com ([204.174.223.2]:57782 "EHLO peace.netnation.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750923Ab1JGRsu (ORCPT ); Fri, 7 Oct 2011 13:48:50 -0400
Date: Fri, 7 Oct 2011 10:48:48 -0700
From: Simon Kirby
To: Linus Torvalds, Peter Zijlstra
Cc: Linux Kernel Mailing List
Subject: Re: Linux 3.1-rc9
Message-ID: <20111007174848.GA11011@hostway.ca>
References: <20111007070842.GA27555@hostway.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20111007070842.GA27555@hostway.ca>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 07, 2011 at 12:08:42AM -0700, Simon Kirby wrote:

> On Tue, Oct 04, 2011 at 06:40:14PM -0700, Linus Torvalds wrote:
>
> > Peter Zijlstra (1):
> >       posix-cpu-timers: Cure SMP wobbles
>
> Hello!
>
> I upgraded a few boxes from 3.1-rc6+fixes to 3.1-rc9 (actually 538d2882),
> and now they're hard locking every 15 minutes. Below is a serial console
> capture of the lockup. I suspect this is from d670ec13. I'll confirm that
> they stop crashing with that commit reverted...

Yes, they stopped locking up with d670ec13 reverted.

Simon-

> [ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> [ 1717.560007] Pid: 18034, comm: php Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.560007] Call Trace:
> [ 1717.560007] [] panic+0xba/0x1fb
> [ 1717.560007] [] ? native_sched_clock+0x20/0x80
> [ 1717.560007] [] ? sched_clock+0x9/0x10
> [ 1717.560007] [] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.560007] [] __perf_event_overflow+0xa2/0x1f0
> [ 1717.560007] [] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.560007] [] perf_event_overflow+0x14/0x20
> [ 1717.560007] [] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.560007] [] perf_event_nmi_handler+0x36/0xb0
> [ 1717.560007] [] notifier_call_chain+0x3f/0x80
> [ 1717.560007] [] atomic_notifier_call_chain+0x15/0x20
> [ 1717.560007] [] notify_die+0x2e/0x30
> [ 1717.560007] [] do_nmi+0xa2/0x250
> [ 1717.560007] [] nmi+0x20/0x30
> [ 1717.560007] [] ? __write_lock_failed+0xd/0x20
> [ 1717.560007] <> [] _raw_write_lock_irq+0x19/0x20
> [ 1717.560007] [] copy_process+0xb23/0x1270
> [ 1717.560007] [] do_fork+0xb2/0x2f0
> [ 1717.560007] [] sys_clone+0x23/0x30
> [ 1717.560007] [] stub_clone+0x13/0x20
> [ 1717.560007] [] ? system_call_fastpath+0x16/0x1b
> [ 1717.560005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
> [ 1717.560005] Pid: 18038, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.560005] Call Trace:
> [ 1717.560005] [] panic+0xba/0x1fb
> [ 1717.560005] [] ? native_sched_clock+0x20/0x80
> [ 1717.560005] [] ? sched_clock+0x9/0x10
> [ 1717.560005] [] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.560005] [] __perf_event_overflow+0xa2/0x1f0
> [ 1717.560005] [] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.560005] [] perf_event_overflow+0x14/0x20
> [ 1717.560005] [] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.560005] [] perf_event_nmi_handler+0x36/0xb0
> [ 1717.560005] [] notifier_call_chain+0x3f/0x80
> [ 1717.560005] [] atomic_notifier_call_chain+0x15/0x20
> [ 1717.560005] [] notify_die+0x2e/0x30
> [ 1717.560005] [] do_nmi+0xa2/0x250
> [ 1717.560005] [] nmi+0x20/0x30
> [ 1717.560005] [] ? _raw_spin_lock+0x14/0x20
> [ 1717.560005] <> [] task_rq_lock+0x55/0xa0
> [ 1717.560005] [] task_sched_runtime+0x24/0x90
> [ 1717.560005] [] thread_group_cputime+0x74/0xb0
> [ 1717.560005] [] thread_group_cputimer+0xa6/0xf0
> [ 1717.560005] [] cpu_timer_sample_group+0x28/0x90
> [ 1717.560005] [] set_process_cpu_timer+0x33/0x110
> [ 1717.560005] [] update_rlimit_cpu+0x3a/0x60
> [ 1717.560005] [] do_prlimit+0xfe/0x1f0
> [ 1717.560005] [] sys_setrlimit+0x46/0x60
> [ 1717.560005] [] system_call_fastpath+0x16/0x1b
> [ 1717.564005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
> [ 1717.564005] Pid: 8, comm: migration/1 Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.564005] Call Trace:
> [ 1717.564005] [] panic+0xba/0x1fb
> [ 1717.564005] [] ? native_sched_clock+0x20/0x80
> [ 1717.564005] [] ? sched_clock+0x9/0x10
> [ 1717.564005] [] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.564005] [] __perf_event_overflow+0xa2/0x1f0
> [ 1717.564005] [] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.564005] [] perf_event_overflow+0x14/0x20
> [ 1717.564005] [] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.564005] [] perf_event_nmi_handler+0x36/0xb0
> [ 1717.564005] [] notifier_call_chain+0x3f/0x80
> [ 1717.564005] [] atomic_notifier_call_chain+0x15/0x20
> [ 1717.564005] [] notify_die+0x2e/0x30
> [ 1717.564005] [] do_nmi+0xa2/0x250
> [ 1717.564005] [] nmi+0x20/0x30
> [ 1717.564005] [] ? _raw_spin_lock+0x10/0x20
> [ 1717.564005] <> [] double_rq_lock+0x4d/0x60
> [ 1717.564005] [] __migrate_task+0x78/0x120
> [ 1717.564005] [] ? __migrate_task+0x120/0x120
> [ 1717.564005] [] migration_cpu_stop+0x1e/0x30
> [ 1717.564005] [] cpu_stopper_thread+0xcc/0x190
> [ 1717.564005] [] ? default_wake_function+0xd/0x10
> [ 1717.564005] [] ? __wake_up_common+0x5a/0x90
> [ 1717.564005] [] ? cgroup_release_agent+0x1d0/0x1d0
> [ 1717.564005] [] ? cgroup_release_agent+0x1d0/0x1d0
> [ 1717.564005] [] kthread+0x96/0xb0
> [ 1717.564005] [] kernel_thread_helper+0x4/0x10
> [ 1717.564005] [] ? kthread_worker_fn+0x190/0x190
> [ 1717.564005] [] ? gs_change+0x13/0x13
> [ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
> [ 1717.560007] Pid: 15190, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.560007] Call Trace:
> [ 1717.560007] [] panic+0xba/0x1fb
> [ 1717.560007] [] ? native_sched_clock+0x20/0x80
> [ 1717.560007] [] ? sched_clock+0x9/0x10
> [ 1717.560007] [] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.560007] [] __perf_event_overflow+0xa2/0x1f0
> [ 1717.560007] [] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.560007] [] perf_event_overflow+0x14/0x20
> [ 1717.560007] [] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.560007] [] perf_event_nmi_handler+0x36/0xb0
> [ 1717.560007] [] notifier_call_chain+0x3f/0x80
> [ 1717.560007] [] atomic_notifier_call_chain+0x15/0x20
> [ 1717.560007] [] notify_die+0x2e/0x30
> [ 1717.560007] [] do_nmi+0xa2/0x250
> [ 1717.560007] [] nmi+0x20/0x30
> [ 1717.560007] [] ? _raw_spin_lock+0x14/0x20
> [ 1717.560007] <> [] update_curr+0x174/0x1a0
> [ 1717.560007] [] enqueue_task_fair+0x5c/0x520
> [ 1717.560007] [] enqueue_task+0x61/0x70
> [ 1717.560007] [] activate_task+0x29/0x40
> [ 1717.560007] [] wake_up_new_task+0xb9/0x160
> [ 1717.560007] [] do_fork+0x146/0x2f0
> [ 1717.560007] [] ? fd_install+0x30/0x60
> [ 1717.560007] [] sys_clone+0x23/0x30
> [ 1717.560007] [] stub_clone+0x13/0x20
> [ 1717.560007] [] ? system_call_fastpath+0x16/0x1b
>
> Config: http://0x.ca/sim/ref/3.1-rc9/config