From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S936761AbXGSHYA (ORCPT ); Thu, 19 Jul 2007 03:24:00 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1762371AbXGSHXv (ORCPT ); Thu, 19 Jul 2007 03:23:51 -0400
Received: from smtp2.linux-foundation.org ([207.189.120.14]:44113 "EHLO smtp2.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757591AbXGSHXu (ORCPT ); Thu, 19 Jul 2007 03:23:50 -0400
Date: Thu, 19 Jul 2007 00:22:31 -0700
From: Andrew Morton
To: Ingo Molnar
Cc: Jeremy Fitzhardinge , linux-kernel@vger.kernel.org, Linus Torvalds , stable@kernel.org, Greg KH , Chris Wright
Subject: Re: [patch] fix the softlockup watchdog to actually work
Message-Id: <20070719002231.069ebbdd.akpm@linux-foundation.org>
In-Reply-To: <20070717154934.GA24231@elte.hu>
References: <20070717114453.GA8212@elte.hu> <469CCF8F.4010107@goop.org> <20070717154934.GA24231@elte.hu>
X-Mailer: Sylpheed 2.4.1 (GTK+ 2.8.17; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 17 Jul 2007 17:49:34 +0200 Ingo Molnar wrote:

> Subject: fix the softlockup watchdog to actually work
> From: Ingo Molnar
>
> this Xen related commit:
>
>   commit 966812dc98e6a7fcdf759cbfa0efab77500a8868
>   Author: Jeremy Fitzhardinge
>   Date: Tue May 8 00:28:02 2007 -0700
>
>       Ignore stolen time in the softlockup watchdog
>
> broke the softlockup watchdog to never report any lockups. (!)
>
> print_timestamp defaults to 0, this makes the following condition
> always true:
>
> 	if (print_timestamp < (touch_timestamp + 1) ||
>
> and we'll in essence never report soft lockups.
>
> apparently the functionality of the soft lockup watchdog was never
> actually tested with that patch applied ...
>
> [this is -stable material too.]
This seems terribly sensitive.

Someone has broken the Vaio (shock, horror). It now has mysterious jerkiness: when leaning on autorepeat it stalls for maybe 0.25 seconds every 1.5 seconds. The stalls are far less than a second. Yet this is enough to trigger random softlockup warnings.

Some of those warnings are below. Note that the traces are all pretty useless, as softlockup warnings so often seem to be.

Of course, it could be that whatever is causing these pauses really _is_ stalling for a whole second occasionally, dunno. But I didn't notice any long stalls in the console output when a particular storm of softlockup warnings came out.

But I'll sit on this patch for a while until this gets sorted out. Meanwhile, please double-check the elapsed-time arithmetic in there, maybe do a bit of runtime testing?

[ 78.820961] BUG: soft lockup detected on CPU#0!
[ 78.821083] [] update_process_times+0x32/0x54
[ 78.821216] [] tick_sched_timer+0x61/0x9c
[ 78.821340] [] hrtimer_interrupt+0x142/0x1d4
[ 78.821463] [] tick_sched_timer+0x0/0x9c
[ 78.821587] [] tick_do_broadcast+0x1f/0x3f
[ 78.821707] [] tick_handle_oneshot_broadcast+0x47/0x72
[ 78.821852] [] timer_interrupt+0x1a/0x20
[ 78.821968] [] handle_IRQ_event+0x1a/0x3f
[ 78.822089] [] handle_edge_irq+0x9d/0xcc
[ 78.822206] [] do_IRQ+0x53/0x6c
[ 78.822307] [] tick_notify+0x15c/0x208
[ 78.822422] [] common_interrupt+0x23/0x28
[ 78.822539] [] clockevents_notify+0x8/0x36
[ 78.822663] [] acpi_processor_idle+0x1d2/0x36d
[ 78.822798] [] cpu_idle+0x44/0x5e
[ 78.822900] [] start_kernel+0x26d/0x275
[ 78.823017] [] unknown_bootoption+0x0/0x202
[ 78.823142] =======================
[ 106.282830] BUG: soft lockup detected on CPU#0!
[ 106.282967] [] update_process_times+0x32/0x54
[ 106.283116] [] tick_sched_timer+0x61/0x9c
[ 106.283255] [] hrtimer_interrupt+0x142/0x1d4
[ 106.283391] [] tick_sched_timer+0x0/0x9c
[ 106.283530] [] tick_do_broadcast+0x1f/0x3f
[ 106.283663] [] tick_handle_oneshot_broadcast+0x47/0x72
[ 106.283821] [] timer_interrupt+0x1a/0x20
[ 106.283949] [] handle_IRQ_event+0x1a/0x3f
[ 106.284084] [] handle_edge_irq+0x9d/0xcc
[ 106.284215] [] do_IRQ+0x53/0x6c
[ 106.284326] [] tick_notify+0x15c/0x208
[ 106.284455] [] common_interrupt+0x23/0x28
[ 106.284587] [] clockevents_notify+0x8/0x36
[ 106.284725] [] acpi_processor_idle+0x1d2/0x36d
[ 106.284875] [] cpu_idle+0x44/0x5e
[ 106.284988] [] start_kernel+0x26d/0x275
[ 106.285117] [] unknown_bootoption+0x0/0x202
[ 106.285257] =======================
[ 109.266423] BUG: soft lockup detected on CPU#0!
[ 109.266558] [] update_process_times+0x32/0x54
[ 109.266703] [] tick_sched_timer+0x61/0x9c
[ 109.270745] [] hrtimer_interrupt+0x142/0x1d4
[ 109.274790] [] tick_sched_timer+0x0/0x9c
[ 109.278865] [] tick_do_broadcast+0x1f/0x3f
[ 109.282950] [] tick_handle_oneshot_broadcast+0x47/0x72
[ 109.287026] [] timer_interrupt+0x1a/0x20
[ 109.291012] [] handle_IRQ_event+0x1a/0x3f
[ 109.294950] [] handle_edge_irq+0x9d/0xcc
[ 109.298864] [] do_IRQ+0x53/0x6c
[ 109.302818] [] tick_notify+0x15c/0x208
[ 109.306740] [] common_interrupt+0x23/0x28
[ 109.310641] [] clockevents_notify+0x8/0x36
[ 109.314543] [] acpi_processor_idle+0x1d2/0x36d
[ 109.318461] [] cpu_idle+0x44/0x5e
[ 109.322348] [] start_kernel+0x26d/0x275
[ 109.326267] [] unknown_bootoption+0x0/0x202
[ 109.330188] =======================

(ah, the Vaio breakage seems to be -mm-only, whew)