From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754589Ab1GFGRR (ORCPT ); Wed, 6 Jul 2011 02:17:17 -0400
Received: from smtp1.linux-foundation.org ([140.211.169.13]:44014 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754366Ab1GFGRQ (ORCPT ); Wed, 6 Jul 2011 02:17:16 -0400
Date: Tue, 5 Jul 2011 23:15:15 -0700
From: Andrew Morton
To: john stultz
Cc: Faidon Liambotis , linux-kernel@vger.kernel.org, stable@kernel.org, Nikola Ciprich , seto.hidetoshi@jp.fujitsu.com, =?ISO-8859-1?Q?Herv=E9?= Commowick , Willy Tarreau , Randy Dunlap , Greg KH , Ben Hutchings , Apollon Oikonomopoulos
Subject: Re: 2.6.32.21 - uptime related crashes?
Message-Id: <20110705231515.95bc758f.akpm@linux-foundation.org>
In-Reply-To:
References: <20110428082625.GA23293@pcnci.linuxbox.cz> <20110428183434.GG30645@1wt.eu> <20110429100200.GB23293@pcnci.linuxbox.cz> <20110430093605.GA10529@1wt.eu> <20110430173905.GA25641@tty.gr>
X-Mailer: Sylpheed 2.7.1 (GTK+ 2.18.9; x86_64-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 27 Jun 2011 19:25:31 -0700 john stultz wrote:

> On Sat, Apr 30, 2011 at 10:39 AM, Faidon Liambotis wrote:
> > We too experienced problems with just the G6 blades at near 215 days uptime
> > (on the 19th of April), all at the same time. From our investigation, it
> > seems that their cpu_clocks jumped suddenly far in the future and then
> > almost immediately rolled over due to wrapping around 64-bits.
> >
> > Although all of their (G6s) clocks wrapped around *at the same time*, only
> > one of them actually crashed at the time, with a second one crashing just a
> > few days later, on the 28th.
> >
> > Three of them had the following on their logs:
> > Apr 18 20:56:07 hn-05 kernel: [17966378.581971] tap0: no IPv6 routers present
> > Apr 19 10:15:42 hn-05 kernel: [18446743935.365550] BUG: soft lockup - CPU#4 stuck for 17163091968s! [kvm:25913]
>
> So, did this issue ever get any traction or get resolved?
>

https://bugzilla.kernel.org/show_bug.cgi?id=37382 is similar - a
divide-by-zero in update_sg_lb_stats() after 209 days uptime.

Can we change this stuff so that the timers wrap after 10 minutes
uptime, like INITIAL_JIFFIES?
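
For reference, the numbers quoted above can be sanity-checked with a little arithmetic. The sketch below is not from the thread. It reuses only the two log timestamps and rests on three assumptions: that the bracketed timestamps come from a 64-bit nanosecond clock, that the cycles-to-nanoseconds conversion has the form (cycles * mult) >> 10 (so its 64-bit product overflows near 2^54 ns), and that HZ is 1000. INITIAL_JIFFIES is defined in include/linux/jiffies.h as (unsigned int)(-300*HZ), which makes 32-bit jiffies wrap five minutes after boot.

```python
# Back-of-the-envelope check of the figures in the quoted log lines.
# Everything except the two timestamps is an assumption, labeled below.

NSEC_PER_SEC = 10**9
SECONDS_PER_DAY = 86400

# Timestamps copied from the quoted kernel log (seconds since boot).
last_sane_stamp = 17966378.581971
lockup_stamp = 18446743935.365550


def days(seconds):
    """Convert seconds to days."""
    return seconds / SECONDS_PER_DAY


# Uptime at the last sane log line.
uptime_days = days(last_sane_stamp)

# Assumption: the timestamp is a 64-bit nanosecond counter, so it wraps
# at 2**64 ns. How close to that limit was the soft-lockup timestamp?
wrap_seconds = 2**64 / NSEC_PER_SEC
seconds_before_wrap = wrap_seconds - lockup_stamp

# Assumption: ns = (cycles * mult) >> 10, so the 64-bit product
# overflows once the result approaches 2**54 ns.
overflow_days = days(2**54 / NSEC_PER_SEC)

# The INITIAL_JIFFIES trick: start a 32-bit tick counter just below its
# wrap point so that wrap bugs show up minutes after boot, not months.
HZ = 1000  # assumption; HZ is a kernel build-time option
initial_jiffies = (-300 * HZ) & 0xFFFFFFFF
jiffies_wrap_seconds = (2**32 - initial_jiffies) // HZ

print(f"uptime at last sane timestamp: {uptime_days:.1f} days")
print(f"lockup timestamp is {seconds_before_wrap:.0f} s short of 2**64 ns")
print(f"2**54 ns corresponds to {overflow_days:.1f} days")
print(f"32-bit jiffies wrap {jiffies_wrap_seconds} s after boot")
```

Under those assumptions the last sane timestamp is roughly 207.9 days of uptime, the soft-lockup timestamp sits about 138 seconds below 2^64 ns, 2^54 ns is about 208.5 days, and 32-bit jiffies wrap 300 seconds after boot. The first three figures are consistent with a clock that jumped to just under the 64-bit limit and then rolled over.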