From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752239Ab1JWWH7 (ORCPT ); Sun, 23 Oct 2011 18:07:59 -0400
Received: from out5.smtp.messagingengine.com ([66.111.4.29]:56253 "EHLO out5.smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752023Ab1JWWH6 (ORCPT ); Sun, 23 Oct 2011 18:07:58 -0400
X-Sasl-enc: PoKgCY43Rnr7pKmp6RbLNf6GHtmUythb54WYa8uo67+d 1319407677
Date: Mon, 24 Oct 2011 00:07:31 +0200
From: Greg KH
To: Ruben Kerkhof
Cc: linux-kernel@vger.kernel.org, seto.hidetoshi@jp.fujitsu.com, Peter Zijlstra, MINOURA Makoto, Ingo Molnar, stable@kernel.org, Hervé Commowick, john stultz, Rand@jasper.es, Andrew Morton, Willy Tarreau, Faidon Liambotis
Subject: Re: [stable] 2.6.32.21 - uptime related crashes?
Message-ID: <20111023220731.GB402@kroah.com>
References: <1310752795.2945.4.camel@work-vm> <20110721072256.GE9216@elte.hu> <1311251098.29152.130.camel@twins> <20110721125008.GF11246@pcnci.linuxbox.cz> <1311252799.29152.147.camel@twins> <20110721184524.GB381@elte.hu> <20110825185616.GA17078@faidon.noc.grnet.gr> <20110830223829.GB17450@kroah.com> <20110904232657.GC6749@tty.gr>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Oct 23, 2011 at 08:31:32PM +0200, Ruben Kerkhof wrote:
> On Mon, Sep 5, 2011 at 01:26, Faidon Liambotis wrote:
> > On Tue, Aug 30, 2011 at 03:38:29PM -0700, Greg KH wrote:
> >> On Thu, Aug 25, 2011 at 09:56:16PM +0300, Faidon Liambotis wrote:
> >> > On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote:
> >> > > * Peter Zijlstra wrote:
> >> > >
> >> > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote:
> >> > > > > thanks for the patch! I'll put this on our testing boxes...
> >> > > >
> >> > > > With a patch that frobs the starting value close to overflowing I hope,
> >> > > > otherwise we'll not hear from you in like 7 months ;-)
> >> > > >
> >> > > > > Are You going to push this upstream so we can ask Greg to push this to
> >> > > > > -stable?
> >> > > >
> >> > > > Yeah, I think we want to commit this with a -stable tag, Ingo?
> >> > >
> >> > > yeah - and we also want a Reported-by tag and an explanation of how
> >> > > it can crash and why it matters in practice. I can then stick it into
> >> > > the urgent branch for Linus. (probably will only hit upstream in the
> >> > > merge window though.)
> >> >
> >> > Has this been pushed or has the problem been solved somehow? Time is
> >> > against us on this bug as more boxes will crash as they reach 200 days
> >> > of uptime...
> >> >
> >> > In any case, feel free to use me as a Reported-by, my full report of the
> >> > problem being <20110430173905.GA25641@tty.gr>.
> >> >
> >> > FWIW and if I understand correctly, my symptoms were caused by *two*
> >> > different bugs:
> >> > a) the 54 bits wraparound at 208 days that Peter fixed above,
> >> > b) a kernel crash at ~215 days related to RT tasks, fixed by
> >> > 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable).
> >>
> >> So, what do I do here as part of the .32-longterm kernel?  Is there a
> >> fix that is in Linus's tree that I need to apply here?
> >>
> >> confused,
> >
> > Is this even pushed upstream? I checked Linus' tree and the proposed
> > patch is *not* merged there. I'm not really sure if it was fixed some
> > other way, though. I thought this was intended to be an "urgent" fix or
> > something?
> >
> > Regards,
> > Faidon
>
> I just had two crashes on two different machines, both with an uptime
> of 208 days.
> Both were 5520's running 2.6.34.8, but with a CONFIG_HZ of 1000
>
> 2011-10-23T16:49:18.618029+02:00 phy001 kernel: BUG: soft lockup -
> CPU#0 stuck for 17163091968s! [qemu-kvm:16949]
> 2011-10-23T16:49:18.618054+02:00 phy001 kernel: Modules linked in:
> xt_limit ebt_log ebt_limit ebt_arp ebtable_filter ebtable_nat ebtables
> ufs nls_utf8 tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q
> garp stp llc bonding xt_comment xt_recent ip6t_REJECT
> nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm
> ioatdma i2c_i801 igb iTCO_wdt dca iTCO_vendor_support serio_raw
> i2c_core 3w_9xxx [last unloaded: scsi_wait_scan]
> 2011-10-23T16:49:18.618060+02:00 phy001 kernel: CPU 0
> 2011-10-23T16:49:18.618068+02:00 phy001 kernel: Modules linked in:
> xt_limit ebt_log ebt_limit ebt_arp ebtable_filter ebtable_nat ebtables
> ufs nls_utf8 tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q
> garp stp llc bonding xt_comment xt_recent ip6t_REJECT
> nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm
> ioatdma i2c_i801 igb iTCO_wdt dca iTCO_vendor_support serio_raw
> i2c_core 3w_9xxx [last unloaded: scsi_wait_scan]
> 2011-10-23T16:49:18.618072+02:00 phy001 kernel:
> 2011-10-23T16:49:18.618077+02:00 phy001 kernel: Pid: 16949, comm:
> qemu-kvm Tainted: G M 2.6.34.8-68.local.fc13.x86_64 #1
> X8DTU/X8DTU
> 2011-10-23T16:49:18.618083+02:00 phy001 kernel: RIP:
> 0010:[] []
> kvm_arch_vcpu_ioctl_run+0x764/0xa74 [kvm]
> 2011-10-23T16:49:18.618086+02:00 phy001 kernel: RSP:
> 0018:ffff880bafa29d18 EFLAGS: 00000202
> 2011-10-23T16:49:18.618088+02:00 phy001 kernel: RAX: ffff880002000000
> RBX: ffff880bafa29dc8 RCX: ffff8805e45128a0
> 2011-10-23T16:49:18.618091+02:00 phy001 kernel: RDX: 000000000000cb80
> RSI: 0000000004b2a3a0 RDI: 000000000b630000
> 2011-10-23T16:49:18.618093+02:00 phy001 kernel: RBP: ffffffff8100a60e
> R08: 000000000000002b R09: 00000000760d0735
> 2011-10-23T16:49:18.618095+02:00 phy001 kernel: R10: 0000000000000000
> R11: 0000000000000000 R12: 0000000000000001
> 2011-10-23T16:49:18.618097+02:00 phy001 kernel: R13: ffff880bafa29cc8
> R14: ffffffffa007b536 R15: ffff880bafa29ca8
> 2011-10-23T16:49:18.618100+02:00 phy001 kernel: FS:
> 00007fe92cd38700(0000) GS:ffff880002000000(0000)
> knlGS:fffff880009b8000
> 2011-10-23T16:49:18.618102+02:00 phy001 kernel: CS: 0010 DS: 002b ES:
> 002b CR0: 0000000080050033
> 2011-10-23T16:49:18.618104+02:00 phy001 kernel: CR2: 00000000c1a00044
> CR3: 00000006b3f2e000 CR4: 00000000000026e0
> 2011-10-23T16:49:18.618107+02:00 phy001 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-10-23T16:49:18.618109+02:00 phy001 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-10-23T16:49:18.618112+02:00 phy001 kernel: Process qemu-kvm (pid:
> 16949, threadinfo ffff880bafa28000, task ffff880c242e0000)
> 2011-10-23T16:49:18.618114+02:00 phy001 kernel: Stack:
> 2011-10-23T16:49:18.618116+02:00 phy001 kernel: ffff88077b1a3ca8
> ffffffff81d3cf38 ffff8805e4513f00 ffff880c242e0000
> 2011-10-23T16:49:18.618119+02:00 phy001 kernel: <0> ffff880c242e0000
> ffff880bafa29fd8 ffff8805e4513ef8 0000000000015fd0
> 2011-10-23T16:49:18.618121+02:00 phy001 kernel: <0> 000000000000cb80
> ffff880c242e0000 ffff880bafa28000 ffff880ab43f4038
> 2011-10-23T16:49:18.618123+02:00 phy001 kernel: Call Trace:
> 2011-10-23T16:49:18.618126+02:00 phy001 kernel: [] ?
> kvm_vcpu_ioctl+0xfd/0x56e [kvm]
> 2011-10-23T16:49:18.618129+02:00 phy001 kernel: [] ?
> __switch_to_xtra+0x121/0x141
> 2011-10-23T16:49:18.618131+02:00 phy001 kernel: [] ?
> vfs_ioctl+0x32/0xa6
> 2011-10-23T16:49:18.618134+02:00 phy001 kernel: [] ?
> do_vfs_ioctl+0x483/0x4c9
> 2011-10-23T16:49:18.618137+02:00 phy001 kernel: [] ?
> sys_ioctl+0x56/0x79
> 2011-10-23T16:49:18.618139+02:00 phy001 kernel: [] ?
> system_call_fastpath+0x16/0x1b
> 2011-10-23T16:49:18.618142+02:00 phy001 kernel: Code: df ff 90 48 01
> 00 00 48 8b 55 90 65 48 8b 04 25 90 e8 00 00 f6 04 10 aa 74 05 e8 05
> 06 f9 e0 f0 41 80 0f 02 fb 66 0f 1f 44 00 00 83 b0 00 00 00 48 8b
> b5 68 ff ff ff 83 66 14 ef 48 8b 3b 48
>
> Can the necessary fix please be pushed upstream?

I agree, again, can someone please do this?

greg k-h
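
For reference, the "54 bits wraparound at 208 days" quoted above is plain arithmetic: a nanosecond counter that is only good for 54 bits wraps after 2^54 ns, which is roughly 208.5 days. The Python sketch below works that out; the names in it are illustrative and not taken from any kernel source. It also checks that the 17163091968 s in the soft lockup message equals (2^64 - 2^54) >> 30. Reading that as a 2^54 ns backwards jump seen through 64-bit unsigned arithmetic and a shift-by-30 scaling to seconds is an assumption on my part, not something this thread confirms.

```python
NS_PER_SEC = 10**9
SEC_PER_DAY = 86_400


def days_until_wrap(bits: int) -> float:
    """Days until a nanosecond counter with `bits` usable bits wraps."""
    return (1 << bits) / NS_PER_SEC / SEC_PER_DAY


WRAP_BITS = 54  # from the thread: "the 54 bits wraparound at 208 days"
wrap_days = days_until_wrap(WRAP_BITS)

# Assumption (not confirmed in the thread): the clock jumps back by 2^54 ns,
# the difference is taken in 64-bit unsigned arithmetic, and the result is
# scaled to roughly seconds with a right shift by 30.
REPORTED_STUCK_SECONDS = 17163091968  # from the soft lockup message
backwards_jump_ns = (0 - (1 << WRAP_BITS)) % (1 << 64)
implied_seconds = backwards_jump_ns >> 30

if __name__ == "__main__":
    print(f"2^{WRAP_BITS} ns is about {wrap_days:.1f} days")
    print("matches reported value:", implied_seconds == REPORTED_STUCK_SECONDS)
```

If that reading is right, an absurd "stuck for" duration close to 2^34 seconds on a box with about 208 days of uptime points at the clock wrap and not at a real multi-century stall.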