Date: Wed, 20 May 2009 19:50:37 -0700
From: "Paul E. McKenney"
To: Janos Haar
Cc: linux-kernel@vger.kernel.org
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid: 902, comm: md1_raid5
Message-ID: <20090521025037.GD6839@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <030101c9d92f$d0668600$0400a8c0@dcccs>
In-Reply-To: <030101c9d92f$d0668600$0400a8c0@dcccs>

On Wed, May 20, 2009 at 11:46:07AM +0200, Janos Haar wrote:
> Hello list,
>
> Does anybody know what this is?
> May 17 23:12:13 gladiator-afth1 kernel: RCU detected CPU 1 stall (t=4295904002/751 jiffies)
> May 17 23:12:13 gladiator-afth1 kernel: Pid: 902, comm: md1_raid5 Not tainted 2.6.28.10 #1
> May 17 23:12:13 gladiator-afth1 kernel: Call Trace:
> May 17 23:12:13 gladiator-afth1 kernel: [] ? get_timestamp+0x9/0xf
> May 17 23:12:13 gladiator-afth1 kernel: [] __rcu_pending+0x64/0x1e2
> May 17 23:12:13 gladiator-afth1 kernel: [] rcu_pending+0x36/0x6f
> May 17 23:12:13 gladiator-afth1 kernel: [] update_process_times+0x37/0x5f
> May 17 23:12:13 gladiator-afth1 kernel: [] tick_periodic+0x6e/0x7a
> May 17 23:12:13 gladiator-afth1 kernel: [] tick_handle_periodic+0x21/0x65
> May 17 23:12:13 gladiator-afth1 kernel: [] smp_apic_timer_interrupt+0x8f/0xad
> May 17 23:12:13 gladiator-afth1 kernel: [] apic_timer_interrupt+0x6b/0x70
> May 17 23:12:13 gladiator-afth1 kernel: [] ? _spin_unlock_irqrestore+0x13/0x17

One of the following functions is looping in the kernel.  If you are
running with HZ=250, the 751 jiffies in the stall message mean it has
been looping for about three seconds.  Interrupts are enabled (the stall
was detected from the timer interrupt in the trace above), but preemption
must be disabled (perhaps due to !CONFIG_PREEMPT).

							Thanx, Paul

> May 17 23:12:13 gladiator-afth1 kernel: [] ? bitmap_daemon_work+0x142/0x3b0
> May 17 23:12:18 gladiator-afth1 kernel: [] ? md_check_recovery+0x1b/0x45b
> May 17 23:12:18 gladiator-afth1 kernel: [] ? raid5d+0x5d/0x503
> May 17 23:12:18 gladiator-afth1 kernel: [] ? md_thread+0xd5/0xed
> May 17 23:12:18 gladiator-afth1 kernel: [] ? autoremove_wake_function+0x0/0x38
> May 17 23:12:18 gladiator-afth1 kernel: [] ? md_thread+0x0/0xed
> May 17 23:12:18 gladiator-afth1 kernel: [] ? kthread+0x49/0x76
> May 17 23:12:18 gladiator-afth1 kernel: [] ? child_rip+0xa/0x11
> May 17 23:12:18 gladiator-afth1 kernel: [] ? kthread+0x0/0x76
> May 17 23:12:18 gladiator-afth1 kernel: [] ? child_rip+0x0/0x11
>
> Neil Brown from the RAID list suggested asking someone else... (The mail is below.)
>
> Thanks,
> Janos Haar
>
> ----- Original Message -----
> From: "Janos Haar"
> To: "Neil Brown"
> Cc:
> Sent: Tuesday, May 19, 2009 12:30 PM
> Subject: Re: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid: 902, comm: md1_raid5
>
>
>>
>> ----- Original Message -----
>> From: "Neil Brown"
>> To: "Janos Haar"
>> Cc:
>> Sent: Tuesday, May 19, 2009 3:05 AM
>> Subject: Re: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid: 902, comm: md1_raid5
>>
>>
>>> On Tuesday May 19, janos.haar@netcenter.hu wrote:
>>>> Hello list, Neil,
>>>>
>>>> Can anybody say something about this issue?
>>>> I would not be surprised if it is hardware related, but this is a
>>>> brand-new server, so I am looking for a solution... :-)
>>>> May 17 23:12:13 gladiator-afth1 kernel: RCU detected CPU 1 stall (t=4295904002/751 jiffies)
>>>
>>> I have no idea what this means.
>>> I've occasionally seen this sort of message in early boot, and then the
>>> system continued to work perfectly, so I figured it was an early-boot
>>> glitch.  I suggest asking someone who understands RCU.
>>>
>>>>
>>>> The entire log is here:
>>>> http://download.netcenter.hu/bughunt/20090518/messages
>>>>
>>>> The system is on md1 and working, but slowly.
>>>
>>> How slowly?  Is the slowness due to disk throughput?
>>
>> No, no, this is a fresh and idle server.
>> I configured the disks and RAID on another PC, and when that finished, I
>> copied over the known-good, pre-installed software pack with the old 2.6.18.
>> This pack is good and has been tested many times, and it was the first to
>> show this issue on this machine.
>> I compiled 2.6.28.10 on it; it took about 6 hours! 8-/
>> But 2.6.28.10 reports this too.
>>
>> I don't think the slowness is disk based: at idle, if I move the
>> selection bar in mc, it also freezes for a few seconds, or I can't type
>> in bash, and whenever this happens another RCU message appears in the log...
>> (It happens periodically, regardless of whether I am doing anything.)
>>
>> I am not sure whether it is RAID related, but the kernel reports only the
>> md1_raid5 pid, not any other.
>> This is why I am asking here first.
>>
>> Thanks anyway. :-)
>>
>>> Have you tested the individual drives and compared that with the
>>> array?
>>
>> This is brand-new hardware with 4x 500GB Samsung drives, and SMART
>> reports no problems at all.
>>
>>>
>>>
>>>> If I leave the server running for a day, it crashes without leaving a saved log.
>>>
>>> This is a concern!  It usually points to some sort of hardware
>>> problem, but it is very hard to trace.
>>> Is the power supply rated high enough to support all devices?
>>
>> I am using a new, good-quality 550W power supply, and the PC uses only
>> 55-65W, measured. ;-)
>> (1x Core 2 Duo, 4x HDD, nothing more interesting)
>>
>>> I cannot think of anything else to suggest... except to start swapping
>>> components until the problem goes away...
>>
>> In that case, I need to start with the motherboard. 8-(
>>
>> Thanks a lot,
>> Janos Haar
>>
>>>
>>> NeilBrown
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
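Paul's three-second estimate is plain arithmetic. A jiffy is one tick of the kernel's periodic timer, so a jiffies count divided by HZ gives seconds. The thread never says which CONFIG_HZ the reporter's 2.6.28.10 kernel used. The Python sketch below is therefore only an illustration, not kernel code: it tabulates the 751-jiffy figure from the stall message for the usual HZ choices of 100, 250, 300 and 1000. The helper names `jiffies_to_seconds` and `stall_table` are hypothetical.

```python
# Illustrative only: convert the stall duration from the log (751 jiffies)
# into seconds for the common CONFIG_HZ choices. The thread does not state
# which HZ the reporter's kernel used, so each row is a hypothesis.
STALL_JIFFIES = 751
COMMON_HZ = (100, 250, 300, 1000)


def jiffies_to_seconds(jiffies: int, hz: int) -> float:
    """A jiffy is one timer tick, so seconds = jiffies / HZ."""
    return jiffies / hz


def stall_table(jiffies: int = STALL_JIFFIES) -> dict[int, float]:
    """Map each candidate HZ value to the elapsed time in seconds."""
    return {hz: jiffies_to_seconds(jiffies, hz) for hz in COMMON_HZ}


if __name__ == "__main__":
    for hz, secs in stall_table().items():
        print(f"HZ={hz:<4} -> {secs:.3f} s")
```

The script prints one line per HZ value. Only the HZ=250 row (3.004 s) matches the "about three seconds" estimate. The same 751-jiffy count would mean 7.51 s at HZ=100 and 0.751 s at HZ=1000.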