From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751200AbZEUFDc (ORCPT ); Thu, 21 May 2009 01:03:32 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1750726AbZEUFDZ (ORCPT ); Thu, 21 May 2009 01:03:25 -0400
Received: from ns.netcenter.hu ([195.228.254.57]:52943 "EHLO mail.netcenter.hu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750725AbZEUFDY (ORCPT ); Thu, 21 May 2009 01:03:24 -0400
X-Greylist: delayed 742 seconds by postgrey-1.27 at vger.kernel.org;
	Thu, 21 May 2009 01:03:23 EDT
Message-ID: <013501c9d9cf$161a74a0$0400a8c0@dcccs>
From: "Janos Haar" 
To: 
Cc: 
References: <030101c9d92f$d0668600$0400a8c0@dcccs> <20090521025037.GD6839@linux.vnet.ibm.com>
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid:902, comm: md1_raid5
Date: Thu, 21 May 2009 06:46:15 +0200
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="ISO-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

Paul,

Thank you for your attention.

Yes, the PC makes 2-3 second "pauses" and drops this message again and again.
If I remove the RCU debugging, the message disappears, but the pauses are
still there, and they put a load of 2-3 on the otherwise idle system.

Can I do something? Do you suggest using PREEMPT? (This is a server.)

Thank you,
Janos Haar

----- Original Message ----- 
From: "Paul E. McKenney" 
To: "Janos Haar" 
Cc: 
Sent: Thursday, May 21, 2009 4:50 AM
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid:902, comm: md1_raid5

> On Wed, May 20, 2009 at 11:46:07AM +0200, Janos Haar wrote:
>> Hello list,
>>
>> Does somebody know what this is?
>> May 17 23:12:13 gladiator-afth1 kernel: RCU detected CPU 1 stall (t=4295904002/751 jiffies)
>> May 17 23:12:13 gladiator-afth1 kernel: Pid: 902, comm: md1_raid5 Not tainted 2.6.28.10 #1
>> May 17 23:12:13 gladiator-afth1 kernel: Call Trace:
>> May 17 23:12:13 gladiator-afth1 kernel: [] ? get_timestamp+0x9/0xf
>> May 17 23:12:13 gladiator-afth1 kernel: [] __rcu_pending+0x64/0x1e2
>> May 17 23:12:13 gladiator-afth1 kernel: [] rcu_pending+0x36/0x6f
>> May 17 23:12:13 gladiator-afth1 kernel: [] update_process_times+0x37/0x5f
>> May 17 23:12:13 gladiator-afth1 kernel: [] tick_periodic+0x6e/0x7a
>> May 17 23:12:13 gladiator-afth1 kernel: [] tick_handle_periodic+0x21/0x65
>> May 17 23:12:13 gladiator-afth1 kernel: [] smp_apic_timer_interrupt+0x8f/0xad
>> May 17 23:12:13 gladiator-afth1 kernel: [] apic_timer_interrupt+0x6b/0x70
>> May 17 23:12:13 gladiator-afth1 kernel: [] ? _spin_unlock_irqrestore+0x13/0x17
>
> One of the following functions is looping in the kernel. If you are
> running with HZ=250, it has been looping for about three seconds.
> Interrupts are enabled, but preemption must be disabled (perhaps due
> to !CONFIG_PREEMPT).
>
> Thanx, Paul
>
>> May 17 23:12:13 gladiator-afth1 kernel: [] ? bitmap_daemon_work+0x142/0x3b0
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? md_check_recovery+0x1b/0x45b
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? raid5d+0x5d/0x503
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? md_thread+0xd5/0xed
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? autoremove_wake_function+0x0/0x38
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? md_thread+0x0/0xed
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? kthread+0x49/0x76
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? child_rip+0xa/0x11
>> May 17 23:12:18 gladiator-afth1 kernel: [] ? kthread+0x0/0x76
>> May 17 23:12:18 gladiator-afth1 kernel: [] ?
>> child_rip+0x0/0x11
>>
>> Neil Brown from the RAID list suggested asking someone else... (The mail
>> is below.)
>>
>> Thanks,
>> Janos Haar
>>
>> ----- Original Message ----- 
>> From: "Janos Haar" 
>> To: "Neil Brown" 
>> Cc: 
>> Sent: Tuesday, May 19, 2009 12:30 PM
>> Subject: Re: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid: 902, comm: md1_raid5
>>
>>>
>>> ----- Original Message ----- 
>>> From: "Neil Brown" 
>>> To: "Janos Haar" 
>>> Cc: 
>>> Sent: Tuesday, May 19, 2009 3:05 AM
>>> Subject: Re: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid: 902, comm: md1_raid5
>>>
>>>> On Tuesday May 19, janos.haar@netcenter.hu wrote:
>>>>> Hello list, Neil,
>>>>>
>>>>> Can somebody say something about this issue?
>>>>> I would not be surprised if it were hardware related, but this is a
>>>>> brand new server, so I am looking for a solution... :-)
>>>>>
>>>>> May 17 23:12:13 gladiator-afth1 kernel: RCU detected CPU 1 stall
>>>>> (t=4295904002/751 jiffies)
>>>>
>>>> I have no idea what this means.
>>>> I've occasionally seen this sort of message in early boot, and the
>>>> system then continued to work perfectly, so I figured it was an
>>>> early-boot glitch. I suggest asking someone who understands RCU.
>>>>
>>>>> The entire log is here:
>>>>> http://download.netcenter.hu/bughunt/20090518/messages
>>>>>
>>>>> The system is on md1, and working, but slowly.
>>>>
>>>> How slowly? Is the slowness due to disk throughput?
>>>
>>> No, no, this is a fresh and idle server.
>>> I configured the disks and the RAID on another PC, and when that was
>>> finished, I copied up a known-good, pre-installed software pack with the
>>> old 2.6.18.
>>> This pack is good, tested many times, and it too reports this issue on
>>> this machine. (first)
>>> I compiled 2.6.28.10 on it; it took about 6 hours! 8-/
>>> But 2.6.28.10 reports this too.
>>>
>>> The slowness is not disk based, I think; in idle time, if I move the
>>> selector line in mc, it also stops for some seconds, or I can't type in
>>> bash when this happens, and one more RCU message arrives in the log...
>>> (It happens periodically, whether I am doing something or not.)
>>>
>>> I am not sure whether it is RAID related or not, but the kernel reports
>>> only the md1_raid5 pid, no other one.
>>> This is why I am asking here first.
>>>
>>> Thanks anyway. :-)
>>>
>>>> Have you tested the individual drives and compared that with the
>>>> array?
>>>
>>> This is brand new hardware, with 4x 500GB Samsung drives, which report
>>> no problems at all via SMART.
>>>
>>>>> If I leave the server alone for 1 day, it will crash without a saved log.
>>>>
>>>> This is a concern! It usually points to some sort of hardware
>>>> problem, but it is very hard to trace.
>>>> Is the power supply rated high enough to support all devices?
>>>
>>> I am using a good-quality new 550W PSU, and the PC uses only 55-65W,
>>> measured. ;-)
>>> (1x Core 2 Duo, 4x HDD, nothing more interesting)
>>>
>>>> I cannot think of anything else to suggest .. except start swapping
>>>> components until the problem goes away...
>>>
>>> In that case, I need to start with the motherboard. 8-(
>>>
>>> Thanks a lot,
>>> Janos Haar
>>>
>>>> NeilBrown
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/