From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S261337AbVHBAxs (ORCPT );
	Mon, 1 Aug 2005 20:53:48 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S261340AbVHBAxs (ORCPT );
	Mon, 1 Aug 2005 20:53:48 -0400
Received: from ms-smtp-02.nyroc.rr.com ([24.24.2.56]:8871 "EHLO
	ms-smtp-02.nyroc.rr.com") by vger.kernel.org with ESMTP
	id S261337AbVHBAxr (ORCPT );
	Mon, 1 Aug 2005 20:53:47 -0400
Subject: Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
From: Steven Rostedt
To: dwalker@mvista.com
Cc: Ingo Molnar , linux-kernel@vger.kernel.org
In-Reply-To: <1122931238.4623.17.camel@dhcp153.mvista.com>
References: <20050730160345.GA3584@elte.hu>
	<1122920564.6759.15.camel@localhost.localdomain>
	<1122931238.4623.17.camel@dhcp153.mvista.com>
Content-Type: text/plain
Organization: Kihon Technologies
Date: Mon, 01 Aug 2005 20:53:30 -0400
Message-Id: <1122944010.6759.64.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.2.3
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2005-08-01 at 14:20 -0700, Daniel Walker wrote:
> On Mon, 2005-08-01 at 14:22 -0400, Steven Rostedt wrote:
> > Ingo,
> >
> > What's with the "BUG: possible soft lockup detected on CPU..."? I'm
> > getting a bunch of them from the IDE interrupt. It's not locking up,
> > but it does things that probably do take some time. Is this really
> > necessary? Here's an example dump:
> >
> > -- Steve
> >
> > Note: I added the curr=%s:%d,current->comm,current->pid just to see who
> > was at fault.
>
> It means that IRQ 14 is running for a long time as an RT task .. btw,
> the curr=%s:%d information duplicates some in the "show all held locks"
> section .

yeah I know that was redundant (after putting it in), but I wanted to
make sure what current was.
The held-locks output wasn't as straightforward about what was current
(I wasn't looking at what produced that, I just noticed the output).

>
> I could base it off current_sched_time() to only trigger if the task has
> actually been running for 10 seconds, instead of just assuming that it
> has..

I thought about changing that too. But I'm assuming that you are looking
for bugs (like the kjournald as RT) where a task may be in a loop, but
higher priority tasks can still preempt it. Putting the check elsewhere
will still be thrown off by higher prio tasks preempting it.

In my custom kernel, I have a wchan field in the task that records where
the task calls something that might schedule. This way I can see where
things locked up if I don't have a back trace of the task. This field is
always zero when it switches to usermode.

Something like this can also be used to check how long the process is in
kernel mode. If a task is in the kernel for more than 10 seconds without
sleeping, that would definitely be a good indication of something wrong.

I probably could write something to check for this if people are
interested. I won't waste my time if nobody would want it.

-- Steve
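
For illustration, the in-kernel time check described above could be
modelled in userspace roughly as follows. This is only a sketch under
assumptions, not code from the custom kernel or the -RT patch: the
struct, the model_* helpers, and KSTALL_LIMIT_NS are hypothetical names,
and timestamps are passed in by the caller rather than read from a real
clock.

```c
/* Userspace model only: none of these names exist in any kernel tree. */
#include <stdbool.h>
#include <stdint.h>

/* 10 seconds in nanoseconds, the threshold mentioned in the mail. */
#define KSTALL_LIMIT_NS (10ULL * 1000000000ULL)

struct task_model {
	uintptr_t wchan;     /* last place that might schedule; 0 in user mode */
	uint64_t  kstart_ns; /* start of the current non-sleeping kernel stretch */
	bool      in_kernel; /* true between kernel entry and return to user */
};

/* Task enters the kernel: start timing the stretch. */
static void model_kernel_enter(struct task_model *t, uint64_t now_ns)
{
	t->in_kernel = true;
	t->kstart_ns = now_ns;
}

/* Task calls something that might schedule: remember where. */
static void model_might_sched(struct task_model *t, uintptr_t where)
{
	t->wchan = where;
}

/* Task really slept and woke up again: restart the clock. */
static void model_woke_from_sleep(struct task_model *t, uint64_t now_ns)
{
	t->kstart_ns = now_ns;
}

/* Task returns to user mode: wchan goes back to zero. */
static void model_kernel_exit(struct task_model *t)
{
	t->in_kernel = false;
	t->wchan = 0;
}

/*
 * True if the task has been in the kernel for more than the limit
 * without sleeping. Time spent preempted by higher prio tasks still
 * counts, since being preempted is not the same as sleeping.
 */
static bool model_stalled(const struct task_model *t, uint64_t now_ns)
{
	return t->in_kernel && (now_ns - t->kstart_ns) > KSTALL_LIMIT_NS;
}
```

A periodic checker would call model_stalled() for each task and, on a
hit, print wchan to show where the task last reached a point that might
schedule.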