All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lee Revell <rlrevell@joe-job.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Pieter Palmers <pieterp@joow.be>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: tasklet_unlock_wait() causes soft lockup with -rt and ieee1394 audio
Date: Sat, 08 Jul 2006 11:18:42 -0400	[thread overview]
Message-ID: <1152371924.4736.169.camel@mindpipe> (raw)

Pieter has found this bug in -rt:

We are experiencing 'soft' deadlocks when running our code (Freebob, 
i.e. userspace lib for firewire audio) on RT kernels. After a few 
seconds of system freeze, I get a kernel panic message that signals a soft lockup.

I've uploaded the photo's of the panic here:
http://freebob.sourceforge.net/old/img_3378.jpg (without flash)
http://freebob.sourceforge.net/old/img_3377.jpg (with flash)
both are of suboptimal quality unfortunately, but all info is readable 
on one or the other.

The problems occur when an ISO stream (receive and/or transmit) is shut 
down in a SCHED_FIFO thread. More precisely when running the freebob 
jackd backend in real-time mode. And even more precise: they only seem 
to occur when jackd is shut down. There are no problems when jackd is 
started without RT scheduling.

I havent been able to reproduce this with other test programs that are 
shutting down streams in a SCHED_FIFO thread.

The problem is not reproducible on non-RT kernels, and it only occurs on those configured for 
PREEMPT_RT. If I use PREEMPT_DESKTOP, there is no problem. The PREEMPT_DESKTOP setting was the only change between the two tests, all other kernel settings (threaded irq's etc...) were unchanged.

My tests are performed on 2.6.17-rt1, but the lockups are confirmed for 
PREEMPT_RT configured kernels 2.6.14 and 2.6.16.

Some extra information:

Lee Revell wrote:

> <...>
>
> It seems that the -rt patch changes tasklet_kill:
>
> Unpatched 2.6.17:
>
> void tasklet_kill(struct tasklet_struct *t)
> {
>         if (in_interrupt())
>                 printk("Attempt to kill tasklet from interrupt\n");
>
>         while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
>                 do
>                         yield();
>                 while (test_bit(TASKLET_STATE_SCHED, &t->state));
>         }
>         tasklet_unlock_wait(t);
>         clear_bit(TASKLET_STATE_SCHED, &t->state);
> }
>
> 2.6.17-rt:
>
> void tasklet_kill(struct tasklet_struct *t)
> {
>         if (in_interrupt())
>                 printk("Attempt to kill tasklet from interrupt\n");
>
>         while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
>                 do                              msleep(1);
>                 while (test_bit(TASKLET_STATE_SCHED, &t->state));
>         }
>         tasklet_unlock_wait(t);
>         clear_bit(TASKLET_STATE_SCHED, &t->state);
> }
>
> You should ask Ingo & the other -rt developers what the intent of this
> change was.  Obviously it loops forever waiting for the state bit to
> change.

On Thu, 2006-07-06 at 22:14 +0200, Pieter Palmers wrote:

> > I've put the debugging printk's into tasklet_kill. One interesting 
> > remark is that now that they are in place, I had to start/stop jackd 
> > multiple times before deadlock occurs. Without the printk's the machine 
> > always locked up on the first pass. However I stopped after the first 
> > lockup, so maybe this is not really significant.
> > 
> > Anyway, the new tasklet_kill looks like this:
> > 
> > void tasklet_kill(struct tasklet_struct *t)
> > {
> > 	printk("enter tasklet_kill\n");
> > 	if (in_interrupt())
> > 		printk("Attempt to kill tasklet from interrupt\n");
> > 	
> > 	printk("passed interrupt check\n");
> > 
> > 	while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
> > 		do
> > 			msleep(1);
> > 		while (test_bit(TASKLET_STATE_SCHED, &t->state));
> > 	}
> > 	printk("passed test_and_set_bit\n");
> > 	
> > 	tasklet_unlock_wait(t);
> > 	printk("passed tasklet_unlock_wait\n");
> > 	
> > 	clear_bit(TASKLET_STATE_SCHED, &t->state);
> > }
> > 
> > And the last line printed before lockup is:
> > "passed test_and_set_bit"
>   
This makes the change in tasklet_unlock_wait() as the prime suspect for this problem.







             reply	other threads:[~2006-07-08 15:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-08 15:18 Lee Revell [this message]
2006-07-09  1:51 ` tasklet_unlock_wait() causes soft lockup with -rt and ieee1394 audio Steven Rostedt
2006-07-09  2:12   ` Lee Revell
2006-07-09 11:17     ` Pieter Palmers
2006-07-10 14:41       ` Daniel Walker
2007-01-21 23:30 ` status of: " Pieter Palmers
2007-01-22  9:10   ` Ingo Molnar
2007-01-25  9:04     ` Pieter Palmers
2007-01-27 10:19     ` Pieter Palmers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1152371924.4736.169.camel@mindpipe \
    --to=rlrevell@joe-job.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=pieterp@joow.be \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.