public inbox for linux-arch@vger.kernel.org
* Lots of bugs with current->state = TASK_*INTERRUPTIBLE
@ 2010-01-19 20:29 Steven Rostedt
  2010-01-19 20:58 ` Julia Lawall
  2010-01-21 19:18 ` David Daney
  0 siblings, 2 replies; 18+ messages in thread
From: Steven Rostedt @ 2010-01-19 20:29 UTC
  To: LKML
  Cc: kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft

Peter Zijlstra and I were looking over places that assign
current->state = TASK_*INTERRUPTIBLE, simply by searching with:

 $ git grep -A1 'state[[:space:]]*=[[:space:]]*TASK_[^R]'

and it seems there are quite a few places that look like bugs. To be on
the safe side, everything outside of a run queue lock that sets the
current state to something other than TASK_RUNNING (or dead) should be
using set_current_state().

	current->state = TASK_INTERRUPTIBLE;
	schedule();

is probably OK, but it would not hurt to be consistent. Here are a few
examples of likely bugs:

From drivers/staging/line6/midi.c:

        current->state = TASK_INTERRUPTIBLE;

        while (line6->line6midi->num_active_send_urbs > 0)

From drivers/staging/line6/pod.c:

        current->state = TASK_INTERRUPTIBLE;

        while (param->value == POD_system_invalid) {


drivers/macintosh/adb.c also looks like it has a bug.

I'm sure there are others, but I stopped looking.

Anyway, this looks like good janitorial work. Anything that assigns
state outside the rq locks to something other than TASK_RUNNING and that
is not before a schedule() (perhaps even those) should be converted to:

	set_current_task(<state>).

This probably should be checked in checkpatch.pl too, if it is not
already.
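
A minimal sketch of what such a check could look like, assuming a plain C helper rather than an actual checkpatch.pl rule (is_raw_state_assignment() and count_flagged() are hypothetical names, not existing kernel or checkpatch functions). It applies the same pattern as the git grep above through POSIX <regex.h>, so like the grep it is only a first-pass filter: it also flags assignments of TASK_DEAD and assignments made under the rq lock, which still need a human to review them.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if a source line assigns a task
 * state other than TASK_RUNNING, using the same pattern as the git grep
 * above. Returns false if the pattern cannot be compiled.
 */
static bool is_raw_state_assignment(const char *line)
{
	regex_t re;
	bool match;

	assert(line != NULL);
	if (regcomp(&re, "state[[:space:]]*=[[:space:]]*TASK_[^R]",
		    REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	match = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}

/* Count how many of the n given lines would be flagged. */
static size_t count_flagged(const char *const *lines, size_t n)
{
	size_t flagged = 0;

	for (size_t i = 0; i < n; i++)
		if (is_raw_state_assignment(lines[i]))
			flagged++;
	return flagged;
}
```

Note that "==" comparisons and set_current_state() calls do not match, because the pattern requires a single "=" directly followed (after optional whitespace) by TASK_.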

-- Steve

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-19 20:29 Lots of bugs with current->state = TASK_*INTERRUPTIBLE Steven Rostedt
@ 2010-01-19 20:58 ` Julia Lawall
  2010-01-19 20:58   ` Julia Lawall
  2010-01-19 21:08   ` Steven Rostedt
  2010-01-21 19:18 ` David Daney
  1 sibling, 2 replies; 18+ messages in thread
From: Julia Lawall @ 2010-01-19 20:58 UTC
  To: Steven Rostedt
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft

> Anyway, this looks like good janitorial work. Anything that assigns
> state outside the rq locks to something other than TASK_RUNNING and that
> is not before a schedule() (perhaps even those) should be converted to:
> 
> 	set_current_task(<state>).

Does "not before a schedule" mean not before a schedule_timeout as well?

Also, I assume you mean set_current_state?

julia

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-19 20:58 ` Julia Lawall
  2010-01-19 20:58   ` Julia Lawall
@ 2010-01-19 21:08   ` Steven Rostedt
  2010-01-21 10:47     ` Julia Lawall
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2010-01-19 21:08 UTC
  To: Julia Lawall
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft

On Tue, 2010-01-19 at 21:58 +0100, Julia Lawall wrote:
> > Anyway, this looks like good janitorial work. Anything that assigns
> > state outside the rq locks to something other than TASK_RUNNING and that
> > is not before a schedule() (perhaps even those) should be converted to:
> > 
> > 	set_current_task(<state>).
> 
> Does "not before a schedule" mean not before a schedule_timeout as well?

Yep.

> 
> Also, I assume you mean set_current_state?

Yep!

Thanks!

-- Steve

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-19 21:08   ` Steven Rostedt
@ 2010-01-21 10:47     ` Julia Lawall
  2010-01-21 10:47       ` Julia Lawall
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Julia Lawall @ 2010-01-21 10:47 UTC
  To: Steven Rostedt
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft

What about something like the following (drivers/macintosh/adb.c):

        add_wait_queue(&state->wait_queue, &wait);
        current->state = TASK_INTERRUPTIBLE;

	for (;;) {
	        req = state->completed;
                if (req != NULL)
                        state->completed = req->next;
                else if (atomic_read(&state->n_pending) == 0)
                        ret = -EIO;
		if (req != NULL || ret != 0)
			break;

                if (file->f_flags & O_NONBLOCK) {
                        ret = -EAGAIN;
                        break;
		}
                if (signal_pending(current)) {
                        ret = -ERESTARTSYS;
                        break;
                }
                spin_unlock_irqrestore(&state->lock, flags);
                schedule();
        	spin_lock_irqsave(&state->lock, flags);
        }

        current->state = TASK_RUNNING;
        remove_wait_queue(&state->wait_queue, &wait);

There is eventually a call to schedule() after the first current->state
assignment, but not immediately after it.

thanks,
julia

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 10:47     ` Julia Lawall
  2010-01-21 10:47       ` Julia Lawall
@ 2010-01-21 10:53       ` Frederic Weisbecker
  2010-01-21 10:56         ` Peter Zijlstra
  2010-01-21 17:31       ` Steven Rostedt
  2 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2010-01-21 10:53 UTC
  To: Julia Lawall
  Cc: Steven Rostedt, LKML, kernel-janitors, Peter Zijlstra,
	Andrew Morton, linux-arch, Greg KH, Andy Whitcroft

On Thu, Jan 21, 2010 at 11:47:41AM +0100, Julia Lawall wrote:
> What about something like the following (drivers/macintosh/adb.c):
> 
>         add_wait_queue(&state->wait_queue, &wait);
>         current->state = TASK_INTERRUPTIBLE;
> 
> 	for (;;) {
> 	        req = state->completed;
>                 if (req != NULL)
>                         state->completed = req->next;
>                 else if (atomic_read(&state->n_pending) == 0)
>                         ret = -EIO;
> 		if (req != NULL || ret != 0)
> 			break;
> 
>                 if (file->f_flags & O_NONBLOCK) {
>                         ret = -EAGAIN;
>                         break;
> 		}
>                 if (signal_pending(current)) {
>                         ret = -ERESTARTSYS;
>                         break;
>                 }
>                 spin_unlock_irqrestore(&state->lock, flags);
>                 schedule();
>         	spin_lock_irqsave(&state->lock, flags);
>         }
> 
>         current->state = TASK_RUNNING;
>         remove_wait_queue(&state->wait_queue, &wait);
> 
> There is eventually a call to schedule() after the first current->state
> assignment, but not immediately after it.

Looks fine, as spin_unlock includes a memory barrier, IIRC.

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 10:53       ` Frederic Weisbecker
@ 2010-01-21 10:56         ` Peter Zijlstra
  2010-01-21 10:59           ` Frederic Weisbecker
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-01-21 10:56 UTC
  To: Frederic Weisbecker
  Cc: Julia Lawall, Steven Rostedt, LKML, kernel-janitors,
	Andrew Morton, linux-arch, Greg KH, Andy Whitcroft

On Thu, 2010-01-21 at 11:53 +0100, Frederic Weisbecker wrote:

> Looks fine as spin_unlock includes a memory barrier, IIRC.

It doesn't, actually; see Documentation/memory-barriers.txt.

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 10:56         ` Peter Zijlstra
@ 2010-01-21 10:59           ` Frederic Weisbecker
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2010-01-21 10:59 UTC
  To: Peter Zijlstra
  Cc: Julia Lawall, Steven Rostedt, LKML, kernel-janitors,
	Andrew Morton, linux-arch, Greg KH, Andy Whitcroft

On Thu, Jan 21, 2010 at 11:56:53AM +0100, Peter Zijlstra wrote:
> On Thu, 2010-01-21 at 11:53 +0100, Frederic Weisbecker wrote:
> 
> > Looks fine as spin_unlock includes a memory barrier, IIRC.
> 
> It doesn't, actually; see Documentation/memory-barriers.txt.

Doh!

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 10:47     ` Julia Lawall
  2010-01-21 10:47       ` Julia Lawall
  2010-01-21 10:53       ` Frederic Weisbecker
@ 2010-01-21 17:31       ` Steven Rostedt
  2010-01-21 17:31         ` Steven Rostedt
  2010-01-21 18:12         ` Julia Lawall
  2 siblings, 2 replies; 18+ messages in thread
From: Steven Rostedt @ 2010-01-21 17:31 UTC
  To: Julia Lawall
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft, Benjamin Herrenschmidt, Paul Mackerras

On Thu, 2010-01-21 at 11:47 +0100, Julia Lawall wrote:
> What about something like the following (drivers/macintosh/adb.c):
> 
>         add_wait_queue(&state->wait_queue, &wait);
>         current->state = TASK_INTERRUPTIBLE;
> 
> 	for (;;) {
> 	        req = state->completed;
>                 if (req != NULL)
>                         state->completed = req->next;
>                 else if (atomic_read(&state->n_pending) == 0)
>                         ret = -EIO;
> 		if (req != NULL || ret != 0)
> 			break;
> 
>                 if (file->f_flags & O_NONBLOCK) {
>                         ret = -EAGAIN;
>                         break;
> 		}
>                 if (signal_pending(current)) {
>                         ret = -ERESTARTSYS;
>                         break;
>                 }
>                 spin_unlock_irqrestore(&state->lock, flags);
>                 schedule();
>         	spin_lock_irqsave(&state->lock, flags);
>         }
> 
>         current->state = TASK_RUNNING;
>         remove_wait_queue(&state->wait_queue, &wait);
> 
> There is eventually a call to schedule() after the first current->state
> assignment, but not immediately after it.

I looked at this code in a bit more detail. It seems that it does not
need set_current_state(), because all interactions between the task's
state and the variables being checked (state->n_pending, et al.) happen
under state->lock.

But there should be a comment saying so above the assignment to
current->state. Something like:

	/*
	 * No need for the set_current_state() memory barrier since
	 * all checks between state and wakeups are done under the
	 * state->lock.
	 */
	current->state = TASK_INTERRUPTIBLE;


But I'd rather have the author of this code write that.
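
The argument is that the lock, not the barrier in set_current_state(), provides the ordering. A rough userspace analogue, assuming POSIX threads (this is not the adb.c code, and struct completion_box and its helpers are hypothetical names): the predicate is only read or written with the mutex held, and pthread_cond_wait() releases the mutex and blocks atomically, so a completion cannot slip in between the check and the sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names; each field mirrors a piece of the adb.c state. */
struct completion_box {
	pthread_mutex_t lock;   /* like state->lock */
	pthread_cond_t wait;    /* like state->wait_queue */
	bool completed;         /* like state->completed != NULL */
};

static void box_init(struct completion_box *b)
{
	pthread_mutex_init(&b->lock, NULL);
	pthread_cond_init(&b->wait, NULL);
	b->completed = false;
}

static void box_destroy(struct completion_box *b)
{
	pthread_cond_destroy(&b->wait);
	pthread_mutex_destroy(&b->lock);
}

/* Sleeper: check and sleep under the lock, re-checking after each wakeup. */
static void box_wait(struct completion_box *b)
{
	pthread_mutex_lock(&b->lock);
	while (!b->completed)
		pthread_cond_wait(&b->wait, &b->lock);
	assert(b->completed);
	pthread_mutex_unlock(&b->lock);
}

/* Waker: update the predicate and wake under the same lock. */
static void box_complete(struct completion_box *b)
{
	pthread_mutex_lock(&b->lock);
	b->completed = true;
	pthread_cond_signal(&b->wait);
	pthread_mutex_unlock(&b->lock);
}

static void *completer_thread(void *arg)
{
	box_complete(arg);
	return NULL;
}

/* Usage: wait for a completion posted by another thread. */
static bool wait_for_other_thread(void)
{
	struct completion_box b;
	pthread_t t;
	bool done;

	box_init(&b);
	if (pthread_create(&t, NULL, completer_thread, &b) != 0) {
		box_destroy(&b);
		return false;
	}
	box_wait(&b);
	pthread_join(t, NULL);
	done = b.completed;  /* safe: the completer thread has been joined */
	box_destroy(&b);
	return done;
}
```

The analogy is loose: adb.c sets the task state before dropping the lock and calling schedule(), rather than using a condition variable, so this only illustrates the "one lock serializes the check and the wakeup" part of the reasoning.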

-- Steve

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 17:31       ` Steven Rostedt
  2010-01-21 17:31         ` Steven Rostedt
@ 2010-01-21 18:12         ` Julia Lawall
  1 sibling, 0 replies; 18+ messages in thread
From: Julia Lawall @ 2010-01-21 18:12 UTC
  To: Steven Rostedt
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft, Benjamin Herrenschmidt, Paul Mackerras

On Thu, 21 Jan 2010, Steven Rostedt wrote:

> On Thu, 2010-01-21 at 11:47 +0100, Julia Lawall wrote:
> > What about something like the following (drivers/macintosh/adb.c):
> > 
> >         add_wait_queue(&state->wait_queue, &wait);
> >         current->state = TASK_INTERRUPTIBLE;
> > 
> > 	for (;;) {
> > 	        req = state->completed;
> >                 if (req != NULL)
> >                         state->completed = req->next;
> >                 else if (atomic_read(&state->n_pending) == 0)
> >                         ret = -EIO;
> > 		if (req != NULL || ret != 0)
> > 			break;
> > 
> >                 if (file->f_flags & O_NONBLOCK) {
> >                         ret = -EAGAIN;
> >                         break;
> > 		}
> >                 if (signal_pending(current)) {
> >                         ret = -ERESTARTSYS;
> >                         break;
> >                 }
> >                 spin_unlock_irqrestore(&state->lock, flags);
> >                 schedule();
> >         	spin_lock_irqsave(&state->lock, flags);
> >         }
> > 
> >         current->state = TASK_RUNNING;
> >         remove_wait_queue(&state->wait_queue, &wait);
> > 
> > There is eventually a call to schedule() after the first current->state
> > assignment, but not immediately after it.
> 
> I looked at this code in a bit more detail. It seems that it does not
> need set_current_state(), because all interactions between the task's
> state and the variables being checked (state->n_pending, et al.) happen
> under state->lock.
> 
> But there should be a comment saying so above the assignment to
> current->state. Something like:
> 
> 	/*
> 	 * No need for the set_current_state() memory barrier since
> 	 * all checks between state and wakeups are done under the
> 	 * state->lock.
> 	 */
> 	current->state = TASK_INTERRUPTIBLE;
> 
> 
> But I'd rather have the author of this code write that.

As far as I can tell, state is something that is local to this driver.  So 
is the point that a lock is taken, or that interrupts are turned off?

julia

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-19 20:29 Lots of bugs with current->state = TASK_*INTERRUPTIBLE Steven Rostedt
  2010-01-19 20:58 ` Julia Lawall
@ 2010-01-21 19:18 ` David Daney
  2010-01-21 19:34   ` Steven Rostedt
  1 sibling, 1 reply; 18+ messages in thread
From: David Daney @ 2010-01-21 19:18 UTC
  To: rostedt
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft, Ralf Baechle, linux-mips

Steven Rostedt wrote:
> Peter Zijlstra and I were looking over places that assign
> current->state = TASK_*INTERRUPTIBLE, simply by searching with:
> 
>  $ git grep -A1 'state[[:space:]]*=[[:space:]]*TASK_[^R]'
> 
> and it seems there are quite a few places that look like bugs. To be on
> the safe side, everything outside of a run queue lock that sets the
> current state to something other than TASK_RUNNING (or dead) should be
> using set_current_state().
> 
> 	current->state = TASK_INTERRUPTIBLE;
> 	schedule();
> 
> is probably OK, but it would not hurt to be consistent. Here are a few
> examples of likely bugs:
> 
[...]

This may be a bit off topic, but exactly which type of barrier should 
set_current_state() be implying?

On MIPS, set_mb() (which is used by set_current_state()) has a full mb().

Some MIPS-based processors have a much lighter-weight wmb(). Could
wmb() be used in place of mb() here?

If not, an explanation of the required memory ordering semantics here 
would be appreciated.

I know the documentation says:

     set_current_state() includes a barrier so that the write of
     current->state is correctly serialised wrt the caller's subsequent
     test of whether to actually sleep:

  	set_current_state(TASK_UNINTERRUPTIBLE);
  	if (do_i_need_to_sleep())
  		schedule();


Since the current CPU sees the memory accesses in order, what can be 
happening on other CPUs that would require a full mb()?


Thanks,
David Daney

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 19:18 ` David Daney
@ 2010-01-21 19:34   ` Steven Rostedt
  2010-01-21 19:57     ` David Daney
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2010-01-21 19:34 UTC
  To: David Daney
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft, Ralf Baechle, linux-mips

On Thu, 2010-01-21 at 11:18 -0800, David Daney wrote:
> Steven Rostedt wrote:
> > Peter Zijlstra and I were looking over places that assign
> > current->state = TASK_*INTERRUPTIBLE, simply by searching with:
> > 
> >  $ git grep -A1 'state[[:space:]]*=[[:space:]]*TASK_[^R]'
> > 
> > and it seems there are quite a few places that look like bugs. To be on
> > the safe side, everything outside of a run queue lock that sets the
> > current state to something other than TASK_RUNNING (or dead) should be
> > using set_current_state().
> > 
> > 	current->state = TASK_INTERRUPTIBLE;
> > 	schedule();
> > 
> > is probably OK, but it would not hurt to be consistent. Here are a few
> > examples of likely bugs:
> > 
> [...]
> 
> This may be a bit off topic, but exactly which type of barrier should 
> set_current_state() be implying?
> 
> On MIPS, set_mb() (which is used by set_current_state()) has a full mb().
> 
> Some MIPS-based processors have a much lighter-weight wmb(). Could
> wmb() be used in place of mb() here?

Nope, wmb() is not enough. Below is an explanation.

> 
> If not, an explanation of the required memory ordering semantics here 
> would be appreciated.
> 
> I know the documentation says:
> 
>      set_current_state() includes a barrier so that the write of
>      current->state is correctly serialised wrt the caller's subsequent
>      test of whether to actually sleep:
> 
>   	set_current_state(TASK_UNINTERRUPTIBLE);
>   	if (do_i_need_to_sleep())
>   		schedule();
> 
> 
> Since the current CPU sees the memory accesses in order, what can be 
> happening on other CPUs that would require a full mb()?

Let's look at a hypothetical situation with:

	add_wait_queue();
	current->state = TASK_UNINTERRUPTIBLE;
	smp_wmb();
	if (!x)
		schedule();



Then somewhere we probably have:

	x = 1;
	smp_wmb();
	wake_up(queue);



	   CPU 0			   CPU 1
	------------			-----------
	add_wait_queue();
	(cpu pipeline sees a load
	 of x ahead, and preloads it)
					x = 1;
					smp_wmb();
					wake_up(queue);
					(task on CPU 0 is still at
					 TASK_RUNNING);

	current->state = TASK_UNINTERRUPTIBLE;
	smp_wmb(); <<-- does not prevent early loading of x
	if (!x)  <<-- returns true
		schedule();

Now the task on CPU 0 missed the wake up.

Note that places that call schedule() are not fast paths and are
probably not called often. Adding the overhead of smp_mb() to ensure
correctness is a small price to pay compared to searching for why you
have a stuck task that
was never woken up.
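
The scenario above is the classic store-buffering pattern: each side stores one variable and then loads the other. A minimal userspace model, assuming C11 <stdatomic.h> and POSIX threads (all names are hypothetical and none of this is kernel code): the sleeper does what set_current_state() is described as doing, a store followed by a full barrier, before testing x. The pattern needs full ordering on the waker side as well, between its store to x and its check of the state, so the model simply puts a seq_cst fence there too; how the kernel's wake-up path supplies that ordering is not modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { MODEL_RUNNING = 0, MODEL_SLEEPING = 1 };

struct trial {
	atomic_int state;      /* models current->state */
	atomic_int x;          /* models the wake-up condition x */
	bool sleeper_sleeps;   /* sleeper saw x == 0: it would call schedule() */
	bool waker_wakes;      /* waker saw SLEEPING: it would wake the sleeper */
};

static void *sleeper(void *arg)
{
	struct trial *t = arg;

	/* Store the state, then a full fence, like set_current_state(). */
	atomic_store_explicit(&t->state, MODEL_SLEEPING, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	t->sleeper_sleeps =
		atomic_load_explicit(&t->x, memory_order_relaxed) == 0;
	return NULL;
}

static void *waker(void *arg)
{
	struct trial *t = arg;

	atomic_store_explicit(&t->x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	t->waker_wakes = atomic_load_explicit(&t->state,
					      memory_order_relaxed) == MODEL_SLEEPING;
	return NULL;
}

/*
 * Run n trials and count lost wake-ups: the sleeper decides to sleep
 * but the waker saw it still running. Returns -1 if a thread cannot be
 * created.
 */
static int count_lost_wakeups(int n)
{
	int lost = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		struct trial t;
		pthread_t a, b;

		atomic_init(&t.state, MODEL_RUNNING);
		atomic_init(&t.x, 0);
		t.sleeper_sleeps = false;
		t.waker_wakes = false;
		if (pthread_create(&a, NULL, sleeper, &t) != 0)
			return -1;
		if (pthread_create(&b, NULL, waker, &t) != 0) {
			pthread_join(a, NULL);
			return -1;
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (t.sleeper_sleeps && !t.waker_wakes)
			lost++;
	}
	return lost;
}
```

With both fences in place, the C11 rules for seq_cst fences guarantee the count is 0: whichever fence comes first in the single total order, the other side's load must see the earlier store. Replacing the sleeper's fence with one that does not order an earlier store against a later load (a memory_order_release fence, for instance, is like smp_wmb() in that respect) removes the guarantee; since the bad interleaving can be rare, a run that reports 0 is not proof of correctness.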

Read Documentation/memory-barriers.txt; it will be worth the time you
spend doing so.

-- Steve

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 19:34   ` Steven Rostedt
@ 2010-01-21 19:57     ` David Daney
  2010-01-21 20:18       ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: David Daney @ 2010-01-21 19:57 UTC
  To: rostedt
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft, Ralf Baechle, linux-mips

Steven Rostedt wrote:
> On Thu, 2010-01-21 at 11:18 -0800, David Daney wrote:
>> Steven Rostedt wrote:
>>> Peter Zijlstra and I were looking over places that assign
>>> current->state = TASK_*INTERRUPTIBLE, simply by searching with:
>>>
>>>  $ git grep -A1 'state[[:space:]]*=[[:space:]]*TASK_[^R]'
>>>
>>> and it seems there are quite a few places that look like bugs. To be on
>>> the safe side, everything outside of a run queue lock that sets the
>>> current state to something other than TASK_RUNNING (or dead) should be
>>> using set_current_state().
>>>
>>> 	current->state = TASK_INTERRUPTIBLE;
>>> 	schedule();
>>>
>>> is probably OK, but it would not hurt to be consistent. Here are a few
>>> examples of likely bugs:
>>>
>> [...]
>>
>> This may be a bit off topic, but exactly which type of barrier should 
>> set_current_state() be implying?
>>
>> On MIPS, set_mb() (which is used by set_current_state()) has a full mb().
>>
>> Some MIPS-based processors have a much lighter-weight wmb(). Could
>> wmb() be used in place of mb() here?
> 
> Nope, wmb() is not enough. Below is an explanation.
> 
>> If not, an explanation of the required memory ordering semantics here 
>> would be appreciated.
>>
>> I know the documentation says:
>>
>>      set_current_state() includes a barrier so that the write of
>>      current->state is correctly serialised wrt the caller's subsequent
>>      test of whether to actually sleep:
>>
>>   	set_current_state(TASK_UNINTERRUPTIBLE);
>>   	if (do_i_need_to_sleep())
>>   		schedule();
>>
>>
>> Since the current CPU sees the memory accesses in order, what can be 
>> happening on other CPUs that would require a full mb()?
> 
> Let's look at a hypothetical situation with:
> 
> 	add_wait_queue();
> 	current->state = TASK_UNINTERRUPTIBLE;
> 	smp_wmb();
> 	if (!x)
> 		schedule();
> 
> 
> 
> Then somewhere we probably have:
> 
> 	x = 1;
> 	smp_wmb();
> 	wake_up(queue);
> 
> 
> 
> 	   CPU 0			   CPU 1
> 	------------			-----------
> 	add_wait_queue();
> 	(cpu pipeline sees a load
> 	 of x ahead, and preloads it)


This is what I thought.

My CPU (Cavium Octeon) does not have out-of-order reads, so my wmb() is
in fact a full mb() from the point of view of the current CPU. So I
think I could weaken my barriers in set_current_state() and still get
correct operation. However, as you say...


> 					x = 1;
> 					smp_wmb();
> 					wake_up(queue);
> 					(task on CPU 0 is still at
> 					 TASK_RUNNING);
> 
> 	current->state = TASK_UNINTERRUPTIBLE;
> 	smp_wmb(); <<-- does not prevent early loading of x
> 	if (!x)  <<-- returns true
> 		schedule();
> 
> Now the task on CPU 0 missed the wake up.
> 
> Note that places that call schedule() are not fast paths and are
> probably not called often. Adding the overhead of smp_mb() to ensure
> correctness is a small price to pay compared to searching for why you
> have a stuck task that
> was never woken up.

... It may not be worth the trouble.


> 
> Read Documentation/memory-barriers.txt; it will be worth the time you
> spend doing so.

Indeed I have read it.  My questions arise because the semantics of my 
barrier primitives do not map exactly to the semantics prescribed for 
mb() and wmb().

A kernel programmer has only the types of barriers described in
memory-barriers.txt available. Since there is no
mb_on_current_cpu_but_only_order_writes_as_seen_by_other_cpus(), we use
a full mb() instead.


Thanks for the explanation Steve,

David Daney

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 19:57     ` David Daney
@ 2010-01-21 20:18       ` Steven Rostedt
  2010-01-21 20:18         ` Steven Rostedt
  2010-01-21 20:21         ` David Daney
  0 siblings, 2 replies; 18+ messages in thread
From: Steven Rostedt @ 2010-01-21 20:18 UTC
  To: David Daney
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft, Ralf Baechle, linux-mips

On Thu, 2010-01-21 at 11:57 -0800, David Daney wrote:

> >> Since the current CPU sees the memory accesses in order, what can be 
> >> happening on other CPUs that would require a full mb()?
> > 
> > Let's look at a hypothetical situation with:
> > 
> > 	add_wait_queue();
> > 	current->state = TASK_UNINTERRUPTIBLE;
> > 	smp_wmb();
> > 	if (!x)
> > 		schedule();
> > 
> > 
> > 
> > Then somewhere we probably have:
> > 
> > 	x = 1;
> > 	smp_wmb();
> > 	wake_up(queue);
> > 
> > 
> > 
> > 	   CPU 0			   CPU 1
> > 	------------			-----------
> > 	add_wait_queue();
> > 	(cpu pipeline sees a load
> > 	 of x ahead, and preloads it)
> 
> 
> This is what I thought.
> 
> My CPU (Cavium Octeon) does not have out-of-order reads, so my wmb() is

Can you have reads that are out of order with respect to writes? The
above does not involve out-of-order reads. It just has a read that is
performed before an earlier write. The above code could look like:

(hypothetical assembly language)

	ld r2, TASK_UNINTERRUPTIBLE
	st r2, (current->state)
	wmb
	ld r1, (x)
	cmp r1, 0

Is it possible for the CPU to do the load of r1 before storing r2? If
so, then the bug still exists.
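
The question turns on which pairs of accesses each barrier orders. A small table-as-code summary, based on the descriptions of read, write and general barriers in Documentation/memory-barriers.txt (the enum and function names are hypothetical, and the model deliberately ignores data-dependency barriers and ACQUIRE/RELEASE operations):

```c
#include <assert.h>
#include <stdbool.h>

enum access_kind { ACCESS_LOAD, ACCESS_STORE };
enum barrier_kind { BARRIER_RMB, BARRIER_WMB, BARRIER_MB };

/*
 * Does barrier b keep an access of kind 'before' (earlier in program
 * order) ordered ahead of an access of kind 'after' (later in program
 * order), as perceived by the rest of the system?
 */
static bool barrier_orders(enum barrier_kind b, enum access_kind before,
			   enum access_kind after)
{
	switch (b) {
	case BARRIER_RMB:	/* read barrier: loads vs. loads only */
		return before == ACCESS_LOAD && after == ACCESS_LOAD;
	case BARRIER_WMB:	/* write barrier: stores vs. stores only */
		return before == ACCESS_STORE && after == ACCESS_STORE;
	case BARRIER_MB:	/* general barrier: every combination */
		return true;
	}
	assert(!"unknown barrier kind");
	return false;
}
```

Applied to the hypothetical assembly above, "st r2, (current->state)" followed by "ld r1, (x)" is a store/load pair: barrier_orders(BARRIER_WMB, ACCESS_STORE, ACCESS_LOAD) is false, so wmb does not stop the load of x from being satisfied early, whereas a general barrier does.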

-- Steve


> in fact a full mb() from the point of view of the current CPU. So I
> think I could weaken my barriers in set_current_state() and still get
> correct operation. However, as you say...
> 

* Re: Lots of bugs with current->state = TASK_*INTERRUPTIBLE
  2010-01-21 20:18       ` Steven Rostedt
  2010-01-21 20:18         ` Steven Rostedt
@ 2010-01-21 20:21         ` David Daney
  1 sibling, 0 replies; 18+ messages in thread
From: David Daney @ 2010-01-21 20:21 UTC
  To: rostedt
  Cc: LKML, kernel-janitors, Peter Zijlstra, Andrew Morton, linux-arch,
	Greg KH, Andy Whitcroft, Ralf Baechle, linux-mips

>>
>> This is what I thought.
>>
>> My CPU (Cavium Octeon) does not have out-of-order reads, so my wmb() is
> 
> Can you have reads that are out of order with respect to writes? The
> above does not involve out-of-order reads. It just has a read that is
> performed before an earlier write. The above code could look like:
> 
> (hypothetical assembly language)
> 
> 	ld r2, TASK_UNINTERRUPTIBLE
> 	st r2, (current->state)
> 	wmb
> 	ld r1, (x)
> 	cmp r1, 0
> 
> Is it possible for the CPU to do the load of r1 before storing r2? If
> so, then the bug still exists.
> 

Indeed it is.  Lockless operations make my head hurt.

Thanks for clarifying.

David Daney

> -- Steve
> 
> 
>> in fact a full mb() from the point of view of the current CPU. So I
>> think I could weaken my barriers in set_current_state() and still get
>> correct operation. However, as you say...
>>
> 
> 
> 

Thread overview: 18+ messages
2010-01-19 20:29 Lots of bugs with current->state = TASK_*INTERRUPTIBLE Steven Rostedt
2010-01-19 20:58 ` Julia Lawall
2010-01-19 20:58   ` Julia Lawall
2010-01-19 21:08   ` Steven Rostedt
2010-01-21 10:47     ` Julia Lawall
2010-01-21 10:47       ` Julia Lawall
2010-01-21 10:53       ` Frederic Weisbecker
2010-01-21 10:56         ` Peter Zijlstra
2010-01-21 10:59           ` Frederic Weisbecker
2010-01-21 17:31       ` Steven Rostedt
2010-01-21 17:31         ` Steven Rostedt
2010-01-21 18:12         ` Julia Lawall
2010-01-21 19:18 ` David Daney
2010-01-21 19:34   ` Steven Rostedt
2010-01-21 19:57     ` David Daney
2010-01-21 20:18       ` Steven Rostedt
2010-01-21 20:18         ` Steven Rostedt
2010-01-21 20:21         ` David Daney
