* allow preemption in check_task_state
@ 2014-02-10 15:38 Nicholas Mc Guire
2014-02-10 16:11 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 15:38 UTC (permalink / raw)
To: linux-rt-users
Cc: LKML, Sebastian Andrzej Siewior, Steven Rostedt, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
A lockfree approach to check_task_state
This treats the state as an indicator variable and uses it to probe
saved_state lock-free. There is actually no consistency demand on
state/saved_state themselves, but rather on the transitions of the two
variables, and those transitions, based on path inspection, are not
independent.
It's probably not faster than the lock/unlock case if uncontended - at
least it does not show up in benchmark results - but it would never be
hit by a full pi-boost cycle as there is no contention.
This was also tested against the test-case from Sebastian, as well as by
running a few scripted gdb breakpoint debugging/single-stepping loops
to trigger this.
Tested-by: Andreas Platschek <platschek@ict.tuwien.ac.at>
Tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
---
kernel/sched/core.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bf93f63..5690ba3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1074,11 +1074,17 @@ static int migration_cpu_stop(void *data);
static bool check_task_state(struct task_struct *p, long match_state)
{
bool match = false;
+ long state, saved_state;
+
+ /* catch restored state */
+ do {
+ state = p->state;
+ saved_state = p->saved_state;
+ rmb(); /* make sure we actually catch updates */
+ } while (state != p->state);
- raw_spin_lock_irq(&p->pi_lock);
if (p->state == match_state || p->saved_state == match_state)
match = true;
- raw_spin_unlock_irq(&p->pi_lock);
return match;
}
--
1.7.2.5
* Re: allow preemption in check_task_state
2014-02-10 15:38 allow preemption in check_task_state Nicholas Mc Guire
@ 2014-02-10 16:11 ` Steven Rostedt
2014-02-10 17:17 ` Nicholas Mc Guire
0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2014-02-10 16:11 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
Subject is missing patch number.
On Mon, 10 Feb 2014 16:38:56 +0100
Nicholas Mc Guire <der.herr@hofr.at> wrote:
>
> A lockfree approach to check_task_state
>
> This treats the state as an indicator variable and uses it to probe
> saved_state lock-free. There is actually no consistency demand on
> state/saved_state themselves, but rather on the transitions of the two
> variables, and those transitions, based on path inspection, are not
> independent.
>
> It's probably not faster than the lock/unlock case if uncontended - at
> least it does not show up in benchmark results - but it would never be
> hit by a full pi-boost cycle as there is no contention.
>
> This was also tested against the test-case from Sebastian, as well as by
> running a few scripted gdb breakpoint debugging/single-stepping loops
> to trigger this.
To trigger what?
>
> Tested-by: Andreas Platschek <platschek@ict.tuwien.ac.at>
> Tested-by: Carsten Emde <C.Emde@osadl.org>
> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
> ---
> kernel/sched/core.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bf93f63..5690ba3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1074,11 +1074,17 @@ static int migration_cpu_stop(void *data);
> static bool check_task_state(struct task_struct *p, long match_state)
> {
> bool match = false;
> + long state, saved_state;
> +
> + /* catch restored state */
> + do {
> + state = p->state;
> + saved_state = p->saved_state;
> + rmb(); /* make sure we actually catch updates */
The problem I have with this is that there's no matching wmb(). Also,
shouldn't that be an smp_rmb()? I don't think we can race with devices
here.
> + } while (state != p->state);
>
> - raw_spin_lock_irq(&p->pi_lock);
> if (p->state == match_state || p->saved_state == match_state)
> match = true;
> - raw_spin_unlock_irq(&p->pi_lock);
>
> return match;
> }
In rtmutex.c we have:
pi_lock(&self->pi_lock);
__set_current_state(self->saved_state);
self->saved_state = TASK_RUNNING;
pi_unlock(&self->pi_lock);
As there is no wmb() here, it is very possible that another CPU
will see saved_state as TASK_RUNNING, and current state as
TASK_RUNNING, and miss the update completely.
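Roughly the window I'm worried about - a sketch of one possible
visibility order on the reading CPU, not the exact code:

	CPU0 (the task, restoring its state)    CPU1 (check_task_state loop)
	------------------------------------    ----------------------------
	p->state = p->saved_state;   /* e.g. back to TASK_TRACED */
	p->saved_state = TASK_RUNNING;
	                                        /* without a wmb() on CPU0, the second
	                                           store may become visible first: */
	                                        state = p->state;             /* still the old value */
	                                        saved_state = p->saved_state; /* already TASK_RUNNING */
	                                        rmb();
	                                        /* the loop can exit with match_state
	                                           visible in neither field */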
I would not want to add a wmb() unless there is a real bug with the
check state, as the above is in a very fast path and the check state is
in a slower path.
-- Steve
* Re: allow preemption in check_task_state
2014-02-10 16:11 ` Steven Rostedt
@ 2014-02-10 17:17 ` Nicholas Mc Guire
2014-02-10 17:38 ` Peter Zijlstra
2014-02-10 17:52 ` Steven Rostedt
0 siblings, 2 replies; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 17:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Steven Rostedt wrote:
> Subject is missing patch number.
>
>
> On Mon, 10 Feb 2014 16:38:56 +0100
> Nicholas Mc Guire <der.herr@hofr.at> wrote:
>
> >
> > A lockfree approach to check_task_state
> >
> > This treats the state as an indicator variable and uses it to probe
> > saved_state lock-free. There is actually no consistency demand on
> > state/saved_state themselves, but rather on the transitions of the two
> > variables, and those transitions, based on path inspection, are not
> > independent.
> >
> > It's probably not faster than the lock/unlock case if uncontended - at
> > least it does not show up in benchmark results - but it would never be
> > hit by a full pi-boost cycle as there is no contention.
> >
> > This was also tested against the test-case from Sebastian, as well as by
> > running a few scripted gdb breakpoint debugging/single-stepping loops
> > to trigger this.
>
> To trigger what?
Sorry, I should have included that in the patch header.
the testcase that Sebastian Andrzej Siewior had - available at:
http://breakpoint.cc/ptrace-test.c
the test-case triggers the case where the state update is missed.
>
> >
> > Tested-by: Andreas Platschek <platschek@ict.tuwien.ac.at>
> > Tested-by: Carsten Emde <C.Emde@osadl.org>
> > Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
> > ---
> > kernel/sched/core.c | 10 ++++++++--
> > 1 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bf93f63..5690ba3 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1074,11 +1074,17 @@ static int migration_cpu_stop(void *data);
> > static bool check_task_state(struct task_struct *p, long match_state)
> > {
> > bool match = false;
> > + long state, saved_state;
> > +
> > + /* catch restored state */
> > + do {
> > + state = p->state;
> > + saved_state = p->saved_state;
> > + rmb(); /* make sure we actually catch updates */
>
> The problem I have with this is that there's no matching wmb(). Also,
> shouldn't that be an smp_rmb()? I don't think we can race with devices
> here.
Sebastian also mentioned that - I simply was not sure about this - still
not deep enough into this, I guess.
>
> > + } while (state != p->state);
> >
> > - raw_spin_lock_irq(&p->pi_lock);
> > if (p->state == match_state || p->saved_state == match_state)
> > match = true;
> > - raw_spin_unlock_irq(&p->pi_lock);
> >
> > return match;
> > }
>
>
> In rtmutex.c we have:
>
> pi_lock(&self->pi_lock);
> __set_current_state(self->saved_state);
> self->saved_state = TASK_RUNNING;
> pi_unlock(&self->pi_lock);
>
> As there is no wmb() here, it is very possible that another CPU
> will see saved_state as TASK_RUNNING, and current state as
> TASK_RUNNING, and miss the update completely.
>
> I would not want to add a wmb() unless there is a real bug with the
> check state, as the above is in a very fast path and the check state is
> in a slower path.
>
maybe I'm missing/misunderstanding something here, but
pi_unlock -> arch_spin_unlock is a full mb(),
so once any task did an update of the state the loop should be catching
this update? if the loop exits before the update takes effect (pi_unlock),
would that be incorrect?
thx!
hofrat
* Re: allow preemption in check_task_state
2014-02-10 17:17 ` Nicholas Mc Guire
@ 2014-02-10 17:38 ` Peter Zijlstra
2014-02-10 18:12 ` Nicholas Mc Guire
2014-02-10 17:52 ` Steven Rostedt
1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-10 17:38 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> maybe I'm missing/misunderstanding something here, but
> pi_unlock -> arch_spin_unlock is a full mb()
Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
The lock and unlock primitives are in general specified to have ACQUIRE
resp. RELEASE semantics.
See Documentation/memory-barriers.txt for far too many head-hurting
details.
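For reference, the x86 unlock fast path is roughly the below (from
memory and ignoring the paravirt ticketlock bits, so treat it as a
sketch rather than the exact source):

	static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
	{
		/* plain add on the ticket head; UNLOCK_LOCK_PREFIX is normally
		   empty, so no LOCK prefix and no full barrier here */
		__add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
	}

The plain store/add still gives you RELEASE ordering on x86 because of
TSO, but that is not the same thing as a full mb().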
* Re: allow preemption in check_task_state
2014-02-10 17:17 ` Nicholas Mc Guire
2014-02-10 17:38 ` Peter Zijlstra
@ 2014-02-10 17:52 ` Steven Rostedt
2014-02-10 18:13 ` Nicholas Mc Guire
1 sibling, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2014-02-10 17:52 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014 18:17:12 +0100
Nicholas Mc Guire <der.herr@hofr.at> wrote:
> >
> > In rtmutex.c we have:
> >
> > pi_lock(&self->pi_lock);
> > __set_current_state(self->saved_state);
> > self->saved_state = TASK_RUNNING;
> > pi_unlock(&self->pi_lock);
> >
> > As there is no wmb() here, it is very possible that another CPU
> > will see saved_state as TASK_RUNNING, and current state as
> > TASK_RUNNING, and miss the update completely.
> >
> > I would not want to add a wmb() unless there is a real bug with the
> > check state, as the above is in a very fast path and the check state is
> > in a slower path.
> >
> maybe I'm missing/misunderstanding something here, but
> pi_unlock -> arch_spin_unlock is a full mb(),
> so once any task did an update of the state the loop should be catching
> this update? if the loop exits before the update takes effect (pi_unlock),
> would that be incorrect?
Even if the spin locks were full memory barriers, it is still buggy.
The fact that we set current_state to saved_state, and then saved_state
to TASK_RUNNING without any memory barriers in between those two
statements, means that the reader (even with a rmb()) can still see
both as TASK_RUNNING.
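If one really wanted the reader's loop to work without taking pi_lock,
the write side would need something roughly like the below (just a
sketch of where a barrier would have to sit, not something I'm
proposing):

	pi_lock(&self->pi_lock);
	__set_current_state(self->saved_state);
	smp_wmb();	/* order the ->state restore before clearing ->saved_state */
	self->saved_state = TASK_RUNNING;
	pi_unlock(&self->pi_lock);

and that puts a barrier straight into the rtmutex fast path.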
-- Steve
* Re: allow preemption in check_task_state
2014-02-10 17:38 ` Peter Zijlstra
@ 2014-02-10 18:12 ` Nicholas Mc Guire
2014-02-10 18:16 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 18:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> > maybe I'm missing/misunderstanding something here, but
> > pi_unlock -> arch_spin_unlock is a full mb()
>
> Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
>
> The lock and unlock primitives are in general specified to have ACQUIRE
> resp. RELEASE semantics.
>
> See Documentation/memory-barriers.txt for far too many head-hurting
> details.
I did check that - but from the code it seems to me to be using a
lock prefix in the fast __add() path and an explicit smp_add() in the slow
path (arch/x86/include/asm/spinlock.h:arch_spin_unlock); the __add from
arch/x86/include/asm/cmpxchg.h does lock - or am I misinterpreting this?
the other archs I believe were all doing explicit mb()/smp_mb() in the
arch_spin_unlock - will go check this again.
thx!
hofrat
* Re: allow preemption in check_task_state
2014-02-10 17:52 ` Steven Rostedt
@ 2014-02-10 18:13 ` Nicholas Mc Guire
0 siblings, 0 replies; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 18:13 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Steven Rostedt wrote:
> On Mon, 10 Feb 2014 18:17:12 +0100
> Nicholas Mc Guire <der.herr@hofr.at> wrote:
>
>
> > >
> > > In rtmutex.c we have:
> > >
> > > pi_lock(&self->pi_lock);
> > > __set_current_state(self->saved_state);
> > > self->saved_state = TASK_RUNNING;
> > > pi_unlock(&self->pi_lock);
> > >
> > > As there is no wmb() here, it is very possible that another CPU
> > > will see saved_state as TASK_RUNNING, and current state as
> > > TASK_RUNNING, and miss the update completely.
> > >
> > > I would not want to add a wmb() unless there is a real bug with the
> > > check state, as the above is in a very fast path and the check state is
> > > in a slower path.
> > >
> > maybe I'm missing/misunderstanding something here, but
> > pi_unlock -> arch_spin_unlock is a full mb(),
> > so once any task did an update of the state the loop should be catching
> > this update? if the loop exits before the update takes effect (pi_unlock),
> > would that be incorrect?
>
> Even if the spin locks were full memory barriers, it is still buggy.
> The fact that we set current_state to saved_state, and then saved_state
> to TASK_RUNNING without any memory barriers in between those two
> statements, means that the reader (even with a rmb()) can still see
> both as TASK_RUNNING.
>
ok - thanks - I think now I got it.
thx!
hofrat
* Re: allow preemption in check_task_state
2014-02-10 18:12 ` Nicholas Mc Guire
@ 2014-02-10 18:16 ` Peter Zijlstra
2014-02-10 18:45 ` Nicholas Mc Guire
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-10 18:16 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, Feb 10, 2014 at 07:12:03PM +0100, Nicholas Mc Guire wrote:
> On Mon, 10 Feb 2014, Peter Zijlstra wrote:
>
> > On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> > > maybe I'm missing/misunderstanding something here, but
> > > pi_unlock -> arch_spin_unlock is a full mb()
> >
> > Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
> >
> > The lock and unlock primitives are in general specified to have ACQUIRE
> > resp. RELEASE semantics.
> >
> > See Documentation/memory-barriers.txt for far too many head-hurting
> > details.
>
> I did check that - but from the code it seems to me to be using a
> lock prefix in the fast __add() path and an explicit smp_add() in the slow
> path (arch/x86/include/asm/spinlock.h:arch_spin_unlock); the __add from
> arch/x86/include/asm/cmpxchg.h does lock - or am I misinterpreting this?
> the other archs I believe were all doing explicit mb()/smp_mb() in the
> arch_spin_unlock - will go check this again.
It uses UNLOCK_LOCK_PREFIX, which, if you look carefully, is normally
always "". Only some 'broken' i386 chips require a LOCK there.
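Roughly (again from memory, so double-check against the tree):

	#if defined(CONFIG_X86_32) && \
		(defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE))
	/* PPro errata: the unlock store needs a locked op to be ordered */
	# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
	#else
	# define UNLOCK_LOCK_PREFIX
	#endif

So on anything sane the unlock really is just a plain add.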
* Re: allow preemption in check_task_state
2014-02-10 18:16 ` Peter Zijlstra
@ 2014-02-10 18:45 ` Nicholas Mc Guire
0 siblings, 0 replies; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 18:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 07:12:03PM +0100, Nicholas Mc Guire wrote:
> > On Mon, 10 Feb 2014, Peter Zijlstra wrote:
> >
> > > On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> > > > maybe I'm missing/misunderstanding something here, but
> > > > pi_unlock -> arch_spin_unlock is a full mb()
> > >
> > > Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
> > >
> > > The lock and unlock primitives are in general specified to have ACQUIRE
> > > resp. RELEASE semantics.
> > >
> > > See Documentation/memory-barriers.txt for far too many head-hurting
> > > details.
> >
> > I did check that - but from the code it seems to me to be using a
> > lock prefix in the fast __add() path and an explicit smp_add() in the slow
> > path (arch/x86/include/asm/spinlock.h:arch_spin_unlock); the __add from
> > arch/x86/include/asm/cmpxchg.h does lock - or am I misinterpreting this?
> > the other archs I believe were all doing explicit mb()/smp_mb() in the
> > arch_spin_unlock - will go check this again.
>
> It uses UNLOCK_LOCK_PREFIX, which, if you look carefully, is normally
> always "". Only some 'broken' i386 chips require a LOCK there.
thanks for the details - will go dig through this again - still
missing some bits.
the first patch proposal is taken care of I guess :)
thx!
hofrat