* allow preemption in check_task_state
@ 2014-02-10 15:38 Nicholas Mc Guire
2014-02-10 16:11 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 15:38 UTC (permalink / raw)
To: linux-rt-users
Cc: LKML, Sebastian Andrzej Siewior, Steven Rostedt, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
A lockfree approach to check_task_state
This treats the state as an indicator variable and uses it to probe
saved_state lock-free. There is actually no consistency demand on
state/saved_state themselves, but rather on the transitions of the two
variables, and those transitions, based on path inspection, are not
independent.
It's probably not faster than the lock/unlock case if uncontended - at
least it does not show up in benchmark results - but it would never be
hit by a full pi-boost cycle as there is no contention.
This was also tested against the test-case from Sebastian, as well as by
running a few scripted gdb breakpoint debugging/single-stepping loops
to trigger this.
Tested-by: Andreas Platschek <platschek@ict.tuwien.ac.at>
Tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
---
kernel/sched/core.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bf93f63..5690ba3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1074,11 +1074,17 @@ static int migration_cpu_stop(void *data);
static bool check_task_state(struct task_struct *p, long match_state)
{
bool match = false;
+ long state, saved_state;
+
+ /* catch restored state */
+ do {
+ state = p->state;
+ saved_state = p->saved_state;
+ rmb(); /* make sure we actually catch updates */
+ } while (state != p->state);
- raw_spin_lock_irq(&p->pi_lock);
if (p->state == match_state || p->saved_state == match_state)
match = true;
- raw_spin_unlock_irq(&p->pi_lock);
return match;
}
--
1.7.2.5
* Re: allow preemption in check_task_state
2014-02-10 15:38 allow preemption in check_task_state Nicholas Mc Guire
@ 2014-02-10 16:11 ` Steven Rostedt
2014-02-10 17:17 ` Nicholas Mc Guire
0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2014-02-10 16:11 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
Subject is missing patch number.
On Mon, 10 Feb 2014 16:38:56 +0100
Nicholas Mc Guire <der.herr@hofr.at> wrote:
>
> A lockfree approach to check_task_state
>
> This treats the state as an indicator variable and uses it to probe
> saved_state lock-free. There is actually no consistency demand on
> state/saved_state themselves, but rather on the transitions of the two
> variables, and those transitions, based on path inspection, are not
> independent.
>
> It's probably not faster than the lock/unlock case if uncontended - at
> least it does not show up in benchmark results - but it would never be
> hit by a full pi-boost cycle as there is no contention.
>
> This was also tested against the test-case from Sebastian, as well as by
> running a few scripted gdb breakpoint debugging/single-stepping loops
> to trigger this.
To trigger what?
>
> Tested-by: Andreas Platschek <platschek@ict.tuwien.ac.at>
> Tested-by: Carsten Emde <C.Emde@osadl.org>
> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
> ---
> kernel/sched/core.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bf93f63..5690ba3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1074,11 +1074,17 @@ static int migration_cpu_stop(void *data);
> static bool check_task_state(struct task_struct *p, long match_state)
> {
> bool match = false;
> + long state, saved_state;
> +
> + /* catch restored state */
> + do {
> + state = p->state;
> + saved_state = p->saved_state;
> + rmb(); /* make sure we actually catch updates */
The problem I have with this is that there's no matching wmb(). Also,
shouldn't that be an smp_rmb()? I don't think we can race with devices
here.
> + } while (state != p->state);
>
> - raw_spin_lock_irq(&p->pi_lock);
> if (p->state == match_state || p->saved_state == match_state)
> match = true;
> - raw_spin_unlock_irq(&p->pi_lock);
>
> return match;
> }
In rtmutex.c we have:
pi_lock(&self->pi_lock);
__set_current_state(self->saved_state);
self->saved_state = TASK_RUNNING;
pi_unlock(&self->pi_lock);
As there is no wmb() here, it is very possible that another CPU
will see saved_state as TASK_RUNNING, and current state as
TASK_RUNNING, and miss the update completely.
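Roughly the window I'm worried about - a sketch of one possible
visibility order on the reading CPU, not the exact code:

	CPU0 (the task, restoring its state)    CPU1 (check_task_state loop)
	------------------------------------    ----------------------------
	p->state = p->saved_state;   /* e.g. back to TASK_TRACED */
	p->saved_state = TASK_RUNNING;
	                                        /* without a wmb() on CPU0, the second
	                                           store may become visible first: */
	                                        state = p->state;             /* still the old value */
	                                        saved_state = p->saved_state; /* already TASK_RUNNING */
	                                        rmb();
	                                        /* the loop can exit with match_state
	                                           visible in neither field */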
I would not want to add a wmb() unless there is a real bug with the
check state, as the above is in a very fast path and the check state is
in a slower path.
-- Steve
* Re: allow preemption in check_task_state
2014-02-10 16:11 ` Steven Rostedt
@ 2014-02-10 17:17 ` Nicholas Mc Guire
2014-02-10 17:38 ` Peter Zijlstra
2014-02-10 17:52 ` Steven Rostedt
0 siblings, 2 replies; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 17:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Steven Rostedt wrote:
> Subject is missing patch number.
>
>
> On Mon, 10 Feb 2014 16:38:56 +0100
> Nicholas Mc Guire <der.herr@hofr.at> wrote:
>
> >
> > A lockfree approach to check_task_state
> >
> > This treats the state as an indicator variable and uses it to probe
> > saved_state lock-free. There is actually no consistency demand on
> > state/saved_state themselves, but rather on the transitions of the two
> > variables, and those transitions, based on path inspection, are not
> > independent.
> >
> > It's probably not faster than the lock/unlock case if uncontended - at
> > least it does not show up in benchmark results - but it would never be
> > hit by a full pi-boost cycle as there is no contention.
> >
> > This was also tested against the test-case from Sebastian, as well as by
> > running a few scripted gdb breakpoint debugging/single-stepping loops
> > to trigger this.
>
> To trigger what?
Sorry, I should have included that in the patch header.
the testcase that Sebastian Andrzej Siewior had - available at:
http://breakpoint.cc/ptrace-test.c
the test-case triggers the case where the state update is missed.
>
> >
> > Tested-by: Andreas Platschek <platschek@ict.tuwien.ac.at>
> > Tested-by: Carsten Emde <C.Emde@osadl.org>
> > Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
> > ---
> > kernel/sched/core.c | 10 ++++++++--
> > 1 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bf93f63..5690ba3 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1074,11 +1074,17 @@ static int migration_cpu_stop(void *data);
> > static bool check_task_state(struct task_struct *p, long match_state)
> > {
> > bool match = false;
> > + long state, saved_state;
> > +
> > + /* catch restored state */
> > + do {
> > + state = p->state;
> > + saved_state = p->saved_state;
> > + rmb(); /* make sure we actually catch updates */
>
> The problem I have with this is that there's no matching wmb(). Also,
> shouldn't that be an smp_rmb()? I don't think we can race with devices
> here.
Sebastian also mentioned that - I simply was not sure about this - still
not deep enough into this, I guess.
>
> > + } while (state != p->state);
> >
> > - raw_spin_lock_irq(&p->pi_lock);
> > if (p->state == match_state || p->saved_state == match_state)
> > match = true;
> > - raw_spin_unlock_irq(&p->pi_lock);
> >
> > return match;
> > }
>
>
> In rtmutex.c we have:
>
> pi_lock(&self->pi_lock);
> __set_current_state(self->saved_state);
> self->saved_state = TASK_RUNNING;
> pi_unlock(&self->pi_lock);
>
> As there is no wmb() here, it is very possible that another CPU
> will see saved_state as TASK_RUNNING, and current state as
> TASK_RUNNING, and miss the update completely.
>
> I would not want to add a wmb() unless there is a real bug with the
> check state, as the above is in a very fast path and the check state is
> in a slower path.
>
maybe I'm missing/misunderstanding something here, but
pi_unlock -> arch_spin_unlock is a full mb(),
so once any task did an update of the state the loop should be catching
this update? if the loop exits before the update takes effect (pi_unlock),
would that be incorrect?
thx!
hofrat
* Re: allow preemption in check_task_state
2014-02-10 17:17 ` Nicholas Mc Guire
@ 2014-02-10 17:38 ` Peter Zijlstra
2014-02-10 18:12 ` Nicholas Mc Guire
2014-02-10 17:52 ` Steven Rostedt
1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-10 17:38 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> maybe I'm missing/misunderstanding something here, but
> pi_unlock -> arch_spin_unlock is a full mb()
Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
The lock and unlock primitives are in general specified to have ACQUIRE
resp. RELEASE semantics.
See Documentation/memory-barriers.txt for far too many head-hurting
details.
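For reference, the x86 unlock fast path is roughly the below (from
memory and ignoring the paravirt ticketlock bits, so treat it as a
sketch rather than the exact source):

	static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
	{
		/* plain add on the ticket head; UNLOCK_LOCK_PREFIX is normally
		   empty, so no LOCK prefix and no full barrier here */
		__add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
	}

The plain store/add still gives you RELEASE ordering on x86 because of
TSO, but that is not the same thing as a full mb().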
* Re: allow preemption in check_task_state
2014-02-10 17:17 ` Nicholas Mc Guire
2014-02-10 17:38 ` Peter Zijlstra
@ 2014-02-10 17:52 ` Steven Rostedt
2014-02-10 18:13 ` Nicholas Mc Guire
1 sibling, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2014-02-10 17:52 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014 18:17:12 +0100
Nicholas Mc Guire <der.herr@hofr.at> wrote:
> >
> > In rtmutex.c we have:
> >
> > pi_lock(&self->pi_lock);
> > __set_current_state(self->saved_state);
> > self->saved_state = TASK_RUNNING;
> > pi_unlock(&self->pi_lock);
> >
> > As there is no wmb() here, it is very possible that another CPU
> > will see saved_state as TASK_RUNNING, and current state as
> > TASK_RUNNING, and miss the update completely.
> >
> > I would not want to add a wmb() unless there is a real bug with the
> > check state, as the above is in a very fast path and the check state is
> > in a slower path.
> >
> maybe I'm missing/misunderstanding something here, but
> pi_unlock -> arch_spin_unlock is a full mb(),
> so once any task did an update of the state the loop should be catching
> this update? if the loop exits before the update takes effect (pi_unlock),
> would that be incorrect?
Even if the spin locks were full memory barriers, it is still buggy.
The fact that we set current_state to saved_state, and then saved_state
to TASK_RUNNING without any memory barriers in between those two
statements, means that the reader (even with a rmb()) can still see
both as TASK_RUNNING.
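If one really wanted the reader's loop to work without taking pi_lock,
the write side would need something roughly like the below (just a
sketch of where a barrier would have to sit, not something I'm
proposing):

	pi_lock(&self->pi_lock);
	__set_current_state(self->saved_state);
	smp_wmb();	/* order the ->state restore before clearing ->saved_state */
	self->saved_state = TASK_RUNNING;
	pi_unlock(&self->pi_lock);

and that puts a barrier straight into the rtmutex fast path.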
-- Steve
* Re: allow preemption in check_task_state
2014-02-10 17:38 ` Peter Zijlstra
@ 2014-02-10 18:12 ` Nicholas Mc Guire
2014-02-10 18:16 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 18:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> > maybe I'm missing/misunderstanding something here, but
> > pi_unlock -> arch_spin_unlock is a full mb()
>
> Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
>
> The lock and unlock primitives are in general specified to have ACQUIRE
> resp. RELEASE semantics.
>
> See Documentation/memory-barriers.txt for far too many head-hurting
> details.
I did check that - but from the code it seems to me to be using a
lock prefix in the fast __add() path and an explicit smp_add() in the slow
path (arch/x86/include/asm/spinlock.h:arch_spin_unlock); the __add from
arch/x86/include/asm/cmpxchg.h does lock - or am I misinterpreting this?
the other archs I believe were all doing explicit mb()/smp_mb() in the
arch_spin_unlock - will go check this again.
thx!
hofrat
* Re: allow preemption in check_task_state
2014-02-10 17:52 ` Steven Rostedt
@ 2014-02-10 18:13 ` Nicholas Mc Guire
0 siblings, 0 replies; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 18:13 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-rt-users, LKML, Sebastian Andrzej Siewior, Peter Zijlstra,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Steven Rostedt wrote:
> On Mon, 10 Feb 2014 18:17:12 +0100
> Nicholas Mc Guire <der.herr@hofr.at> wrote:
>
>
> > >
> > > In rtmutex.c we have:
> > >
> > > pi_lock(&self->pi_lock);
> > > __set_current_state(self->saved_state);
> > > self->saved_state = TASK_RUNNING;
> > > pi_unlock(&self->pi_lock);
> > >
> > > As there is no wmb() here, it is very possible that another CPU
> > > will see saved_state as TASK_RUNNING, and current state as
> > > TASK_RUNNING, and miss the update completely.
> > >
> > > I would not want to add a wmb() unless there is a real bug with the
> > > check state, as the above is in a very fast path and the check state is
> > > in a slower path.
> > >
> > maybe I'm missing/misunderstanding something here, but
> > pi_unlock -> arch_spin_unlock is a full mb(),
> > so once any task did an update of the state the loop should be catching
> > this update? if the loop exits before the update takes effect (pi_unlock),
> > would that be incorrect?
>
> Even if the spin locks were full memory barriers, it is still buggy.
> The fact that we set current_state to saved_state, and then saved_state
> to TASK_RUNNING without any memory barriers in between those two
> statements, means that the reader (even with a rmb()) can still see
> both as TASK_RUNNING.
>
ok - thanks - I think now I got it.
thx!
hofrat
* Re: allow preemption in check_task_state
2014-02-10 18:12 ` Nicholas Mc Guire
@ 2014-02-10 18:16 ` Peter Zijlstra
2014-02-10 18:45 ` Nicholas Mc Guire
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-10 18:16 UTC (permalink / raw)
To: Nicholas Mc Guire
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, Feb 10, 2014 at 07:12:03PM +0100, Nicholas Mc Guire wrote:
> On Mon, 10 Feb 2014, Peter Zijlstra wrote:
>
> > On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> > > maybe I'm missing/misunderstanding something here, but
> > > pi_unlock -> arch_spin_unlock is a full mb()
> >
> > Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
> >
> > The lock and unlock primitives are in general specified to have ACQUIRE
> > resp. RELEASE semantics.
> >
> > See Documentation/memory-barriers.txt for far too many head-hurting
> > details.
>
> I did check that - but from the code it seems to me to be using a
> lock prefix in the fast __add() path and an explicit smp_add() in the slow
> path (arch/x86/include/asm/spinlock.h:arch_spin_unlock); the __add from
> arch/x86/include/asm/cmpxchg.h does lock - or am I misinterpreting this?
> the other archs I believe were all doing explicit mb()/smp_mb() in the
> arch_spin_unlock - will go check this again.
It uses UNLOCK_LOCK_PREFIX, which, if you look carefully, is normally
always "". Only some 'broken' i386 chips require a LOCK there.
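Roughly (again from memory, so double-check against the tree):

	#if defined(CONFIG_X86_32) && \
		(defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE))
	/* PPro errata: the unlock store needs a locked op to be ordered */
	# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
	#else
	# define UNLOCK_LOCK_PREFIX
	#endif

So on anything sane the unlock really is just a plain add.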
* Re: allow preemption in check_task_state
2014-02-10 18:16 ` Peter Zijlstra
@ 2014-02-10 18:45 ` Nicholas Mc Guire
0 siblings, 0 replies; 9+ messages in thread
From: Nicholas Mc Guire @ 2014-02-10 18:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, linux-rt-users, LKML, Sebastian Andrzej Siewior,
Carsten Emde, Thomas Gleixner, Andreas Platschek
On Mon, 10 Feb 2014, Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 07:12:03PM +0100, Nicholas Mc Guire wrote:
> > On Mon, 10 Feb 2014, Peter Zijlstra wrote:
> >
> > > On Mon, Feb 10, 2014 at 06:17:12PM +0100, Nicholas Mc Guire wrote:
> > > > maybe I'm missing/misunderstanding something here, but
> > > > pi_unlock -> arch_spin_unlock is a full mb()
> > >
> > > Nope, arch_spin_unlock() on x86 is a single add[wb] without LOCK prefix.
> > >
> > > The lock and unlock primitives are in general specified to have ACQUIRE
> > > resp. RELEASE semantics.
> > >
> > > See Documentation/memory-barriers.txt for far too many head-hurting
> > > details.
> >
> > I did check that - but from the code it seems to me to be using a
> > lock prefix in the fast __add() path and an explicit smp_add() in the slow
> > path (arch/x86/include/asm/spinlock.h:arch_spin_unlock); the __add from
> > arch/x86/include/asm/cmpxchg.h does lock - or am I misinterpreting this?
> > the other archs I believe were all doing explicit mb()/smp_mb() in the
> > arch_spin_unlock - will go check this again.
>
> It uses UNLOCK_LOCK_PREFIX, which, if you look carefully, is normally
> always "". Only some 'broken' i386 chips require a LOCK there.
thanks for the details - will go dig through this again - still
missing some bits.
the first patch proposal is taken care of I guess :)
thx!
hofrat