linux-arch.vger.kernel.org archive mirror
* Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree
       [not found]     ` <20080223182258.GA19946@tv-sign.ru>
@ 2008-02-23 18:57       ` Linus Torvalds
  2008-02-23 19:36         ` Oleg Nesterov
  2008-02-23 19:51         ` Dmitry Adamushko
  0 siblings, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2008-02-23 18:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Linux Kernel Mailing List, dmitry.adamushko,
	a.p.zijlstra, apw, Ingo Molnar, nickpiggin, paulmck, rusty,
	Steven Rostedt, linux-arch



On Sat, 23 Feb 2008, Oleg Nesterov wrote:
> 
> Yes, but still I suspect wmb() is not enough. Note that try_to_wake_up()
> first checks (reads) the task->state,
> 
> 	if (!(old_state & state))
> 		goto out;
> 
> without the full mb() it is (in theory) possible that try_to_wake_up()
> first reads TASK_RUNNING and only then sets CONDITION. IOW, STORE and
> LOAD could be re-ordered.

No. The spinlock can have preceding stores (and loads, for that matter) 
percolate *into* the locked region, but a spinlock can *not* have loads 
(and stores) escape *out* of the region without being totally broken. 

So the spinlock that precedes that load of "old_state" absolutely *must* 
be a memory barrier as far as that load is concerned.

Spinlocks are just "one-way" permeable. They stop both reads and writes, 
but they stop them from escaping up to earlier code (and the spin-unlock 
in turn stops reads and writes within the critical section from escaping 
down past the unlock).

But they definitely must stop that direction, and no loads or stores that 
are inside the critical section can possibly be allowed to be done outside 
of it, or the spinlock would be pointless.

[ Yeah, yeah, I guess that in theory you could serialize spinlocks only 
  against each other, and their serialization would be in a totally 
  different "domain" from normal reads and writes, and then the spinlock 
  wouldn't act as a barrier at all except for stuff that is literally 
  inside another spinlock, but I don't think anybody really does that ]
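
For illustration, the permeability rule above boils down to roughly the
following (STORE/LOAD are just labels for ordinary memory accesses, not
real macros; only spin_lock/spin_unlock are actual primitives):

	STORE(a);		/* may sink *into* the critical section */
	spin_lock(&lock);	/* acquire: accesses below cannot move above it */
	LOAD(b);		/* must stay inside the critical section */
	STORE(c);		/* must stay inside the critical section */
	spin_unlock(&lock);	/* release: accesses above cannot move below it */
	LOAD(d);		/* may float *into* the critical section */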

So I think a simple smp_wmb() should be ok.

That said, I do not *mind* the notion of "smp_mb_before/after_spinlock()", 
and just have architectures do whatever they want (with x86 just being a 
no-op). I don't think it's wrong in any way, and may be the really 
generic solution for some theoretical case where locking is done not by 
using the normal cache coherency, but over a separate lock network. But I 
would suggest against adding new abstractions unless there are real cases 
of it mattering.

(The biggest reason to do it in practice is probably architectures that 
have a really slow smp_wmb() and that also have a spinlock that is already 
serializing enough, but if this is only for one single use - in 
try_to_wake_up() - even that doesn't really seem to be a very strong 
argument).
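
If such a primitive were added, a minimal sketch could look like this
(hypothetical only -- neither the name nor the fallback existed at the
time, and x86 would simply define it away; CONDITION is an illustrative
flag, not a real variable):

	#ifndef smp_mb_before_spinlock
	/* an arch whose spin_lock() already orders enough makes this a no-op */
	#define smp_mb_before_spinlock()	smp_wmb()
	#endif

	/* caller side: */
	CONDITION = 1;
	smp_mb_before_spinlock();
	spin_lock(&rq->lock);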

> A bit offtopic, but let's take another example, __queue_work()->insert_work().
> With some trivial modification we can move set_wq_data() outside of cwq->lock.
> But according to linux's memory model we still need wmb(), which is a bit
> annoying. Perhaps we can also add smp_wmb_before_spinlock(). Not sure this
> is not too ugly though.

Is that really so performance-critical? We still need the spinlock for the 
actual list manipulation, so why would we care?

And the smp_wmb() really should be free. It's literally free on x86, but 
doing a write barrier is generally a cheap operation on any sane 
architecture (ie generally cheaper than a read barrier or a full barrier: 
it should mostly be a matter of inserting a magic entry in the write 
buffers).

		Linus


* Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree
  2008-02-23 18:57       ` + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree Linus Torvalds
@ 2008-02-23 19:36         ` Oleg Nesterov
  2008-02-23 19:51         ` Dmitry Adamushko
  1 sibling, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2008-02-23 19:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Linux Kernel Mailing List, dmitry.adamushko,
	a.p.zijlstra, apw, Ingo Molnar, nickpiggin, paulmck, rusty,
	Steven Rostedt, linux-arch

On 02/23, Linus Torvalds wrote:
>
> On Sat, 23 Feb 2008, Oleg Nesterov wrote:
> > 
> > Yes, but still I suspect wmb() is not enough. Note that try_to_wake_up()
> > first checks (reads) the task->state,
> > 
> > 	if (!(old_state & state))
> > 		goto out;
> > 
> > without the full mb() it is (in theory) possible that try_to_wake_up()
> > first reads TASK_RUNNING and only then sets CONDITION. IOW, STORE and
> > LOAD could be re-ordered.
> 
> No. The spinlock can have preceding stores (and loads, for that matter) 
> percolate *into* the locked region, but a spinlock can *not* have loads 
> (and stores) escape *out* of the region without being totally broken. 
> 
> So the spinlock that precedes that load of "old_state" absolutely *must* 
> be a memory barrier as far as that load is concerned.
> 
> Spinlocks are just "one-way" permeable. They stop both reads and writes, 
> but they stop them from escaping up to earlier code (and the spin-unlock 
> in turn stops reads and writes within the critical section from escaping 
> down past the unlock).
> 
> But they definitely must stop that direction, and no loads or stores that 
> are inside the critical section can possibly be allowed to be done outside 
> of it, or the spinlock would be pointless.

Yes sure. But I meant that the STORE (set CONDITION) can leak into the critical
section, and it could be re-ordered with the LOAD (check ->state) _inside_
the critical section.

However,

> So I think a simple smp_wmb() should be ok.

now I think you are probably right. Because (please ack/nack my understanding)
this smp_wmb() ensures that the preceding STORE can't actually go into
the locked region.
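
Something like this, if I understand the argument (CONDITION stands in
for the store done by the waker before the call):

	CONDITION = 1;			/* the preceding STORE (illustrative flag) */
	smp_wmb();			/* orders it before the lock acquisition... */
	rq = task_rq_lock(p, &flags);	/* ...so it cannot sink into the locked region */
	old_state = p->state;		/* the LOAD of ->state inside the region */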

> That said, I do not *mind* the notion of "smp_mb_before/after_spinlock()", 
> and just have architectures do whatever they want (with x86 just being a 
> no-op). I don't think it's wrong in any way, and may be the really 
> generic solution for some theoretical case where locking is done not by 
> using the normal cache coherency, but over a separate lock network. But I 
> would suggest against adding new abstractions unless there are real cases 
> of it mattering.
> 
> (The biggest reason to do it in practice is probably architectures that 
> have a really slow smp_wmb() and that also have a spinlock that is already 
> serializing enough, but if this is only for one single use - in 
> try_to_wake_up() - even that doesn't really seem to be a very strong 
> argument).

Great.

> > A bit offtopic, but let's take another example, __queue_work()->insert_work().
> > With some trivial modification we can move set_wq_data() outside of cwq->lock.
> > But according to linux's memory model we still need wmb(), which is a bit
> > annoying. Perhaps we can also add smp_wmb_before_spinlock(). Not sure this
> > is not too ugly though.
> 
> Is that really so performance-critical? We still need the spinlock for the 
> actual list manipulation, so why would we care?
> 
> And the smp_wmb() really should be free. It's literally free on x86, but 
> doing a write barrier is generally a cheap operation on any sane 
> architecture (ie generally cheaper than a read barrier or a full barrier: 
> it should mostly be a matter of inserting a magic entry in the write 
> buffers).

Yes, yes, this was just an "on a related note", thanks!
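
For the record, roughly what that rearrangement would look like (a sketch
based on the __queue_work()/insert_work() code of this era; details are
illustrative, not a patch):

	set_wq_data(work, cwq);			/* moved out from under cwq->lock */
	smp_wmb();				/* or a hypothetical smp_wmb_before_spinlock() */
	spin_lock_irqsave(&cwq->lock, flags);
	list_add_tail(&work->entry, &cwq->worklist);
	spin_unlock_irqrestore(&cwq->lock, flags);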

Oleg.



* Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree
  2008-02-23 18:57       ` + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree Linus Torvalds
  2008-02-23 19:36         ` Oleg Nesterov
@ 2008-02-23 19:51         ` Dmitry Adamushko
  2008-02-23 20:08           ` Linus Torvalds
  2008-02-25 13:55           ` David Howells
  1 sibling, 2 replies; 7+ messages in thread
From: Dmitry Adamushko @ 2008-02-23 19:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Andrew Morton, Linux Kernel Mailing List,
	a.p.zijlstra, apw, Ingo Molnar, nickpiggin, paulmck, rusty,
	Steven Rostedt, linux-arch

On 23/02/2008, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
>  On Sat, 23 Feb 2008, Oleg Nesterov wrote:
>  >
>
> > Yes, but still I suspect wmb() is not enough. Note that try_to_wake_up()
>  > first checks (reads) the task->state,
>  >
>  >       if (!(old_state & state))
>  >               goto out;
>  >
>  > without the full mb() it is (in theory) possible that try_to_wake_up()
>  > first reads TASK_RUNNING and only then sets CONDITION. IOW, STORE and
>  > LOAD could be re-ordered.
>
>
> No. The spinlock can have preceding stores (and loads, for that matter)
>  percolate *into* the locked region, but a spinlock can *not* have loads
>  (and stores) escape *out* of the region without being totally broken.

it's not a LOAD that escapes *out* of the region. It's a MODIFY that gets *in*:

(1)

MODIFY(a);

LOCK

LOAD(b);

UNLOCK


can become:

(2)

LOCK

MODIFY(a)
LOAD(b);

UNLOCK

and (reordered)

(3)

LOCK

LOAD(a)
MODIFY(b)

UNLOCK

and this last one is a problem. No?


>
>                 Linus
>

-- 
Best regards,
Dmitry Adamushko


* Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree
  2008-02-23 19:51         ` Dmitry Adamushko
@ 2008-02-23 20:08           ` Linus Torvalds
  2008-02-23 21:03             ` [PATCH, 3rd resend] documentation: atomic_add_unless() doesn't imply mb() on failure Oleg Nesterov
  2008-02-23 21:07             ` + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree Dmitry Adamushko
  2008-02-25 13:55           ` David Howells
  1 sibling, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2008-02-23 20:08 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Oleg Nesterov, Andrew Morton, Linux Kernel Mailing List,
	a.p.zijlstra, apw, Ingo Molnar, nickpiggin, paulmck, rusty,
	Steven Rostedt, linux-arch



On Sat, 23 Feb 2008, Dmitry Adamushko wrote:
> 
> it's not a LOAD that escapes *out* of the region. It's a MODIFY that gets *in*:

Not with the smp_wmb(). That's the whole point.

Ie the patch that I'm suggesting is sufficient is appended below, and the
point is that any write before the critical region will be ordered wrt the
critical region because of the write barrier before the spinlock (which
itself is a write).

This is also why I mentioned that if you have a really odd architecture 
that considers spinlocks to be "outside" the normal cache coherency 
domain, that would be broken, but I cannot think of a single valid case of 
that, and I consider it insane.

(There are supercomputers that have a separate "barrier" network that can 
be used to do global ordering, but they don't generally do cache coherency 
and dependable memory ordering at all, which is why they need the separate 
barrier network in the first place. So that odd case isn't relevant to 
this discussion, even if it's historically interesting ;^)

But I'd love to hear a good argument why this wouldn't work on some 
architecture that we actually support (or might want to support in the 
future). Memory ordering is interesting (even if the only people who do 
it right are Intel and AMD).

			Linus

---
 kernel/sched.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index f28f19e..3bceb3b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1831,6 +1831,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 	long old_state;
 	struct rq *rq;
 
+	smp_wmb();
 	rq = task_rq_lock(p, &flags);
 	old_state = p->state;
 	if (!(old_state & state))


* [PATCH, 3rd resend] documentation: atomic_add_unless() doesn't imply mb() on failure
  2008-02-23 20:08           ` Linus Torvalds
@ 2008-02-23 21:03             ` Oleg Nesterov
  2008-02-23 21:07             ` + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree Dmitry Adamushko
  1 sibling, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2008-02-23 21:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Adamushko, Andrew Morton, Linux Kernel Mailing List,
	a.p.zijlstra, apw, Ingo Molnar, nickpiggin, paulmck, rusty,
	Steven Rostedt, linux-arch

(sorry for being offtopic, but while experts are here...)

A "typical" implementation of atomic_add_unless() can return 0 immediately
after the first atomic_read() (before doing cmpxchg). In that case it doesn't
provide any barrier semantics. See include/asm-ia64/atomic.h as an example.

We should either change the implementation, or fix the docs.
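
For reference, the pattern in question looks roughly like this (modelled
on the generic cmpxchg-based version of that era, not the exact ia64
code):

	static inline int atomic_add_unless(atomic_t *v, int a, int u)
	{
		int c, old;

		c = atomic_read(v);	/* plain read, no barrier implied */
		for (;;) {
			if (unlikely(c == u))
				break;		/* failure path: no mb() ever happens */
			old = atomic_cmpxchg(v, c, c + a);
			if (likely(old == c))
				break;		/* success: cmpxchg provided the barriers */
			c = old;
		}
		return c != u;			/* returns 0 on failure */
	}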

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>

 Documentation/atomic_ops.txt      |    3 ++-
 Documentation/memory-barriers.txt |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

--- t/Documentation/atomic_ops.txt~doc_aau	2008-01-14 23:43:11.000000000 +0300
+++ t/Documentation/atomic_ops.txt	2008-02-23 23:53:12.000000000 +0300
@@ -186,7 +186,8 @@ If the atomic value v is not equal to u,
 returns non zero. If v is equal to u then it returns zero. This is done as
 an atomic operation.
 
-atomic_add_unless requires explicit memory barriers around the operation.
+atomic_add_unless requires explicit memory barriers around the operation
+unless it fails (returns 0).
 
 atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
 
--- t/Documentation/memory-barriers.txt~doc_aau	2008-01-14 23:43:11.000000000 +0300
+++ t/Documentation/memory-barriers.txt	2008-02-23 23:53:12.000000000 +0300
@@ -1493,7 +1493,7 @@ explicit lock operations, described late
 	atomic_dec_and_test();
 	atomic_sub_and_test();
 	atomic_add_negative();
-	atomic_add_unless();
+	atomic_add_unless();	/* when succeeds (returns 1) */
 	test_and_set_bit();
 	test_and_clear_bit();
 	test_and_change_bit();



* Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree
  2008-02-23 20:08           ` Linus Torvalds
  2008-02-23 21:03             ` [PATCH, 3rd resend] documentation: atomic_add_unless() doesn't imply mb() on failure Oleg Nesterov
@ 2008-02-23 21:07             ` Dmitry Adamushko
  1 sibling, 0 replies; 7+ messages in thread
From: Dmitry Adamushko @ 2008-02-23 21:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Andrew Morton, Linux Kernel Mailing List,
	a.p.zijlstra, apw, Ingo Molnar, nickpiggin, paulmck, rusty,
	Steven Rostedt, linux-arch

On 23/02/2008, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>  On Sat, 23 Feb 2008, Dmitry Adamushko wrote:
>  >
>  > it's not a LOAD that escapes *out* of the region. It's a MODIFY that gets *in*:
>
>
> Not with the smp_wmb(). That's the whole point.
>
>  Ie the patch I'm suggesting is sufficient is appended, and the point is
>  that any write before the critical region will be ordered wrt the critical
>  region because of the write barrier before the spinlock (which itself is a
>  write).

Yeah, good point!

(heh... I wouldn't dare to say this 'obvious thing' only to Anton
Blanchard, who is "the only person who always 'has a point' by
definition" :-))

> This is also why I mentioned that if you have a really odd architecture
> that considers spinlocks to be "outside" the normal cache coherency
> domain, that would be broken, but I cannot think of a single valid case of
> that, and I consider it insane.

Yeah, some potential implementations come to mind but, I guess, they
are as far away from real hardware as science fiction is from
science :-/

So how should we proceed with this issue?

let's use your patch and declare try_to_wake_up() a 'full' mb for the case:

MODIFY
try_to_wake_up
LOAD or MODIFY (that take place either after or inside try_to_wake_up())

so we'll fix (lots of) potentially problematic cases with a single shot.

and

LOAD
try_to_wake_up()
LOAD or MODIFY

is probably not that common so we don't care.
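
In other words, the case we would be guaranteeing looks like this (names
are generic, not from any particular call site):

	/*
	 *	waker				sleeper
	 *	-----				-------
	 *	CONDITION = 1;			set_current_state(TASK_INTERRUPTIBLE);
	 *	try_to_wake_up(p, ...);		if (!CONDITION)
	 *						schedule();
	 *
	 * The claim is that the smp_wmb() at the top of try_to_wake_up()
	 * keeps the waker's CONDITION store ordered before its read of
	 * p->state, so either the sleeper sees CONDITION set, or the waker
	 * sees a sleeping ->state and wakes it up.
	 */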


-- 
Best regards,
Dmitry Adamushko


* Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree
  2008-02-23 19:51         ` Dmitry Adamushko
  2008-02-23 20:08           ` Linus Torvalds
@ 2008-02-25 13:55           ` David Howells
  1 sibling, 0 replies; 7+ messages in thread
From: David Howells @ 2008-02-25 13:55 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: dhowells, Linus Torvalds, Oleg Nesterov, Andrew Morton,
	Linux Kernel Mailing List, a.p.zijlstra, apw, Ingo Molnar,
	nickpiggin, paulmck, rusty, Steven Rostedt, linux-arch

Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> (3)
> 
> LOCK
> 
> LOAD(a)
> MODIFY(b)
> 
> UNLOCK
> 
> and this last one is a problem. No?

I assume you meant:

	LOCK
	LOAD(b)
	MODIFY(a)
	UNLOCK

David

