frlock and barrier discussion

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* frlock and barrier discussion
  2003-01-29  7:06 ` Richard Henderson
@ 2003-01-30  1:15   ` Stephen Hemminger
  2003-01-30  1:29     ` Andrea Arcangeli
  2003-01-30  1:41     ` Richard Henderson
  0 siblings, 2 replies; 11+ messages in thread
From: Stephen Hemminger @ 2003-01-30  1:15 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Andrea Arcangeli, Andrew Morton, Andi Kleen,
	Linux Kernel Mailing List

On Tue, 2003-01-28 at 23:06, Richard Henderson wrote:
> On Tue, Jan 28, 2003 at 03:42:21PM -0800, Stephen Hemminger wrote:
> > +static inline void fr_write_begin(frlock_t *rw)
> > +{
> > +	preempt_disable();
> > +	rw->pre_sequence++;
> > +	wmb();
> > +}
> > +
> > +static inline void fr_write_end(frlock_t *rw)
> > +{
> > +	wmb();
> > +	rw->post_sequence++;
> 
> These need to be mb(), not wmb(), if you want the bits in between
> to actually happen in between, as with your xtime example.  At
> present there's nothing stoping xtime from being *read* before
> your read from pre_sequence happens.


First, write_begin/end can only be safely used when there is separate
writer synchronization such as a spin_lock or semaphore.  
As far as I know, semaphore or spin_lock guarantees a barrier.
So xtime or anything else can not be read before the spin_lock.

Using mb() is more paranoid than necessary. 





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-30  1:15   ` frlock and barrier discussion Stephen Hemminger
@ 2003-01-30  1:29     ` Andrea Arcangeli
  2003-01-30  1:41     ` Richard Henderson
  1 sibling, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2003-01-30  1:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Richard Henderson, Andrew Morton, Andi Kleen,
	Linux Kernel Mailing List

On Wed, Jan 29, 2003 at 05:15:55PM -0800, Stephen Hemminger wrote:
> On Tue, 2003-01-28 at 23:06, Richard Henderson wrote:
> > On Tue, Jan 28, 2003 at 03:42:21PM -0800, Stephen Hemminger wrote:
> > > +static inline void fr_write_begin(frlock_t *rw)
> > > +{
> > > +	preempt_disable();
> > > +	rw->pre_sequence++;
> > > +	wmb();
> > > +}
> > > +
> > > +static inline void fr_write_end(frlock_t *rw)
> > > +{
> > > +	wmb();
> > > +	rw->post_sequence++;
> > 
> > These need to be mb(), not wmb(), if you want the bits in between
> > to actually happen in between, as with your xtime example.  At
> > present there's nothing stoping xtime from being *read* before
> > your read from pre_sequence happens.
> 
> 
> First, write_begin/end can only be safely used when there is separate
> writer synchronization such as a spin_lock or semaphore.  
> As far as I know, semaphore or spin_lock guarantees a barrier.
> So xtime or anything else can not be read before the spin_lock.
> 
> Using mb() is more paranoid than necessary. 

yes, it should only generate a superflous lock on x86.

it shouldn't even be necessary in fr_write_trylock.

Andrea

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-30  1:15   ` frlock and barrier discussion Stephen Hemminger
  2003-01-30  1:29     ` Andrea Arcangeli
@ 2003-01-30  1:41     ` Richard Henderson
  2003-01-30  1:52       ` Andrea Arcangeli
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2003-01-30  1:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Andrea Arcangeli, Andrew Morton, Andi Kleen,
	Linux Kernel Mailing List

On Wed, Jan 29, 2003 at 05:15:55PM -0800, Stephen Hemminger wrote:
> First, write_begin/end can only be safely used when there is separate
> writer synchronization such as a spin_lock or semaphore.  
> As far as I know, semaphore or spin_lock guarantees a barrier.
> So xtime or anything else can not be read before the spin_lock.
> 
> Using mb() is more paranoid than necessary. 

If you want stuff to happen *between* the write_begin/end, or
indeed for the begin/end not to be interleaved, then mb() is
absolutely necessary.  The most likely dynamic reordering of

	//begin
	t1 = rw->pre_sequence
	t1 += 1
	rw->pre_sequence = t1
	wmb()

	//stuff
	xtimensec = xtime.tv_nsec

	//end
	wmb()
	t2 = rw->post_sequence
	t2 += 1
	rw->post_sequence = t2

is

	t1 = rw->pre_sequence
	t2 = rw->post_sequence
	xtimensec = xtime.tv_nsec
	t1 += 1;
	t2 += 2;
	rw->pre_sequence = t1
	wmb()
	wmb()
	rw->post_sequence = t2

Why?  Because pre_sequence and post_sequence are in the same
cache line, and both reads could be satisfied in the same 
cycle by the same line fill from main memory.

If you don't care about stuff happening in between the
write_begin/end, then why are you using them at all?


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-30  1:41     ` Richard Henderson
@ 2003-01-30  1:52       ` Andrea Arcangeli
  2003-01-31  0:41         ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2003-01-30  1:52 UTC (permalink / raw)
  To: Stephen Hemminger, Andrew Morton, Andi Kleen,
	Linux Kernel Mailing List

On Wed, Jan 29, 2003 at 05:41:33PM -0800, Richard Henderson wrote:
> 	//begin
> 	t1 = rw->pre_sequence
> 	t1 += 1
> 	rw->pre_sequence = t1
> 	wmb()
> 
> 	//stuff
> 	xtimensec = xtime.tv_nsec
> 
> 	//end
> 	wmb()
> 	t2 = rw->post_sequence
> 	t2 += 1
> 	rw->post_sequence = t2
> 
> is
> 
> 	t1 = rw->pre_sequence
> 	t2 = rw->post_sequence
> 	xtimensec = xtime.tv_nsec
> 	t1 += 1;
> 	t2 += 2;
> 	rw->pre_sequence = t1
> 	wmb()
> 	wmb()
> 	rw->post_sequence = t2

No it's:


 	t1 = rw->pre_sequence
 	t2 = rw->post_sequence
 	t1 += 1;
 	t2 += 2;
 	rw->pre_sequence = t1
 	wmb()
 	xtimensec = xtime.tv_nsec
 	wmb()
 	rw->post_sequence = t2

you're missing xtimensec is a write.

or this if you prefer:

	spin_lock() / now xtime can't change under us

 	t1 = rw->pre_sequence
 	t2 = rw->post_sequence
	t3 = xtime.tv_nsec
 	t1 += 1;
 	t2 += 2;
 	rw->pre_sequence = t1
 	wmb()
 	xtimensec = t3
 	wmb()
 	rw->post_sequence = t2

	spin_unlock() / now xtime can change again


and the above is the optimal implementation of the write-side. We
definitely don't want to forbid those reoderings. if gcc or cpu thinks
it's worthwhile they must be allowed to optimize it since it's legal.


I believe wmb() is correct, and mb() is overkill.

Andrea

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re:  frlock and barrier discussion
@ 2003-01-30 18:20 Manfred Spraul
  2003-01-30 18:26 ` Andrea Arcangeli
  2003-01-30 22:32 ` Alan Cox
  0 siblings, 2 replies; 11+ messages in thread
From: Manfred Spraul @ 2003-01-30 18:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrea Arcangeli, Richard Henderson, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]

Stephen wrote:

[snip - memory barrier for fr_write_begin]

>Using mb() is more paranoid than necessary. 


What about the memory barrier in fr_read_begin?
If I understand the Intel documentation correctly, then i386 doesn't need them:
"Writes by a single processor are observed in the same order by all processors"

I think "smp_read_barrier_depends()" (i.e. a nop for i386) is sufficient. Attached is a test app - could someone try it? I don't have access to a SMP system right now.


What about permitting arch overrides for the memory barriers? E.g. ia64 has acquire and release memory barriers - it doesn't map to the Linux wmb()/rmb() scheme.

--
	Manfred 













[-- Attachment #2: frlock.cpp --]
[-- Type: text/plain, Size: 2579 bytes --]

/*
 * frlock: test for Intel memory ordering.
 * Copyright (C) 1999,2003 by Manfred Spraul.
 * 
 * Redistribution of this file is permitted under the terms of the GNU 
 * Public License (GPL)
 * $Header: /pub/home/manfred/cvs-tree/movopt/frlock.cpp,v 1.2 2003/01/26 10:41:39 manfred Exp $
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <assert.h>

static volatile int g_val1;
static volatile int g_val2;
static volatile int g_seq1;
static volatile int g_seq2;

static volatile int start;
#define MB()	__asm__ __volatile__ ("lock;addl $0,(%%esp)\n\t" \
					:/* no output*/ \
					:/* no input*/:"cc","memory")

#define DELAY()		do { int i; for(i=0;i<1000;i++); } while(0)

void* threadfnc(void* param)
{
	while(!start);
	if(1 == (int)param)
		goto cpu1;
	if(2 == (int)param)
		goto cpu2;
	assert(0);
cpu1:
	{	// reader:
		for(;;) {
			int x1,x2,val1,val2;

			x1 = g_seq1;
			val1 = g_val1;
			val2 = g_val2;
			x2 = g_seq2;
			if (x1 == x2) {
				if (val1 != val2) {
					printf("Bad! memory ordering violation with %d/%d: %d/%d.\n", x1, x2, val1, val2);
				}
			}
	    	}
	}
cpu2:
	{	// writer:
	    	int target = 0;
		for (;;) {

			// write 1:
			target++;
			g_seq1 = target;
			g_val1 = target;
			g_val2 = target;
			g_seq2 = target;
			DELAY();

			// write 2:
			target++;
			g_seq1 = target;
			g_val1 = target;
			MB();
			g_val2 = target;
			g_seq2 = target;
			DELAY();

			// write 3:
			target++;
			g_seq1 = target;
			g_val2 = target;
			g_val1 = target;
			g_seq2 = target;
			DELAY();

			// write 4:
			target++;
			g_seq1 = target;
			g_val2 = target;
			MB();
			g_val1 = target;
			g_seq2 = target;
			DELAY();
			


			// write 5:
			target++;
			g_seq1 = target;
			g_val1 = target;
			MB(); MB();
			g_val2 = target;
			g_seq2 = target;
			DELAY();

			// write 6:
			target++;
			g_seq1 = target;
			g_val1 = target;
			MB(); DELAY();
			g_val2 = target;
			g_seq2 = target;
			DELAY();

			// write 7:
			target++;
			g_seq1 = target;
			g_val2 = target;
			MB(); MB();
			g_val1 = target;
			g_seq2 = target;
			DELAY();

			// write 8:
			target++;
			g_seq1 = target;
			g_val2 = target;
			MB(); DELAY();
			g_val1 = target;
			g_seq2 = target;
			DELAY();
		}
	}
}

void start_thread(int id)
{
	pthread_t thread;
	int res;

	res = pthread_create(&thread,NULL,threadfnc,(void*)id);
	if(res != 0)
		assert(false);
}



int main()
{
	printf("movopt:\n");
	start_thread(1);
	start_thread(2);
	printf(" starting, please wait.\n");
	fflush(stdout);
	start = 1;
	for(;;) sleep(1000);
}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-30 18:20 frlock and barrier discussion Manfred Spraul
@ 2003-01-30 18:26 ` Andrea Arcangeli
  2003-01-30 19:05   ` Manfred Spraul
  2003-01-30 19:54   ` Davide Libenzi
  2003-01-30 22:32 ` Alan Cox
  1 sibling, 2 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2003-01-30 18:26 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Stephen Hemminger, Richard Henderson, linux-kernel

On Thu, Jan 30, 2003 at 07:20:33PM +0100, Manfred Spraul wrote:
> Stephen wrote:
> 
> [snip - memory barrier for fr_write_begin]
> 
> >Using mb() is more paranoid than necessary. 
> 
> 
> What about the memory barrier in fr_read_begin?
> If I understand the Intel documentation correctly, then i386 doesn't need 
> them:
> "Writes by a single processor are observed in the same order by all 
> processors"
> 
> I think "smp_read_barrier_depends()" (i.e. a nop for i386) is sufficient. 

I don't see what you mean, there is no dependency we can rely on between
the read of the sequence number and the critical section reads, the
critical section reads has nothing to do with the sequence number reads
and the frlock itself.

> Attached is a test app - could someone try it? I don't have access to a SMP 
> system right now.
> 
> 
> What about permitting arch overrides for the memory barriers? E.g. ia64 has 
> acquire and release memory barriers - it doesn't map to the Linux 
> wmb()/rmb() scheme.
> 
> --
> 	Manfred 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

> /*
>  * frlock: test for Intel memory ordering.
>  * Copyright (C) 1999,2003 by Manfred Spraul.
>  * 
>  * Redistribution of this file is permitted under the terms of the GNU 
>  * Public License (GPL)
>  * $Header: /pub/home/manfred/cvs-tree/movopt/frlock.cpp,v 1.2 2003/01/26 10:41:39 manfred Exp $
>  */
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <pthread.h>
> #include <assert.h>
> 
> static volatile int g_val1;
> static volatile int g_val2;
> static volatile int g_seq1;
> static volatile int g_seq2;
> 
> static volatile int start;
> #define MB()	__asm__ __volatile__ ("lock;addl $0,(%%esp)\n\t" \
> 					:/* no output*/ \
> 					:/* no input*/:"cc","memory")
> 
> #define DELAY()		do { int i; for(i=0;i<1000;i++); } while(0)
> 
> void* threadfnc(void* param)
> {
> 	while(!start);
> 	if(1 == (int)param)
> 		goto cpu1;
> 	if(2 == (int)param)
> 		goto cpu2;
> 	assert(0);
> cpu1:
> 	{	// reader:
> 		for(;;) {
> 			int x1,x2,val1,val2;
> 
> 			x1 = g_seq1;
> 			val1 = g_val1;
> 			val2 = g_val2;
> 			x2 = g_seq2;
> 			if (x1 == x2) {
> 				if (val1 != val2) {
> 					printf("Bad! memory ordering violation with %d/%d: %d/%d.\n", x1, x2, val1, val2);
> 				}
> 			}
> 	    	}
> 	}
> cpu2:
> 	{	// writer:
> 	    	int target = 0;
> 		for (;;) {
> 
> 			// write 1:
> 			target++;
> 			g_seq1 = target;
> 			g_val1 = target;
> 			g_val2 = target;
> 			g_seq2 = target;
> 			DELAY();
> 
> 			// write 2:
> 			target++;
> 			g_seq1 = target;
> 			g_val1 = target;
> 			MB();
> 			g_val2 = target;
> 			g_seq2 = target;
> 			DELAY();
> 
> 			// write 3:
> 			target++;
> 			g_seq1 = target;
> 			g_val2 = target;
> 			g_val1 = target;
> 			g_seq2 = target;
> 			DELAY();
> 
> 			// write 4:
> 			target++;
> 			g_seq1 = target;
> 			g_val2 = target;
> 			MB();
> 			g_val1 = target;
> 			g_seq2 = target;
> 			DELAY();
> 			
> 
> 
> 			// write 5:
> 			target++;
> 			g_seq1 = target;
> 			g_val1 = target;
> 			MB(); MB();
> 			g_val2 = target;
> 			g_seq2 = target;
> 			DELAY();
> 
> 			// write 6:
> 			target++;
> 			g_seq1 = target;
> 			g_val1 = target;
> 			MB(); DELAY();
> 			g_val2 = target;
> 			g_seq2 = target;
> 			DELAY();
> 
> 			// write 7:
> 			target++;
> 			g_seq1 = target;
> 			g_val2 = target;
> 			MB(); MB();
> 			g_val1 = target;
> 			g_seq2 = target;
> 			DELAY();
> 
> 			// write 8:
> 			target++;
> 			g_seq1 = target;
> 			g_val2 = target;
> 			MB(); DELAY();
> 			g_val1 = target;
> 			g_seq2 = target;
> 			DELAY();
> 		}
> 	}
> }
> 
> void start_thread(int id)
> {
> 	pthread_t thread;
> 	int res;
> 
> 	res = pthread_create(&thread,NULL,threadfnc,(void*)id);
> 	if(res != 0)
> 		assert(false);
> }
> 
> 
> 
> int main()
> {
> 	printf("movopt:\n");
> 	start_thread(1);
> 	start_thread(2);
> 	printf(" starting, please wait.\n");
> 	fflush(stdout);
> 	start = 1;
> 	for(;;) sleep(1000);
> }



Andrea

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-30 18:26 ` Andrea Arcangeli
@ 2003-01-30 19:05   ` Manfred Spraul
  2003-01-30 19:54   ` Davide Libenzi
  1 sibling, 0 replies; 11+ messages in thread
From: Manfred Spraul @ 2003-01-30 19:05 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Stephen Hemminger, Richard Henderson, linux-kernel

Andrea Arcangeli wrote:

>On Thu, Jan 30, 2003 at 07:20:33PM +0100, Manfred Spraul wrote:
>  
>
>>Stephen wrote:
>>
>>[snip - memory barrier for fr_write_begin]
>>
>>    
>>
>>>Using mb() is more paranoid than necessary. 
>>>      
>>>
>>What about the memory barrier in fr_read_begin?
>>If I understand the Intel documentation correctly, then i386 doesn't need 
>>them:
>>"Writes by a single processor are observed in the same order by all 
>>processors"
>>
>>I think "smp_read_barrier_depends()" (i.e. a nop for i386) is sufficient. 
>>    
>>
>
>I don't see what you mean, there is no dependency we can rely on between
>the read of the sequence number and the critical section reads, the
>critical section reads has nothing to do with the sequence number reads
>and the frlock itself.
>
You are right - "observed in the same order by all processors" only 
means that the memory interface of the cpus see all writes in order, not 
that instruction executed by the cpus will observe the writes in order.

That leaves ia64 with the acquire/release barriers.

--
    Manfred


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-30 18:26 ` Andrea Arcangeli
  2003-01-30 19:05   ` Manfred Spraul
@ 2003-01-30 19:54   ` Davide Libenzi
  1 sibling, 0 replies; 11+ messages in thread
From: Davide Libenzi @ 2003-01-30 19:54 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Linux Kernel Mailing List

> Attached is a test app - could someone try it? I don't have access to a SMP
> system right now.

Try to put [g_val1, g_seq1] and [g_val2, g_seq2] on two different cache
lines and run it on a SMP system using CPUs with a partitioned cache
architecture. Even if you do an WMB on writer side, you might see a
different order w/out an RMB on the reader side. This because the two
cache lines might be committed to different partitions with different
loads, and the latest ( in time order ) commit might see a fastest path
due a lower traffic. An RMB on the reader side ( that is usually expensive )
wait for all CPUs's memory controllers to flush their stuff before
resuming execution.

- Davide

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re:  frlock and barrier discussion
  2003-01-30 18:20 frlock and barrier discussion Manfred Spraul
  2003-01-30 18:26 ` Andrea Arcangeli
@ 2003-01-30 22:32 ` Alan Cox
  1 sibling, 0 replies; 11+ messages in thread
From: Alan Cox @ 2003-01-30 22:32 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Stephen Hemminger, Andrea Arcangeli, Richard Henderson,
	Linux Kernel Mailing List

On Thu, 2003-01-30 at 18:20, Manfred Spraul wrote:
> If I understand the Intel documentation correctly, then i386 doesn't need them:
> "Writes by a single processor are observed in the same order by all processors"

See the PPro errata. There are some constraints on this in the real
world. You may need locked ops on the ppro and earlier cpus while 
being able to do it the fast way on PII and higher


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-30  1:52       ` Andrea Arcangeli
@ 2003-01-31  0:41         ` Richard Henderson
  2003-01-31  0:57           ` Andrea Arcangeli
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2003-01-31  0:41 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Stephen Hemminger, Andrew Morton, Andi Kleen,
	Linux Kernel Mailing List

On Thu, Jan 30, 2003 at 02:52:19AM +0100, Andrea Arcangeli wrote:
> you're missing xtimensec is a write.

Eh?  xtimensec is a register.



r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: frlock and barrier discussion
  2003-01-31  0:41         ` Richard Henderson
@ 2003-01-31  0:57           ` Andrea Arcangeli
  0 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2003-01-31  0:57 UTC (permalink / raw)
  To: Stephen Hemminger, Andrew Morton, Andi Kleen,
	Linux Kernel Mailing List

On Thu, Jan 30, 2003 at 04:41:20PM -0800, Richard Henderson wrote:
> On Thu, Jan 30, 2003 at 02:52:19AM +0100, Andrea Arcangeli wrote:
> > you're missing xtimensec is a write.
> 
> Eh?  xtimensec is a register.

then it's fine, the register will be flushed to ram eventually and it
will be within the two wmb(), just write in the example also the line
where the xtimensec ""register"" is flushed to ram and it will be in the
right place

if the register isn't flushed to ram eventually, it will be discared and
the whole critical section is a noop from the point of view of the other
cpus and no wmb() or rmb() or mb() would be needed in the first place

Andrea

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2003-01-31  0:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-01-30 18:20 frlock and barrier discussion Manfred Spraul
2003-01-30 18:26 ` Andrea Arcangeli
2003-01-30 19:05   ` Manfred Spraul
2003-01-30 19:54   ` Davide Libenzi
2003-01-30 22:32 ` Alan Cox
  -- strict thread matches above, loose matches on Subject: below --
2003-01-28 23:42 [PATCH] (1/4) 2.5.59 fast reader/writer lock for gettimeofday Stephen Hemminger
2003-01-29  7:06 ` Richard Henderson
2003-01-30  1:15   ` frlock and barrier discussion Stephen Hemminger
2003-01-30  1:29     ` Andrea Arcangeli
2003-01-30  1:41     ` Richard Henderson
2003-01-30  1:52       ` Andrea Arcangeli
2003-01-31  0:41         ` Richard Henderson
2003-01-31  0:57           ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox