linuxppc-dev.lists.ozlabs.org archive mirror
* why isync in atomic inc and return and atomic dec and return for CONFIG_SMP
@ 2002-07-23 21:25 Kevin B. Hendricks
  2002-07-23 22:15 ` Kevin B. Hendricks
  2002-07-24  0:30 ` Anton Blanchard
  0 siblings, 2 replies; 7+ messages in thread
From: Kevin B. Hendricks @ 2002-07-23 21:25 UTC (permalink / raw)
  To: linuxppc-dev, yellowdog-devel


Hi,

Can anyone tell me the reason why we need to use an isync in
atomic_add_return and atomic_sub_return (see kernel source in
asm/atomic.h) only for SMP machines and only when a value is returned?

My understanding of an "isync" is that it forces all instructions issued
before the isync to complete before any new instructions can be
fetched on that CPU (not on all CPUs?).

Why is this needed only for SMP machines?

Why is it only needed when a value is actually returned (there are no isync
instructions on the non-returning versions)?

Can someone explain exactly what the isync does for us here?


#ifdef CONFIG_SMP
#define SMP_ISYNC       "\n\tisync"
#else
#define SMP_ISYNC
#endif

static __inline__ int atomic_add_return(int a, atomic_t *v)
{
        int t;

        __asm__ __volatile__(
"1:     lwarx   %0,0,%2         # atomic_add_return\n\
        add     %0,%1,%0\n\
        stwcx.  %0,0,%2\n\
        bne-    1b"
        SMP_ISYNC
        : "=&r" (t)
        : "r" (a), "r" (&v->counter)
        : "cc", "memory");

        return t;
}


static __inline__ int atomic_sub_return(int a, atomic_t *v)
{
        int t;

        __asm__ __volatile__(
"1:     lwarx   %0,0,%2         # atomic_sub_return\n\
        subf    %0,%1,%0\n\
        stwcx.  %0,0,%2\n\
        bne-    1b"
        SMP_ISYNC
        : "=&r" (t)
        : "r" (a), "r" (&v->counter)
        : "cc", "memory");

        return t;
}


Thanks,

Kevin


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: why isync in atomic inc and return and atomic dec and return for CONFIG_SMP
  2002-07-23 21:25 why isync in atomic inc and return and atomic dec and return for CONFIG_SMP Kevin B. Hendricks
@ 2002-07-23 22:15 ` Kevin B. Hendricks
  2002-07-24  0:30 ` Anton Blanchard
  1 sibling, 0 replies; 7+ messages in thread
From: Kevin B. Hendricks @ 2002-07-23 22:15 UTC (permalink / raw)
  To: yellowdog-devel, linuxppc-dev


Hi,

The "PowerPC Microprocessor Family: The Programming Environments" manual
(MPRPPCFPE-01, MPCFPE/AD, from IBM Microelectronics and Motorola, dated
1994) says the following about "isync" on page 8-104, and I quote:

This instruction waits for all previous instructions to complete and then
discards any prefetched instructions, causing subsequent instructions to
be fetched (or refetched) from memory and to execute in the context
established by the previous instructions.  This instruction has NO EFFECT
on other processors or their caches.

So why should we use this only on SMP systems?  If we are using the
atomic operations to build our own locks, then I understand the need for
the isync, but in that case it should be there for both SMP and non-SMP
systems, shouldn't it?

I must be missing something here.

Kevin






* Re: why isync in atomic inc and return and atomic dec and return for CONFIG_SMP
  2002-07-23 21:25 why isync in atomic inc and return and atomic dec and return for CONFIG_SMP Kevin B. Hendricks
  2002-07-23 22:15 ` Kevin B. Hendricks
@ 2002-07-24  0:30 ` Anton Blanchard
  2002-07-24 12:04   ` Kevin B. Hendricks
  1 sibling, 1 reply; 7+ messages in thread
From: Anton Blanchard @ 2002-07-24  0:30 UTC (permalink / raw)
  To: Kevin B. Hendricks; +Cc: linuxppc-dev, yellowdog-devel


Hi Kevin,

> Can anyone tell me the reason why we need to use an isync in the
> atomic_add_return and atomic_sub_return (see kernel source in
> asm/atomic.h) only for SMP machines and only when a value is returned?

We are using isync here as an "import barrier". The stwcx., bne, isync
sequence ensures that any instructions following the isync are not
performed until the lock has been taken. Basically it prevents anything
inside the spinlock-protected region from leaking outside.

We don't need this on a UP machine because the local CPU sees everything
in program order.
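
As a rough portable analogue of the point above (a sketch under stated
assumptions, not the kernel's code): in C11 terms the value-returning add
can be given acquire ordering, which is the closest counterpart to the
"import barrier", while the non-returning add stays relaxed. The names
demo_add_return_acquire and demo_add_relaxed are hypothetical, and how a
compiler lowers acquire ordering on PowerPC is toolchain-dependent.

```c
#include <stdatomic.h>

/* Hypothetical analogue of atomic_add_return(): the result may be used to
 * decide whether a lock was taken, so later accesses in program order must
 * not be performed ahead of the update. Acquire ordering expresses that. */
static inline int demo_add_return_acquire(int a, atomic_int *v)
{
    /* atomic_fetch_add_explicit returns the OLD value; add a to get the new one. */
    return atomic_fetch_add_explicit(v, a, memory_order_acquire) + a;
}

/* Hypothetical analogue of the non-returning atomic add: the update is still
 * atomic, but no ordering against surrounding accesses is requested. */
static inline void demo_add_relaxed(int a, atomic_int *v)
{
    (void)atomic_fetch_add_explicit(v, a, memory_order_relaxed);
}
```

The split mirrors the kernel source quoted earlier in the thread, where only
the returning variants carry SMP_ISYNC.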

Anton


* Re: why isync in atomic inc and return and atomic dec and return for CONFIG_SMP
  2002-07-24  0:30 ` Anton Blanchard
@ 2002-07-24 12:04   ` Kevin B. Hendricks
  2002-07-28  2:26     ` Anton Blanchard
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin B. Hendricks @ 2002-07-24 12:04 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linuxppc-dev, yellowdog-devel


Hi,

> > Can anyone tell me the reason why we need to use an isync in the
> > atomic_add_return and atomic_sub_return (see kernel source in
> > asm/atomic.h) only for SMP machines and only when a value is returned?
>
> We are using isync here as an "import barrier". The stwcx., bne, isync
> sequence ensures that any instructions following the isync are not
> performed until the lock has been taken. Basically it prevents anything
> inside the spinlock protected region from leaking outside.
>
> We don't need this on a UP machine because the local cpu sees everything
> in program order.

So the atomic increment and decrement with return are being used in locks
to protect extended critical regions?

If so, a lock (of any sort) does require an isync (according to the manual)
immediately after gaining the lock, to discard any speculatively prefetched
instructions and data (possibly stale, since someone else could have changed
them before dropping the lock), and that should be done in both the SMP and
UP cases.

Why doesn't the same problem arise from the processor's speculative
prefetching of instructions in the uniprocessor case?  Since that routine
is inlined, the single processor could have loaded and started to process
instructions past the "lock" before it actually acquires the lock.

Kevin



* Re: why isync in atomic inc and return and atomic dec and return for CONFIG_SMP
  2002-07-24 12:04   ` Kevin B. Hendricks
@ 2002-07-28  2:26     ` Anton Blanchard
  2002-08-03 14:28       ` Kevin B. Hendricks
  0 siblings, 1 reply; 7+ messages in thread
From: Anton Blanchard @ 2002-07-28  2:26 UTC (permalink / raw)
  To: Kevin B. Hendricks; +Cc: linuxppc-dev, yellowdog-devel


> So the atomic increment and decrement with return are being used in locks
> to protect extended critical regions?

Yes, and so are test_and_set_bit etc. In fact I just found a bug in 2.5
where we were using bitops as spinlocks and were missing a memory
barrier on the lock drop (notice how clear_bit doesn't have a barrier and
we have smp_mb__before_clear_bit()).
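
To make the lock-drop barrier concrete, here is a minimal C11 sketch of a
bit used as a lock. It is an illustration only: the demo_ helpers are
hypothetical and are not the kernel's test_and_set_bit() or clear_bit().
Acquire ordering on the set and release ordering on the clear stand in for
the barriers discussed above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: set bit nr and report whether it was already set.
 * A return value of false means the caller now holds the bit lock.
 * nr must be smaller than the number of bits in unsigned int. */
static inline bool demo_test_and_set_bit_lock(unsigned nr, atomic_uint *word)
{
    unsigned mask = 1u << nr;
    /* Acquire: accesses inside the critical section may not be performed
     * ahead of this update. */
    unsigned old = atomic_fetch_or_explicit(word, mask, memory_order_acquire);
    return (old & mask) != 0;
}

/* Hypothetical helper: drop the bit lock. */
static inline void demo_clear_bit_unlock(unsigned nr, atomic_uint *word)
{
    unsigned mask = 1u << nr;
    /* Release: stores made inside the critical section are ordered before
     * the clear. A relaxed clear here would be the missing barrier. */
    (void)atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}
```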

> If so, a lock (of any sort) does require an isync (according to the manual)
> immediately after gaining the lock, to discard any speculatively prefetched
> instructions and data (possibly stale, since someone else could have changed
> them before dropping the lock), and that should be done in both the SMP and
> UP cases.

Yes.

> Why doesn't the same problem arise from the processor's speculative
> prefetching of instructions in the uniprocessor case?  Since that routine
> is inlined, the single processor could have loaded and started to process
> instructions past the "lock" before it actually acquires the lock.

The big difference here is that there are no other cpus that can modify
memory. The cpu is free to prefetch the load but it must present
everything in program order to the program. Imagine what would happen if
we had int i = 0; i++; printf("%d\n", i); and we got 0 :)

There are two cases:

1. The prefetched load ends up conflicting with a previous store. The
load and all instructions after it depending on this load must be
flushed and retried.

2. The load has no previous dependencies. Since no other CPU could
modify memory then the prefetch is valid.

On a UP build the spinlocks disappear; all that is left is the interrupt
disable/enable if using the _irq and _irqsave versions. Having said this,
you may ask why we need the lwarx/stwcx. in the atomics and bitops at
all in a UP build. The reason is that we could get an interrupt, and we
need to ensure that we are atomic with respect to interrupts.
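
The retry loop that survives on UP can be sketched in portable C11 as
follows. This is an assumption-laden analogue, not the kernel's code: a
compare-and-swap is not the same mechanism as a lwarx/stwcx. reservation,
and demo_add_return_retry is a hypothetical name. What it shows is the
shape: load, compute, attempt the store, and retry if something (such as
an interrupt handler) changed the value in between. Relaxed ordering is
used because only atomicity, not ordering, is wanted here.

```c
#include <stdatomic.h>

/* Hypothetical analogue of the lwarx / add / stwcx. / bne- loop with the
 * isync omitted, as in a UP build. */
static inline int demo_add_return_retry(int a, atomic_int *v)
{
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /* On failure, old is refreshed with the current value and the new sum is
     * recomputed on the next iteration, much like branching back to 1b. */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + a,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;

    return old + a;
}
```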

BTW inlining isn't enough to avoid prefetching; the CPU is free to
prefetch both into a function and out of it.

Anton


* Re: why isync in atomic inc and return and atomic dec and return for CONFIG_SMP
  2002-07-28  2:26     ` Anton Blanchard
@ 2002-08-03 14:28       ` Kevin B. Hendricks
  2002-08-04 14:26         ` Michael R. Zucca
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin B. Hendricks @ 2002-08-03 14:28 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linuxppc-dev, yellowdog-devel


Hi,

One follow-up question, please.

In userland code, say with a fully pre-emptible kernel, if a signal comes
in to thread A and, before it returns, another thread (call it B) is run,
couldn't the same problem happen, with the prefetched data being stale,
since thread B may in fact be the one holding the lock and allowed to
change the data?

I guess what I am asking is: would we need the isync for a userland version
of this code if threads could actually be pre-empted, even on a UP
machine?

Thanks for your help.

Kevin




* Re: why isync in atomic inc and return and atomic dec and return for CONFIG_SMP
  2002-08-03 14:28       ` Kevin B. Hendricks
@ 2002-08-04 14:26         ` Michael R. Zucca
  0 siblings, 0 replies; 7+ messages in thread
From: Michael R. Zucca @ 2002-08-04 14:26 UTC (permalink / raw)
  To: Kevin B. Hendricks; +Cc: Anton Blanchard, linuxppc-dev, yellowdog-devel


At 10:28 AM -0400 8/3/02, Kevin B. Hendricks wrote:

>In userland code, say with a fully pre-emptible kernel, if a signal comes
>in to thread A and, before it returns, another thread (call it B) is run,
>couldn't the same problem happen, with the prefetched data being stale,
>since thread B may in fact be the one holding the lock and allowed to
>change the data?
>
>I guess what I am asking is would we need the isync for a userland version
>of this code if threads could actually be pre-empted even for a UP
>machine?

I would imagine this still isn't necessary because a preemption must happen
as the result of an interrupt, which means an rfi gets executed. rfi is
context synchronizing, just like isync.

Voluntary context switching would involve making a syscall, and sc is also
context synchronizing.

So I think all of the bases are covered for UP.


----------------------------------------------
 Michael Zucca - mrz5149@acm.org
----------------------------------------------
 "I'm too old to use Emacs." -- Rod MacDonald
----------------------------------------------

