linuxppc-dev.lists.ozlabs.org archive mirror
* PPC Kernel Gurus Help?
       [not found] ` <19990411150328.032219@mail.mipsys.com>
@ 1999-04-11 15:07   ` Kevin B. Hendricks
  1999-04-12  4:38     ` Paul Mackerras
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin B. Hendricks @ 1999-04-11 15:07 UTC (permalink / raw)
  To: linux-pmac, gdt, linuxppc-dev, Paul.Mackerras


Hi,

I have been trying to track down deadlocks in linuxthreads code when using
semaphores.  The semaphores are implemented using the lwarx and stwcx.
instructions to build both testandset and compare_and_swap style
primitives.

Unfortunately, things are not working very reliably.  I was looking in the
PPC Programmers Environment Manual and found that in section 6.3 it specifies
that the operating system when process switching should do an stwcx.
instruction to a nonsense EA to clear any reservations held by the
processor before starting the new process.

Is this being done in Linux PPC kernels?

Does anything special have to be done for threads created with the clone
system call?

What about in signal handlers?  If a signal handler is invoked in the
middle of the lwarx/stwcx. instruction pairs, should the handler be
clearing the reservation bit?

Should we be clearing the reservation using an stwcx. instruction in the
sigsetjmp / longjmp calls because they are often used to longjmp out of
signal handlers which in turn might result in a mispaired lwarx/stwcx. set
of instructions similar to a process switch?

I am not well versed in the use of the lwarx/stwcx. instructions.  I have
been trying to understand things from reading the PPC manuals from Motorola
and IBM but the answers to my questions above were not clear.

Any help here would be greatly appreciated.  Am I barking up the wrong tree?

Thanks,

Kevin





[[ This message was sent via the linuxppc-dev mailing list.  Replies are ]]
[[ not  forced  back  to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting.   ]]


* Re: PPC Kernel Gurus Help?
@ 1999-04-11 18:15 Kevin B. Hendricks
  1999-04-12  4:48 ` Paul Mackerras
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin B. Hendricks @ 1999-04-11 18:15 UTC (permalink / raw)
  To: linuxppc-dev, gdt


>Date: Sun, 11 Apr 1999 14:13:40 -0400
>To: Benjamin Herrenschmidt <bh40@calva.net>
>From: "Kevin B. Hendricks" <kbhend@business.wm.edu>
>Subject: Re: PPC Kernel Gurus Help?
>
>Hi,
>
>I hope Apple's implementation is just overkill.  The linuxthreads
>pt-machine.h file in both glibc 1.99 and glibc 2.1 do not have the extra
>isyncs (they just use sync both before and after the routine).  They also
>do not align things to cache boundaries.  To do that we would have to
>change the sem_t because both the spinlock and the semaphore value are
>side by side and both are accessed this way meaning that sem_t would have
>to be 32 byte aligned and take up 64 bytes to be safe (32 for the
>semaphore and 32 for the spinlock).
>
>By the way,  I looked in that arch/ppc kernel for 2.2.1 and their
>implementation of testandset and compare_and_swap does not use either sync
>or isync or any cache alignment!!!!! (see bitops, misc.S and head.S for
>examples)
>
>The PowerPC manual in Appendix G mentions the cache grain resolution
>problem but does not include it in their examples of testandset and
>compare_and_swap.  Also their examples only use isync and not sync but
>point out that for SMP, you should use sync.  They also only use sync
>before and after the routines and not in the middle.
>
>
>So are the isync and syncs needed?
>
>Should the semaphores be aligned to 32 byte address boundaries to take up
>a whole cache line?
>
>>It looks like Apple's implementation makes sure to always align the value
>>that is c&swapped to a cache line boundary (32 bytes). Also, they do a
>>sync and an isync. Apple's implementation looks like this: (This one
>>comes from some code I use on old PPC macs that don't have a system
>>function for compare&swap).
>>
>>static asm Boolean	s_low_compare_and_swap(	UInt32				inOld,
>>			UInt32				inNew,
>>			volatile UInt32		*outOld)
>>	{
>>	begin:	lwarx	r6,r0,r5
>>			cmpw	r6,r3
>>			bne		failed
>>			sync
>>			stwcx.	r4,r0,r5
>>			bne-	begin
>>			sync
>>			isync
>>			li		r3,1
>>			blr
>>	failed:	sync
>>			stwcx.	r6,r0,r5
>>			li		r3,0
>>			blr
>>	}
>>
>>Regarding the kernel, I believe signals and context switches (and
>>eventually any interrupt handler) should clear reservations too, but I'm
>>not sure if failure to do so can be the cause of your problems.
>
>Kevin
>
>
>




* Re: PPC Kernel Gurus Help?
  1999-04-11 15:07   ` Kevin B. Hendricks
@ 1999-04-12  4:38     ` Paul Mackerras
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Mackerras @ 1999-04-12  4:38 UTC (permalink / raw)
  To: kbhend; +Cc: linuxppc-dev


Kevin B. Hendricks <kbhend@business.wm.edu> wrote:

> Unfortunately, things are not working very reliably.  I was looking in the
> PPC Programmers Environment Manual and found that in section 6.3 it specifies
> that the operating system when process switching should do an stwcx.
> instruction to a nonsense EA to clear any reservations held by the
> processor before starting the new process.
> 
> Is this being done in Linux PPC kernels?

In fact, recent kernels do a dummy stwcx. on every entry to the
kernel, not just on context switches.  If you want to check the kernel
source you're using, look for a stwcx. in the transfer_to_handler
routine in arch/ppc/kernel/head.S.

> Does anything special have to be done for threads created with the clone
> system call?

Not AFAICS, since you have to enter the kernel to switch from one
thread to another.  If you have a user-level threads implementation,
it should do a dummy stwcx. in the context-switch code.
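
A minimal sketch of what such a dummy stwcx. might look like in a
user-level thread switch, assuming GCC-style inline assembly.  The names
clear_reservation, uthread and uthread_switch are hypothetical and not taken
from the kernel or from linuxthreads; the register save/restore that a real
switch needs is left out.  On non-PowerPC targets the helper compiles to
nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: drop any lwarx reservation held by this CPU.
 * A store-conditional to a private scratch word clears the reservation
 * whether or not the store itself succeeds. */
static inline void clear_reservation(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    uint32_t scratch = 0;
    __asm__ __volatile__("stwcx. %1,0,%2"
                         : "+m"(scratch)
                         : "r"(0), "r"(&scratch)
                         : "cr0", "memory");
#endif
}

/* Hypothetical user-level thread descriptor, for illustration only. */
struct uthread {
    int id;
};

static struct uthread *current_thread = NULL;

/* Sketch of a user-level context switch: clear the reservation first so a
 * stwcx. in the next thread cannot pair with a lwarx from the previous one. */
static void uthread_switch(struct uthread *next)
{
    clear_reservation();
    current_thread = next;  /* register save/restore elided */
}
```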

> What about in signal handlers?  If a signal handler is invoked in the
> middle of the lwarx/stwcx. instruction pairs, should the handler be
> clearing the reservation bit?

Yes.

> Should we be clearing the reservation using an stwcx. instruction in the
> sigsetjmp / longjmp calls because they are often used to longjmp out of
> signal handlers which in turn might result in a mispaired lwarx/stwcx. set
> of instructions similar to a process switch?

It is maybe worth considering doing a dummy stwcx. in [sig]longjmp.
However, the reservation should have been cleared by the time you get
into a signal handler anyway, so the only time you should see a
problem is if you do an explicit longjmp between the lwarx and
stwcx. :-)

Paul.


* Re: PPC Kernel Gurus Help?
  1999-04-11 18:15 PPC Kernel Gurus Help? Kevin B. Hendricks
@ 1999-04-12  4:48 ` Paul Mackerras
  1999-04-12 18:41   ` Crashing My PowerbookG3-Series when writing to the serial port Alexander Derbes
  1999-04-14 13:37   ` PPC Kernel Gurus Help? Benjamin Herrenschmidt
  0 siblings, 2 replies; 7+ messages in thread
From: Paul Mackerras @ 1999-04-12  4:48 UTC (permalink / raw)
  To: kbhend; +Cc: linuxppc-dev


Kevin B. Hendricks <kbhend@business.wm.edu> wrote:

> >I hope Apple's implementation is just overkill.  The linuxthreads
> >pt-machine.h file in both glibc 1.99 and glibc 2.1 do not have the extra
> >isyncs (they just use sync both before and after the routine).  They also
> >do not align things to cache boundaries.  To do that we would have to
> >change the sem_t because both the spinlock and the semaphore value are
> >side by side and both are accessed this way meaning that sem_t would have
> >to be 32 byte aligned and take up 64 bytes to be safe (32 for the
> >semaphore and 32 for the spinlock).

Does it use lwarx/stwcx. to access both the spinlock and the
semaphore?  If so that could possibly cause a problem if they are in
the same cache line (strictly, "reservation granule").

> >By the way,  I looked in that arch/ppc kernel for 2.2.1 and their
> >implementation of testandset and compare_and_swap does not use either sync
> >or isync or any cache alignment!!!!! (see bitops, misc.S and head.S for
> >examples)

The way I understand it is that the syncs aren't necessary for the
atomic nature of the read/modify/write on the location you're
accessing with lwarx/stwcx.  The reason you would have syncs there is
if you additionally want some ordering (WRT other cpus) between the
atomic R/M/W and other memory accesses you do before or after it.
Thus in the kernel source you will see syncs in the semaphore code but
not in the atomic operations.  An atomic operation doesn't of itself
imply anything about other memory accesses, whereas a semaphore
lock/unlock does.
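
The distinction can be sketched in portable C11 atomics rather than PowerPC
assembly.  This is only an illustration under that assumption: counter_add,
demo_sem_trydown and demo_sem_up are hypothetical names, not the kernel's or
glibc's routines, and the compiler decides which barrier instructions to emit.
The counter asks for atomicity only; the semaphore operations also ask for
ordering against surrounding memory accesses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic read/modify/write with no ordering requested: the update is
 * indivisible, but says nothing about other memory accesses. */
static void counter_add(atomic_int *v, int n)
{
    atomic_fetch_add_explicit(v, n, memory_order_relaxed);
}

/* Semaphore "down" attempt: take one unit if available.  Acquire ordering
 * keeps later accesses to the protected data from moving before it. */
static bool demo_sem_trydown(atomic_int *count)
{
    int old = atomic_load_explicit(count, memory_order_relaxed);
    while (old > 0) {
        if (atomic_compare_exchange_weak_explicit(count, &old, old - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
        /* On failure 'old' now holds the current value; retry. */
    }
    return false;
}

/* Semaphore "up": release ordering makes earlier writes to the protected
 * data visible before the unit is returned. */
static void demo_sem_up(atomic_int *count)
{
    atomic_fetch_add_explicit(count, 1, memory_order_release);
}
```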

> >The PowerPC manual in Appendix G mentions the cache grain resolution
> >problem but does not include it in their examples of testandset and
> >compare_and_swap.  Also their examples only use isync and not sync but
> >point out that for SMP, you should use sync.  They also only use sync
> >before and after the routines and not in the middle.
> >
> >
> >So are the isync and syncs needed?

I don't understand why an isync should be needed.  I believe a sync is
only needed if you want a constraint on the order in which other CPUs
will see the atomic operation compared to other memory references (I'm
not dogmatic about that, I could be wrong, but that's my current
understanding.)

Paul.


* Crashing My PowerbookG3-Series when writing to the serial port
  1999-04-12  4:48 ` Paul Mackerras
@ 1999-04-12 18:41   ` Alexander Derbes
  1999-04-14 13:37   ` PPC Kernel Gurus Help? Benjamin Herrenschmidt
  1 sibling, 0 replies; 7+ messages in thread
From: Alexander Derbes @ 1999-04-12 18:41 UTC (permalink / raw)
  To: linuxppc-dev



I think I have found a bug.  However, I am not sure how to go about
isolating it.

I have written a little piece of C code which opens the serial port,
configures it to 8 bits, no parity and 2 stop bits at 9600 baud, and then
writes a few bytes and closes the serial port.  The code is being used to
send packets to an external device.

The code will sometimes hang the powerbook.  It seems that hangs occur
only directly after I have started apache.  The serial program is called
as a CGI by apache so that packets can be written to the device via the
web.

Because the machine is totally hung when the problem shows up I do not
know how to go about debugging it.  Any ideas?

-acd



* Re: PPC Kernel Gurus Help?
@ 1999-04-12 20:49 Edward Swarthout
  0 siblings, 0 replies; 7+ messages in thread
From: Edward Swarthout @ 1999-04-12 20:49 UTC (permalink / raw)
  To: linuxppc-dev



>  Date: Mon, 12 Apr 1999 14:48:17 +1000
>  From: Paul Mackerras <paulus@cs.anu.edu.au>
>
>  > >...                Also their examples only use isync and not sync but
>  > >point out that for SMP, you should use sync.  They also only use sync
>  > >before and after the routines and not in the middle.
>  > >
>  > >So are the isync and syncs needed?
>
>  I don't understand why an isync should be needed.  I believe a sync is
>  only needed if you want a constraint on the order in which other CPUs
>  will see the atomic operation compared to other memory references (I'm
>  not dogmatic about that, I could be wrong, but that's my current
>  understanding.)
>
>  Paul.

I believe this discussion comes from the example in appendix E.4 "Lock
Acquisition and Release".  I think the example could use better
wording to motivate the need for the isync.  It simply makes the
statement: "The processor must not access the shared resource until it
sets the lock".  A better wording: "IF the lock must prevent the
processor from accessing the shared resource until the successful lock
is acquired, a barrier needs to be created between the stwcx and the
access".

The lock code looks like:

  lock: call test_and_set until lock acquired (lwarx/stwcx loop)
        isync
        access_shared_location

Without the isync, nothing prevents access_shared_location from
happening before the lwarx/stwcx loop returns.  To prevent the access, a
dependency between the stwcx and the access must be created.  Three
ways (with the isync option being the best):

1. isync - instruction-stream is blocked until successful stwcx
2. sync  - memory access is blocked until successful stwcx
3. operand dependency - delay loading register containing shared address
           until lock is acquired.

Only one option needs to be picked.
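
The lock sequence above can be sketched in portable C11, with an acquire
fence standing in for the barrier between the test-and-set loop and the
access.  This is an assumption-laden illustration, not the manual's code:
demo_lock, shared_value and the function names are hypothetical, and which
instruction the compiler emits for the fence on PowerPC is up to the
toolchain.

```c
#include <stdatomic.h>

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static int shared_value;  /* the shared resource guarded by demo_lock */

static void demo_lock_acquire(void)
{
    /* test_and_set loop: spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&demo_lock,
                                             memory_order_relaxed))
        ;
    /* Barrier between winning the lock and touching the shared data. */
    atomic_thread_fence(memory_order_acquire);
}

static void demo_lock_release(void)
{
    /* Release ordering: writes to shared data complete before unlock. */
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* access_shared_location, done only while the lock is held. */
static int locked_increment(void)
{
    demo_lock_acquire();
    int v = ++shared_value;
    demo_lock_release();
    return v;
}
```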

-Ed Swarthout
Somerset Design Center
Motorola 


* Re: PPC Kernel Gurus Help?
  1999-04-12  4:48 ` Paul Mackerras
  1999-04-12 18:41   ` Crashing My PowerbookG3-Series when writing to the serial port Alexander Derbes
@ 1999-04-14 13:37   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 1999-04-14 13:37 UTC (permalink / raw)
  To: Paul.Mackerras, linuxppc-dev, kbhend


On Mon, Apr 12, 1999, Paul Mackerras <paulus@cs.anu.edu.au> wrote:

>> >I hope Apple's implementation is just overkill.  The linuxthreads
>> >pt-machine.h file in both glibc 1.99 and glibc 2.1 do not have the extra
>> >isyncs (they just use sync both before and after the routine).  They also
>> >do not align things to cache boundaries.  To do that we would have to
>> >change the sem_t because both the spinlock and the semaphore value are
>> >side by side and both are accessed this way meaning that sem_t would have
>> >to be 32 byte aligned and take up 64 bytes to be safe (32 for the
>> >semaphore and 32 for the spinlock).
>
>Does it use lwarx/stwcx. to access both the spinlock and the
>semaphore?  If so that could possibly cause a problem if they are in
>the same cache line (strictly, "reservation granule").

Please disregard what I said, I was just plain wrong (still having
trouble with PPC assembly...). Apple's implementation doesn't check for
32 byte alignment but for 4 byte alignment, which is a lot more
understandable.

If I understand things correctly, however, there is still a potential
problem in MP, if two atomics are in the same granule, and two processors
are trying to use them at the same time. I believe we should make sure
the kernel's atomic type takes a whole granule (cache line).
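
A sketch of what giving each atomic its own granule could look like in C11,
assuming the 32 byte granule size mentioned earlier in the thread (the real
size depends on the processor).  GRANULE_SIZE, padded_atomic_t and
struct padded_sem are hypothetical names, not the kernel's atomic_t or
glibc's sem_t.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed reservation granule size; taken from the thread, not measured. */
#define GRANULE_SIZE 32

/* An atomic word that occupies a whole granule by itself. */
typedef struct {
    alignas(GRANULE_SIZE) atomic_int value;
} padded_atomic_t;

/* Spinlock and count no longer share a granule: 64 bytes in total. */
struct padded_sem {
    padded_atomic_t spinlock;
    padded_atomic_t count;
};

static_assert(sizeof(padded_atomic_t) == GRANULE_SIZE,
              "each atomic fills exactly one granule");
static_assert(offsetof(struct padded_sem, count) == GRANULE_SIZE,
              "count starts in the second granule");
```

The cost is the one already noted in the thread: the structure grows to two
full granules.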

-- 
           E-Mail: <mailto:bh40@calva.net>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>






end of thread, other threads:[~1999-04-14 13:37 UTC | newest]

Thread overview: 7+ messages
1999-04-11 18:15 PPC Kernel Gurus Help? Kevin B. Hendricks
1999-04-12  4:48 ` Paul Mackerras
1999-04-12 18:41   ` Crashing My PowerbookG3-Series when writing to the serial port Alexander Derbes
1999-04-14 13:37   ` PPC Kernel Gurus Help? Benjamin Herrenschmidt
  -- strict thread matches above, loose matches on Subject: below --
1999-04-12 20:49 Edward Swarthout
     [not found] <370E71B0.577788B0@synxis.com>
     [not found] ` <19990411150328.032219@mail.mipsys.com>
1999-04-11 15:07   ` Kevin B. Hendricks
1999-04-12  4:38     ` Paul Mackerras
