Linux MIPS Architecture development
* Once again: test_and_set for CPUs w/o LL/SC
@ 2002-09-16 16:40 Johannes Stezenbach
  2002-10-07 14:47 ` Johannes Stezenbach
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Stezenbach @ 2002-09-16 16:40 UTC (permalink / raw)
  To: linux-mips

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

Hi,

the thread with subject "LL/SC benchmarking" in mid July
died away, since I was busy doing more pressing things.

To refresh your memory:
The NEC VR41xx CPU has no LL/SC instructions, so they must
be emulated by the kernel, which slows down the test-and-set
and compare-and-swap operations (used by linux-threads)
considerably. For the VR41xx (and other CPUs which have
branch-likely instructions), there exists a workaround
which enables userspace-only atomic operations, with minor
help from the kernel: The kernel must guarantee that register
k1 is not equal to some magic value after every transition
to userspace.
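
For readers who did not follow the July thread, here is a portable C model of the idea. It is a sketch only, not the actual MIPS sequence: user code arms k1 with the magic value, loads the lock word, and stores the new value only if k1 still holds the magic value. On the real CPU the conditional store sits in the delay slot of a branch-likely instruction, which is nullified when the branch is not taken. The names sim_k1, sim_kernel_return and sim_test_and_set_once are hypothetical, and 0xffdadaff is simply the value mentioned later in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed magic value; any value the kernel never leaves in k1 would do. */
#define K1_MAGIC 0xffdadaffu

/* Stand-in for CPU register k1 ($27); hypothetical, for illustration only. */
static uint32_t sim_k1;

/* Models any return from kernel to userspace: the kernel clobbers k1. */
static void sim_kernel_return(void)
{
    sim_k1 = 0;
}

/*
 * One attempt at test-and-set. `preempted` simulates an exception or
 * interrupt arriving between the load and the conditional store.
 * Returns true if the store happened (the attempt was atomic), false if
 * the caller has to retry. *old receives the value that was loaded.
 */
static bool sim_test_and_set_once(uint32_t *lock, bool preempted, uint32_t *old)
{
    sim_k1 = K1_MAGIC;        /* arm k1 with the magic value */
    *old = *lock;             /* load the current lock word */
    if (preempted)
        sim_kernel_return();  /* the kernel ran, so k1 != MAGIC now */
    if (sim_k1 == K1_MAGIC) { /* models the branch-likely comparison */
        *lock = 1;            /* models the store in the branch delay slot */
        return true;
    }
    return false;             /* store was nullified; caller retries */
}
```

A caller would loop on sim_test_and_set_once() until it returns true and then inspect *old to see whether the lock was previously free.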

Two things were left open in July:
- find out the minimal amount of changes to the kernel
  to guarantee k1 != MAGIC after eret
- determine how to tell glibc to use the branch-likely
  workaround instead of emulated LL/SC

I looked through the kernel code, and I think that
arch/mips/mm/tlbex-r4k.S always leaves the last value
from CP0_ENTRYLO in k1, thus bit 31 of k1 is guaranteed
to be zero.
The other interrupt and exception handlers seem to
be too complex to make any guarantees about the value left
in k1, so I added a 'move k1,$0' to RESTORE_SP_AND_RET
in include/asm-mips/stackframe.h, and conditionalized
it via CONFIG_MIPS_USERSPACE_ATOMIC_OPS.
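
The reasoning above amounts to a simple invariant: if every path back to userspace leaves k1 below 0x80000000 (bit 31 clear), then any magic value with bit 31 set can never be seen in k1 after an exception. A minimal C sketch of that check follows; the helper names are hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if this k1 value satisfies the guarantee the patch aims for. */
static bool k1_meets_guarantee(uint32_t k1)
{
    return k1 < 0x80000000u;  /* equivalent to: bit 31 is clear */
}

/* True if `magic` cannot collide with any k1 value that meets the guarantee. */
static bool magic_is_usable(uint32_t magic)
{
    return (magic & 0x80000000u) != 0;  /* bit 31 is set */
}
```

Both the zero written by 'move k1,$0' and an EntryLo value with bit 31 clear pass k1_meets_guarantee().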

To tell glibc about the support for the branch-likely
workaround I added a sysctl.

A few things:
- I couldn't come up with a better name for the config
  option and sysctl.
- If glibc were to use sysctl(2) to query for
  mips_userspace_atomic_ops, glibc would depend on
  linux/sysctl.h from the very newest kernel, which
  is bad; maybe glibc should just query for existence
  of the file /proc/sys/kernel/mips_userspace_atomic_ops?
  Or maybe we should just add a /proc/foobar instead
  of a sysctl?
- Maybe I should add some comments to tlbex-r4k.S so
  no-one accidentally breaks the k1 < 0x80000000 assertion?
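
The "query for existence of the file" option from the list above could look roughly like the following C sketch. It assumes the sysctl shows up under the path named in the patch; the helper names path_exists and kernel_has_userspace_atomic_ops are hypothetical, and this is not code from any glibc patch.

```c
#include <stdbool.h>
#include <unistd.h>

/* Path derived from the sysctl name used in the patch. */
#define ATOMIC_OPS_SYSCTL_PATH "/proc/sys/kernel/mips_userspace_atomic_ops"

/* Hypothetical helper: true if `path` exists, regardless of its contents. */
static bool path_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Hypothetical helper: true if the kernel advertises the k1 guarantee. */
static bool kernel_has_userspace_atomic_ops(void)
{
    return path_exists(ATOMIC_OPS_SYSCTL_PATH);
}
```

Checking only for existence avoids any dependency on the KERN_MIPS_USERSPACE_ATOMIC_OPS constant from linux/sysctl.h, at the cost of requiring /proc to be mounted.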


Please comment on the appended patch.
If we could agree on the kernel support for this, I would
prepare a matching glibc patch.


Regards,
Johannes

[-- Attachment #2: linux-oss-nollsc.patch --]
[-- Type: text/plain, Size: 4993 bytes --]

Index: Documentation/Configure.help
===================================================================
RCS file: /cvs/linux/Documentation/Attic/Configure.help,v
retrieving revision 1.109.2.8
diff -a -u -r1.109.2.8 Configure.help
--- Documentation/Configure.help	2002/09/11 12:44:23	1.109.2.8
+++ Documentation/Configure.help	2002/09/16 14:39:29
@@ -2337,6 +2337,20 @@
   for better performance, N if you don't know.  You must say Y here
   for multiprocessor machines.
 
+Support userspace atomic ops for MIPS2 CPUs
+CONFIG_MIPS_USERSPACE_ATOMIC_OPS
+  Say Y here if your CPU supports the MIPS2 ISA (i.e. it has
+  support for the "branch likely" instructions), but does not
+  have the LL and SC instructions which normally are required for
+  userspace atomic operations, e.g. for the NEC VR41xx CPUs.
+  Selecting this option guarantees that the value of the k1 register,
+  which is normally reserved for the kernel, is lower than
+  0x80000000 after any transition from kernel to userspace. It
+  also sets the read-only sysctl kernel.mips_userspace_atomic_ops to
+  the value "1", so that libc/libpthread can detect kernel support for
+  a fast test-and-set implementation for this kind of CPU (instead
+  of LL/SC emulation). If in doubt, say N.
+
 lld and scd instructions available
 CONFIG_CPU_HAS_LLDSCD
   Say Y here if your CPU has the lld and scd instructions, the 64-bit
Index: arch/mips/config-shared.in
===================================================================
RCS file: /cvs/linux/arch/mips/config-shared.in,v
retrieving revision 1.1.2.18
diff -a -u -r1.1.2.18 config-shared.in
--- arch/mips/config-shared.in	2002/09/11 12:44:28	1.1.2.18
+++ arch/mips/config-shared.in	2002/09/16 14:39:31
@@ -513,6 +513,9 @@
 dep_bool 'Override CPU Options' CONFIG_CPU_ADVANCED $CONFIG_MIPS32
 if [ "$CONFIG_CPU_ADVANCED" = "y" ]; then
    bool '  ll/sc Instructions available' CONFIG_CPU_HAS_LLSC
+   if [ "$CONFIG_CPU_HAS_LLSC" = "n" ]; then
+      bool '    Support userspace atomic ops for MIPS2 CPUs' CONFIG_MIPS_USERSPACE_ATOMIC_OPS
+   fi
    bool '  lld/scd Instructions available' CONFIG_CPU_HAS_LLDSCD
    bool '  Writeback Buffer available' CONFIG_CPU_HAS_WB
 else
@@ -528,6 +531,11 @@
 	 define_bool CONFIG_CPU_HAS_LLDSCD n
 	 define_bool CONFIG_CPU_HAS_WB n
       fi
+      if [ "$CONFIG_CPU_VR41XX" = "y" ]; then
+         define_bool CONFIG_MIPS_USERSPACE_ATOMIC_OPS y
+      else
+         define_bool CONFIG_MIPS_USERSPACE_ATOMIC_OPS n
+      fi
    else
       if [ "$CONFIG_CPU_MIPS32" = "y" ]; then
 	 define_bool CONFIG_CPU_HAS_LLSC y
@@ -538,6 +546,7 @@
 	 define_bool CONFIG_CPU_HAS_LLDSCD y
 	 define_bool CONFIG_CPU_HAS_WB n
       fi
+      define_bool CONFIG_MIPS_USERSPACE_ATOMIC_OPS n
    fi
 fi
 if [ "$CONFIG_CPU_R3000" = "y" ]; then
Index: include/asm-mips/stackframe.h
===================================================================
RCS file: /cvs/linux/include/asm-mips/stackframe.h,v
retrieving revision 1.18.2.2
diff -a -u -r1.18.2.2 stackframe.h
--- include/asm-mips/stackframe.h	2002/08/05 23:53:37	1.18.2.2
+++ include/asm-mips/stackframe.h	2002/09/16 14:39:35
@@ -201,8 +201,15 @@
 		lw	$3,  PT_R3(sp);                  \
 		lw	$2,  PT_R2(sp)
 
+#ifdef CONFIG_MIPS_USERSPACE_ATOMIC_OPS
+#define CLEAR_K1 move k1,$0;
+#else
+#define CLEAR_K1
+#endif
+
 #define RESTORE_SP_AND_RET                               \
 		lw	sp,  PT_R29(sp);                 \
+		CLEAR_K1                                 \
 		.set	mips3;				 \
 		eret;					 \
 		.set	mips0
Index: include/linux/sysctl.h
===================================================================
RCS file: /cvs/linux/include/linux/sysctl.h,v
retrieving revision 1.44.2.3
diff -a -u -r1.44.2.3 sysctl.h
--- include/linux/sysctl.h	2002/09/11 12:45:40	1.44.2.3
+++ include/linux/sysctl.h	2002/09/16 14:39:35
@@ -124,6 +124,7 @@
 	KERN_CORE_USES_PID=52,		/* int: use core or core.%pid */
 	KERN_TAINTED=53,	/* int: various kernel tainted flags */
 	KERN_CADPID=54,		/* int: PID of the process to notify on CAD */
+	KERN_MIPS_USERSPACE_ATOMIC_OPS=55, /* int: kernel supports atomicity w/o LL/SC */
 };
 
 
Index: kernel/sysctl.c
===================================================================
RCS file: /cvs/linux/kernel/sysctl.c,v
retrieving revision 1.46.2.4
diff -a -u -r1.46.2.4 sysctl.c
--- kernel/sysctl.c	2002/09/10 15:32:56	1.46.2.4
+++ kernel/sysctl.c	2002/09/16 14:39:36
@@ -96,6 +96,10 @@
 extern int acct_parm[];
 #endif
 
+#ifdef CONFIG_MIPS_USERSPACE_ATOMIC_OPS
+static int mips_userspace_atomic_ops = 1;
+#endif
+
 extern int pgt_cache_water[];
 
 static int parse_table(int *, int, void *, size_t *, void *, size_t,
@@ -255,6 +259,10 @@
 #endif
 	{KERN_S390_USER_DEBUG_LOGGING,"userprocess_debug",
 	 &sysctl_userprocess_debug,sizeof(int),0644,NULL,&proc_dointvec},
+#endif
+#ifdef CONFIG_MIPS_USERSPACE_ATOMIC_OPS
+	{KERN_MIPS_USERSPACE_ATOMIC_OPS, "mips_userspace_atomic_ops",
+	 &mips_userspace_atomic_ops, sizeof(int), 0444, NULL, &proc_dointvec},
 #endif
 	{0}
 };

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-09-16 16:40 Once again: test_and_set for CPUs w/o LL/SC Johannes Stezenbach
@ 2002-10-07 14:47 ` Johannes Stezenbach
  2002-10-07 16:21   ` Kevin D. Kissell
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-07 14:47 UTC (permalink / raw)
  To: linux-mips

Hi all,

On Mon, Sep 16, 2002 at 06:40:34PM +0200, I wrote:
> 
> The NEC VR41xx CPU has no LL/SC instructions, so they must
> be emulated by the kernel, which slows down the test-and-set
> and compare-and-swap operations (used by linux-threads)
> considerably. For the VR41xx (and other CPUs which have
> branch-likely instructions), there exists a workaround
> which enables userspace-only atomic operations, with minor
> help from the kernel: The kernel must guarantee that register
> k1 is not equal to some magic value after every transition
> to userspace.
> 
> Two things were left open in July:
> - find out the minimal amount of changes to the kernel
>   to guarantee k1 != MAGIC after eret
> - determine how to tell glibc to use the branch-likely
>   workaround instead of emulated LL/SC

Since there have been no follow-ups I must assume that
this topic is no longer of interest. Is this so? Or
is the way I approach it deemed inappropriate?


Regards,
Johannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-07 14:47 ` Johannes Stezenbach
@ 2002-10-07 16:21   ` Kevin D. Kissell
  2002-10-07 16:21     ` Kevin D. Kissell
                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Kevin D. Kissell @ 2002-10-07 16:21 UTC (permalink / raw)
  To: Johannes Stezenbach, linux-mips

> Hi all,
> 
> On Mon, Sep 16, 2002 at 06:40:34PM +0200, I wrote:
> > 
> > The NEC VR41xx CPU has no LL/SC instructions, so they must
> > be emulated by the kernel, which slows down the test-and-set
> > and compare-and-swap operations (used by linux-threads)
> > considerably. For the VR41xx (and other CPUs which have
> > branch-likely instructions), there exists a workaround
> > which enables userspace-only atomic operations, with minor
> > help from the kernel: The kernel must guarantee that register
> > k1 is not equal to some magic value after every transition
> > to userspace.
> > 
> > Two things were left open in July:
> > - find out the minimal amount of changes to the kernel
> >   to guarantee k1 != MAGIC after eret
> > - determine how to tell glibc to use the branch-likely
> >   workaround instead of emulated LL/SC
> 
> Since there have been no follow-ups I must assume that
> this topic is no longer of interest. Is this so? Or
> is the way I approach it deemed inappropriate?

When I first proposed the branch-likely hack last winter,
I thought it might be worthwhile to do a thorough code
inspection to determine what set of values could never
be returned in k1 (or k0 for all I care) if an exception
was taken, such that there would be no mods to the
kernel required whatsoever.  I spent a little time going
down that path, and it does look at first glance as if one
could guarantee that one will never come out of an exception
with k1 equal to 0xffdadaff in current oss/linux-mips cvs
sources, but the guys at Sony, who have a big interest in
this technique, given that the PS2 has no LL/SC,
preferred a more conservative approach which explicitly
clobbered the selected register on all exceptions,
even if it meant some small performance impact.
That's probably going to be a more reliable design,
though I would still consider leaving the TLB refill handler
untouched and counting on the fact that k1 must contain
a non-lethal EntryLo value on return from the exception.

As for glibc, the possibilities are numerous and I'm not
the guy who'd have to make it work.

            Regards,

            Kevin K.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-07 16:21   ` Kevin D. Kissell
  2002-10-07 16:21     ` Kevin D. Kissell
@ 2002-10-07 18:43     ` Johannes Stezenbach
  2002-10-07 18:51       ` Daniel Jacobowitz
                         ` (2 more replies)
  2002-10-15 15:17     ` Maciej W. Rozycki
  2 siblings, 3 replies; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-07 18:43 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Mon, Oct 07, 2002 at 06:21:52PM +0200, Kevin D. Kissell wrote:
>
> When I first proposed the branch-likely hack last winter,
> I thought it might be worthwhile to do a thorough code
> inspection to determine what set of values could never
> be returned in k1 (or k0 for all I care) if an exception
> was taken, such that there would be no mods to the
> kernel required whatsoever.  I spent a little time going
> down that path, and it does look at first glance as if one
> could guarantee that one will never come out of an exception
> with k1 equal to 0xffdadaff in current oss/linux-mips cvs
> sources, but the guys at Sony, who have a big interest in
> this technique, given that the PS2 has no LL/SC,
> preferred a more conservative approach which explicitly
> clobbered the selected register on all exceptions,
> even if it meant some small performance impact.
> That's probably going to be a more reliable design,
> though I would still consider leaving the TLB refill handler
> untouched and counting on the fact that k1 must contain
> a non-lethal EntryLo value on return from the exception.

In my original posting from Mon, Sep 16, 2002 (maybe I should
have reposted it in full?), I had appended a patch which
leaves the TLB handlers alone (k1 always ends up with an EntryLo value,
thus bit 31 is guaranteed to be 0), but explicitly sets k1 to zero in
RESTORE_SP_AND_RET.

> As for glibc, the possibilities are numerous and I'm not
> the guy who'd have to make it work.

The question is how the glibc can detect if
a) the CPU does not have LL/SC
b) the kernel guarantees k1 != MAGIC

I think the kernel should announce this explicitly.
In my patch from Sep 16 I used a sysctl, but that's
probably bad because glibc would rely on a very new
include/linux/sysctl.h.

I need some advice on this (maybe add a line to /proc/cpuinfo,
or create a new /proc entry? or add a sysmips call to get
this info?).

I also want to know if there's public interest to get such
a change in the kernel. If yes, I will try to get a matching
patch into glibc. If no, I will just post the current patch I
use to the list for the hackers to pick up.


Regards,
Johannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-07 18:43     ` Johannes Stezenbach
@ 2002-10-07 18:51       ` Daniel Jacobowitz
  2002-10-15 17:52         ` Johannes Stezenbach
  2002-10-08  7:38       ` Kevin D. Kissell
  2002-10-15 15:36       ` Maciej W. Rozycki
  2 siblings, 1 reply; 25+ messages in thread
From: Daniel Jacobowitz @ 2002-10-07 18:51 UTC (permalink / raw)
  To: Johannes Stezenbach, Kevin D. Kissell, linux-mips

On Mon, Oct 07, 2002 at 08:43:44PM +0200, Johannes Stezenbach wrote:
> The question is how the glibc can detect if
> a) the CPU does not have LL/SC
> b) the kernel guarantees k1 != MAGIC
> 
> I think the kernel should announce this explicitly.
> In my patch from Sep 16 I used a sysctl, but that's
> probably bad because glibc would rely on a very new
> include/linux/sysctl.h.
> 
> I need some advice on this (maybe add a line to /proc/cpuinfo,
> or create a new /proc entry? or add a sysmips call to get
> this info?).

You should be using an "aux vector"; see how PowerPC provides current
glibc with the size of a cache line.
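
As a rough illustration of the aux vector route, the C sketch below reads the hardware capabilities word and tests one bit. It makes two assumptions that go beyond this thread: it uses getauxval(), an interface that was added to glibc well after this discussion, and the flag HWCAP_MIPS_USERSPACE_ATOMIC_OPS is a hypothetical bit invented for the example, not one defined by any kernel.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* Hypothetical capability bit; no such AT_HWCAP flag is defined in this thread. */
#define HWCAP_MIPS_USERSPACE_ATOMIC_OPS (1UL << 0)

/* True if `bit` is set in the capability word `hwcap`. */
static bool hwcap_has(unsigned long hwcap, unsigned long bit)
{
    return (hwcap & bit) != 0;
}

/* True if the kernel advertised the (hypothetical) bit via the aux vector. */
static bool kernel_advertises_atomic_ops(void)
{
    /* getauxval() returns 0 when the requested entry is absent. */
    return hwcap_has(getauxval(AT_HWCAP), HWCAP_MIPS_USERSPACE_ATOMIC_OPS);
}
```

The attraction of this route is that the kernel hands the value to every process at exec time, so no file system access or extra system call is needed.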

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-07 18:43     ` Johannes Stezenbach
  2002-10-07 18:51       ` Daniel Jacobowitz
@ 2002-10-08  7:38       ` Kevin D. Kissell
  2002-10-08  7:38         ` Kevin D. Kissell
  2002-10-15 15:36       ` Maciej W. Rozycki
  2 siblings, 1 reply; 25+ messages in thread
From: Kevin D. Kissell @ 2002-10-08  7:38 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: linux-mips

> On Mon, Oct 07, 2002 at 06:21:52PM +0200, Kevin D. Kissell wrote:
> >
> > When I first proposed the branch-likely hack last winter,
> > I thought it might be worthwhile to do a thorough code
> > inspection to determine what set of values could never
> > be returned in k1 (or k0 for all I care) if an exception
> > was taken, such that there would be no mods to the
> > kernel required whatsoever.  I spent a little time going
> > down that path, and it does look at first glance as if one
> > could guarantee that one will never come out of an exception
> > with k1 equal to 0xffdadaff in current oss/linux-mips cvs
> > sources, but the guys at Sony, who have a big interest in
> > this technique, given that the PS2 has no LL/SC,
> > preferred a more conservative approach which explicitly
> > clobbered the selected register on all exceptions,
> > even if it meant some small performance impact.
> > That's probably going to be a more reliable design,
> > though I would still consider leaving the TLB refill handler
> > untouched and counting on the fact that k1 must contain
> > a non-lethal EntryLo value on return from the exception.
> 
> In my original posting from Mon, Sep 16, 2002 (maybe I should
> have reposted it in full?), I had appended a patch which
> leaves the TLB handlers alone (k1 always ends up with an EntryLo value,
> thus bit 31 is guaranteed to be 0), but explicitly sets k1 to zero in
> RESTORE_SP_AND_RET.

Note that you've still got a handful of erets that aren't
generated by RESTORE_SP_AND_RET that we
would need to "hunt down and kill" if we were serious
about nailing all non-TLB-miss cases explicitly.
(gdb-stub.S, head.S).

> > As for glibc, the possibilities are numerous and I'm not
> > the guy who'd have to make it work.
> 
> The question is how the glibc can detect if
> a) the CPU does not have LL/SC
> b) the kernel guarantees k1 != MAGIC
> 
> I think the kernel should announce this explicitly.
> In my patch from Sep 16 I used a sysctl, but that's
> probably bad because glibc would rely on a very new
> include/linux/sysctl.h.
> 
> I need some advice on this (maybe add a line to /proc/cpuinfo,
> or create a new /proc entry? or add a sysmips call to get
> this info?).

/proc/cpuinfo strikes me as being a better mechanism
than creating a new system call variant, but again, I
would defer in large measure to the glibc maintainers.
If we do enhance cpuinfo, I note that there are some
other parameters that seem to be in the queue to go
into that data structure, and that they should presumably
be swept up in any such revision.  Is there a reason why
/proc/cpuinfo information doesn't have a version field
to let users know what level of information they are getting,
by the way?

> I also want to know if there's public interest to get such
> a change in the kernel. If yes, I will try to get a matching
> patch into glibc. If no, I will just post the current patch I
> use to the list for the hackers to pick up.

There is certainly interest at MIPS in seeing this mod
propagated. I'd really like to see the Playstation 2 running
something a lot closer to mainstream MIPS/Linux, which
isn't going to happen so long as the only choices on offer
for synchronization on LL/SC-less CPUs are system calls
and OS emulation of LL/SC.  People running Linux
on Vr41xx-based PDAs would get a performance benefit
as well.  Keep in mind that the majority of developers on
this list have either CPUs that support LL/SC, or really
old systems that have neither LL/SC nor branch likely,
which may account for a lack of enthusiasm for the project,
particularly coming as it does not too long after a long
debate on system calls versus emulation.

            Regards,

            Kevin K. 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-07 16:21   ` Kevin D. Kissell
  2002-10-07 16:21     ` Kevin D. Kissell
  2002-10-07 18:43     ` Johannes Stezenbach
@ 2002-10-15 15:17     ` Maciej W. Rozycki
  2002-10-15 16:50       ` Johannes Stezenbach
  2 siblings, 1 reply; 25+ messages in thread
From: Maciej W. Rozycki @ 2002-10-15 15:17 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: Johannes Stezenbach, linux-mips

On Mon, 7 Oct 2002, Kevin D. Kissell wrote:

> That's probably going to be a more reliable design,
> though I would still consider leaving the TLB refill handler
> untouched and counting on the fact that k1 must contain
> a non-lethal EntryLo value on return from the exception.

 Well, there is a "nop" just before the "eret" in all R4k-style TLB
exception handlers.  I see no problem to use the slot for explicit
clobbering of k0 or k1 with a single instruction like "li" or "lui". 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-07 18:43     ` Johannes Stezenbach
  2002-10-07 18:51       ` Daniel Jacobowitz
  2002-10-08  7:38       ` Kevin D. Kissell
@ 2002-10-15 15:36       ` Maciej W. Rozycki
  2002-10-15 17:21         ` Johannes Stezenbach
  2002-10-16 18:11         ` Johannes Stezenbach
  2 siblings, 2 replies; 25+ messages in thread
From: Maciej W. Rozycki @ 2002-10-15 15:36 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips

On Mon, 7 Oct 2002, Johannes Stezenbach wrote:

> The question is how the glibc can detect if
> a) the CPU does not have LL/SC
> b) the kernel guarantees k1 != MAGIC

 Well, since the relevant code will mostly be inlined, you don't really
need either as you can't select an alternative anyway.  The relevant
variant will be selected at the build time as it's already being done for
the ll/sc and sysmips() variants.  You may consider marking binaries as
using your approach so that the kernel refuses to run them if unsupported,
but for MIPS it isn't performed for any functionality so far, so you'd
have to study how other ports do that and which way is most suitable. 
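
Build-time selection, as described above, fixes the variant when the library is compiled. A minimal C sketch of that pattern is shown here; the macros USE_LLSC and USE_BEQL_K1 are hypothetical placeholders, not the configuration macros glibc actually uses.

```c
/* Hypothetical configuration macros; a real glibc port would use its own. */
#if defined(USE_LLSC)
#define TAS_VARIANT "ll/sc"
#elif defined(USE_BEQL_K1)
#define TAS_VARIANT "beql_k1"
#else
#define TAS_VARIANT "sysmips"
#endif

/* Reports which variant was compiled in; fixed for the life of the binary. */
static const char *tas_variant(void)
{
    return TAS_VARIANT;
}
```

Because the choice is made by the preprocessor, the selected sequence can be inlined, but a binary built this way cannot adapt to the kernel it ends up running on.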

> I also want to know if there's public interest to get such
> a change in the kernel. If yes, I will try to get a matching
> patch into glibc. If no, I will just post the current patch I
> use to the list for the hackers to pick up.

 Well, the kernel changes should be trivial, with no performance impact if
written carefully, so they might get included even if only a few people
are interested.  Send a proposal.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-15 15:17     ` Maciej W. Rozycki
@ 2002-10-15 16:50       ` Johannes Stezenbach
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-15 16:50 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kevin D. Kissell, linux-mips

On Tue, Oct 15, 2002 at 05:17:24PM +0200, Maciej W. Rozycki wrote:
> On Mon, 7 Oct 2002, Kevin D. Kissell wrote:
> 
> > That's probably going to be a more reliable design,
> > though I would still consider leaving the TLB refill handler
> > untouched and counting on the fact that k1 must contain
> > a non-lethal EntryLo value on return from the exception.
> 
>  Well, there is a "nop" just before the "eret" in all R4k-style TLB
> exception handlers.  I see no problem to use the slot for explicit
> clobbering of k0 or k1 with a single instruction like "li" or "lui". 

Now that you say it, it's pretty obvious...

Thanks,
Johannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-15 15:36       ` Maciej W. Rozycki
@ 2002-10-15 17:21         ` Johannes Stezenbach
  2002-10-16 12:20           ` Maciej W. Rozycki
  2002-10-16 18:11         ` Johannes Stezenbach
  1 sibling, 1 reply; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-15 17:21 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kevin D. Kissell, linux-mips

On Tue, Oct 15, 2002 at 05:36:29PM +0200, Maciej W. Rozycki wrote:
> On Mon, 7 Oct 2002, Johannes Stezenbach wrote:
> 
> > The question is how the glibc can detect if
> > a) the CPU does not have LL/SC
> > b) the kernel guarantees k1 != MAGIC
> 
>  Well, since the relevant code will mostly be inlined, you don't really
> need either as you can't select an alternative anyway.  The relevant
> variant will be selected at the build time as it's already being done for
> the ll/sc and sysmips() variants.  You may consider marking binaries as
> using your approach so that the kernel refuses to run them if unsupported,
> but for MIPS it isn't performed for any functionality so far, so you'd
> have to study how other ports do that and which way is most suitable. 

Well, in the (experimental) glibc-patch I posted here on
Fri, 19 Jul 2002 14:38:29 +0200 (Subject: LL/SC benchmarking),
I had some code that lets one choose the implementation
(sysmips vs. LL/SC vs. beql_k1) at run-time, based on the
existence of some "signaling" files. This was used for benchmarking.

The ability to choose the implementation at run time sacrifices
inlining, but has obvious performance benefits for the VR41XX-like
platforms. It's also not a special MIPS thing,
e.g. linuxthreads/sysdeps/<platform>/pt-machine.h has the
HAS_COMPARE_AND_SWAP / TEST_FOR_COMPARE_AND_SWAP hooks,
used by e.g. i386.
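
To illustrate the trade-off, here is a minimal sketch of run-time
selection through a function pointer. It is not the code from the
glibc patch: the names cas_select(), cas_table and the three back ends
are made up for this example, and all three back ends are portable C11
stand-ins for what would really be MIPS assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Signature shared by all compare-and-swap back ends. */
typedef bool (*cas_fn)(atomic_int *p, int oldval, int newval);

/* Portable stand-in. A real build would carry one MIPS assembly
 * variant each for sysmips, LL/SC and beql/k1. */
static bool cas_generic(atomic_int *p, int oldval, int newval)
{
    return atomic_compare_exchange_strong(p, &oldval, newval);
}

static bool cas_sysmips(atomic_int *p, int o, int n) { return cas_generic(p, o, n); }
static bool cas_llsc(atomic_int *p, int o, int n)    { return cas_generic(p, o, n); }
static bool cas_beql_k1(atomic_int *p, int o, int n) { return cas_generic(p, o, n); }

/* HYPOTHETICAL table of selectable implementations. */
static const struct { const char *name; cas_fn fn; } cas_table[] = {
    { "sysmips", cas_sysmips },
    { "llsc",    cas_llsc    },
    { "beql_k1", cas_beql_k1 },
};

/* Default to the variant that works everywhere. */
static cas_fn cas_current = cas_sysmips;

/* Select a back end by name; returns 0 on success, -1 if unknown. */
int cas_select(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof cas_table / sizeof cas_table[0]; i++) {
        if (strcmp(cas_table[i].name, name) == 0) {
            cas_current = cas_table[i].fn;
            return 0;
        }
    }
    return -1;
}

/* Every caller goes through one indirect call, so nothing is inlined. */
bool compare_and_swap(atomic_int *p, int oldval, int newval)
{
    return cas_current(p, oldval, newval);
}
```

The indirect call in compare_and_swap() is exactly the inlining that
gets sacrificed.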

But all of that matters only if VR41XX-like platforms
use a glibc from a binary distribution like RedHat or
Debian (I use Debian for development, but have a custom
compiled glibc for production use).
But it seems that this isn't the case?

> > I also want to know if there's public interest to get such
> > a change in the kernel. If yes, I will try to get a matching
> > patch into glibc. If no, I will just post the current patch I
> > use to the list for the hackers to pick up.
> 
>  Well, the kernel changes should be trivial, with no performance impact if
> written carefully, so they might get included even if only a few people
> are interested.  Send a proposal.

Yes, the kernel changes are not difficult. The difficulty was
to find out the minimal necessary changes.


Regards,
Johannes

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-07 18:51       ` Daniel Jacobowitz
@ 2002-10-15 17:52         ` Johannes Stezenbach
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-15 17:52 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Kevin D. Kissell, linux-mips

On Mon, Oct 07, 2002 at 02:51:36PM -0400, Daniel Jacobowitz wrote:
> On Mon, Oct 07, 2002 at 08:43:44PM +0200, Johannes Stezenbach wrote:
> > The question is how the glibc can detect if
> > a) the CPU does not have LL/SC
> > b) the kernel guarantees k1 != MAGIC
> 
> You should be using an "aux vector"; see how PowerPC provides current
> glibc with the size of a cache line.

It took me a while to figure out how aux vectors work.

It seems to me that MIPS does not use the hardware capabilities
field of the aux vector at all. To use it, one would
have to

- add a field to struct cpuinfo_mips in include/asm-mips/processor.h,
  and set it in arch/mips/kernel/setup.c after CPU probing;
- define ELF_HWCAP in include/asm-mips/elf.h to return
  something useful from the new cpuinfo_mips field

Or, to just use it to signal "use beql k1, MAGIC instead of LL/SC",
one could just define ELF_HWCAP dependent on kernel-Config.

Anyway, we would have to document the meaning of the HWCAP bits
as part of the kernel ABI.

i386 uses this, as you can see by running e.g.
  $ LD_SHOW_AUXV=1 ls
provided you have glibc-2.2.5.

glibc/sysdeps/generic/dl-sysdep.c then reads it into dl_hwcap,
and it's up to platform specific code to use it.
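
For illustration, this is roughly what the userspace check could look
like. It is only a sketch: HWCAP_MIPS_BEQL_K1 is a made-up bit (no
such MIPS HWCAP bit has been defined), and getauxval() is a
convenience interface from newer glibc (2.16 and later), not something
glibc-2.2.5 offers.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP; glibc 2.16 or later */

/* HYPOTHETICAL bit meaning "kernel guarantees k1 != MAGIC after eret". */
#define HWCAP_MIPS_BEQL_K1 (1UL << 0)

static_assert(HWCAP_MIPS_BEQL_K1 != 0, "capability bit must be non-zero");

/* Pure helper, so the decision can be checked without a matching kernel. */
bool hwcap_has_beql_k1(unsigned long hwcap)
{
    return (hwcap & HWCAP_MIPS_BEQL_K1) != 0;
}

/* Ask the aux vector; getauxval() returns 0 if AT_HWCAP is not present. */
bool kernel_guarantees_k1(void)
{
    return hwcap_has_beql_k1(getauxval(AT_HWCAP));
}
```

On any machine that does not define the bit this way, the result of
kernel_guarantees_k1() means nothing, of course.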


Regards,
Johannes

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-15 17:21         ` Johannes Stezenbach
@ 2002-10-16 12:20           ` Maciej W. Rozycki
  2002-10-16 12:52             ` Johannes Stezenbach
  0 siblings, 1 reply; 25+ messages in thread
From: Maciej W. Rozycki @ 2002-10-16 12:20 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips

On Tue, 15 Oct 2002, Johannes Stezenbach wrote:

> The ability to choose the implementation at run time sacrifices
> inlining, but has obvious performance benefits for the VR41XX-like
> platforms. It's also not a special MIPS thing,
> e.g. linuxthreads/sysdeps/<platform>/pt-machine.h has the
> HAS_COMPARE_AND_SWAP / TEST_FOR_COMPARE_AND_SWAP hooks,
> used by e.g. i386.

 It also introduces an indirect call (jump?) overhead.  Anyway, you don't
need to sacrifice anything.  We may simply assume the universally
compatible way is the R3k one (be it sysmips() or whatever, if it gets
replaced).  Then there is the branch-likely way, which requires
branch-likely support (thus excludes R3k-class processors).  Then there is
the ll/sc way, which requires ll/sc (thus excludes R3k-class processors
and ones that lack the ll/sc instructions).  And you select the minimum
set of features required at the build time. 
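
 A rough sketch of such a build-time selection follows.  The macros
HAVE_MIPS_LLSC and HAVE_MIPS_BRANCH_LIKELY are hypothetical names made
up for this example; a real build would derive them from the target
options or the sysdeps directory in use.

```c
#include <assert.h>

enum atomic_variant {
    ATOMIC_R3K,      /* universally compatible (sysmips() or its successor) */
    ATOMIC_BEQL_K1,  /* needs branch-likely support */
    ATOMIC_LLSC      /* needs real ll/sc instructions */
};

/* HYPOTHETICAL feature macros, normally set by the build system. */
#if defined(HAVE_MIPS_LLSC)
# define ATOMIC_VARIANT ATOMIC_LLSC
#elif defined(HAVE_MIPS_BRANCH_LIKELY)
# define ATOMIC_VARIANT ATOMIC_BEQL_K1
#else
# define ATOMIC_VARIANT ATOMIC_R3K
#endif

static_assert(ATOMIC_VARIANT >= ATOMIC_R3K, "a variant must be selected");

/* Resolved by the compiler, so callers can still be inlined. */
static inline enum atomic_variant atomic_variant(void)
{
    return ATOMIC_VARIANT;
}
```

 With neither macro defined the fallback is the R3k variant, and no
indirect call is involved anywhere.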

> But all that is of interest only, if VR41XX-like platforms
> would use a glibc from a binary distribution like RedHat or
> Debian (I use Debian for development, but have a custom
> compiled glibc for production use).

 I wouldn't worry about distributions -- anyone who really needs optimized
binaries can get them built somehow (either by doing the task
oneself or by convincing someone else).

> Yes, the kernel changes are not difficult. The difficulty was
> to find out the minimal necessary changes.

 You need to spot all kernel exit points.  Until we have R6k support it
means "eret" instructions. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-16 12:20           ` Maciej W. Rozycki
@ 2002-10-16 12:52             ` Johannes Stezenbach
  2002-10-16 16:30               ` Johannes Stezenbach
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-16 12:52 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kevin D. Kissell, linux-mips

On Wed, Oct 16, 2002 at 02:20:42PM +0200, Maciej W. Rozycki wrote:
> 
>  It also introduces an indirect call (jump?) overhead.  Anyway, you don't
> need to sacrifice anything.  We may simply assume the universally
> compatible way is the R3k one (be it sysmips() or whatever, if it gets
> replaced).  Then there is the branch-likely way, which requires
> branch-likely support (thus excludes R3k-class processors).  Then there is
> the ll/sc way, which requires ll/sc (thus excludes R3k-class processors
> and ones that lack the ll/sc instructions).  And you select the minimum
> set of features required at the build time. 

sysmips is history with current glibc since the Linux kernel emulates
LL/SC for CPUs that don't have it. This emulation is actually faster than
sysmips. (You'd think it's slower because it's one syscall vs. two
emulated instructions. But with LL/SC glibc can use test-and-set
which enables a more efficient linux-threads mutex implementation.)

AFAIK, current Linux distributions based on glibc-2.2.5 were built for
R3K by default and thus used sysmips even on platforms which have
LL/SC.

> > But all that is of interest only, if VR41XX-like platforms
> > would use a glibc from a binary distribution like RedHat or
> > Debian (I use Debian for development, but have a custom
> > compiled glibc for production use).
> 
>  I wouldn't care of distributions -- if one really needs optimized
> binaries it may make them be build somehow (either by doing the task
> oneself or by convincing someone else).

OK, that simplifies the issue. I will prepare patches for
Linux and glibc.


Regards,
Johannes

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-16 12:52             ` Johannes Stezenbach
@ 2002-10-16 16:30               ` Johannes Stezenbach
  2002-10-17  9:47                 ` Gleb O. Raiko
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-16 16:30 UTC (permalink / raw)
  To: Maciej W. Rozycki, Kevin D. Kissell, linux-mips

I wrote:

> sysmips is history with current glibc since the Linux kernel emulates
> LL/SC for CPUs that don't have it. This emulation is actually faster than
> sysmips. (You'd think it's slower because it's one syscall vs. two
> emulated instructions. But with LL/SC glibc can use test-and-set
                                                      ^^^^^^^^^^^^
> which enables a more efficient linux-threads mutex implementation.)

Oops, I meant compare-and-swap.
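
To show why the distinction matters, here is a small sketch (not the
linuxthreads code, and written with C11 atomics as a stand-in for
LL/SC). test_and_set can only flip a flag, while compare_and_swap can
install an arbitrary value conditionally, so a lock word can carry
extra state such as an owner id. The owner_trylock()/owner_unlock()
helpers are invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set the flag to 1 and return its previous value. */
int test_and_set(atomic_int *flag)
{
    return atomic_exchange(flag, 1);
}

/* Store newval only if the word still holds oldval. */
bool compare_and_swap(atomic_int *p, int oldval, int newval)
{
    return atomic_compare_exchange_strong(p, &oldval, newval);
}

/* HYPOTHETICAL lock word: 0 means free, anything else is the owner id.
 * Taking the lock and recording the owner happen in one atomic step. */
bool owner_trylock(atomic_int *lock, int self)
{
    assert(self != 0);
    return compare_and_swap(lock, 0, self);
}

/* Only the recorded owner can release the lock. */
bool owner_unlock(atomic_int *lock, int self)
{
    return compare_and_swap(lock, self, 0);
}
```

With test_and_set alone, the owner id would need a second, separately
protected word.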


Johannes

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-15 15:36       ` Maciej W. Rozycki
  2002-10-15 17:21         ` Johannes Stezenbach
@ 2002-10-16 18:11         ` Johannes Stezenbach
  2002-10-16 18:23           ` Johannes Stezenbach
  2002-10-17 11:57           ` Maciej W. Rozycki
  1 sibling, 2 replies; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-16 18:11 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kevin D. Kissell, linux-mips

On Tue, Oct 15, 2002 at 05:36:29PM +0200, Maciej W. Rozycki wrote:
>  Well, the kernel changes should be trivial, with no performance impact if
> written carefully, so they might get included even if only a few people
> are interested.  Send a proposal.

Here's a patch for the kernel. Tested on a VR41XX, but my glibc
patch needs some cleanup and so will be posted separately.

I thought "explicit is better than implicit" and thus added
many small changes depending on CONFIG_CPU_USERSPACE_LLSC_EMUL
before every eret.

The changes in tlbex-r4k.S are not strictly necessary, since
in the current code k1 always ends up with a CP0_ENTRYLO value
which has bit31 == 0, which is sufficient for the glibc-patch.
Also, the 'move k1,zero' does not add any overhead and thus
could be done unconditionally.
But I thought that adding the #ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
prevents possible future changes from accidentally breaking this.
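
As a reminder of what the guarantee is for, here is a portable C model
of the userspace side. It is only a simulation under stated
assumptions, not the glibc patch: MAGIC is a made-up value (only
"bit 31 set" matters), and the model assumes the store sits in the
delay slot of a branch-likely on k1 == MAGIC, so it is nullified when
the branch is not taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAGIC 0x80000001u   /* HYPOTHETICAL value with bit 31 set */

struct cpu { uint32_t k1; };

/* Models any return from the kernel with the patch applied: k1 is 0. */
void kernel_transition(struct cpu *c)
{
    c->k1 = 0;
}

/* Models a TLB refill that leaves an EntryLo value in k1.  Bit 31 of
 * that value is zero, so it can never equal MAGIC. */
void tlb_refill(struct cpu *c, uint32_t entrylo)
{
    c->k1 = entrylo & 0x7fffffffu;
}

/* One compare-and-swap attempt.  `interrupted` injects a kernel
 * transition between the load and the store.  Returns true if the
 * store was committed; on false the caller would retry. */
bool cas_attempt(struct cpu *c, uint32_t *mem,
                 uint32_t oldval, uint32_t newval, bool interrupted)
{
    assert(c != 0 && mem != 0);
    c->k1 = MAGIC;              /* userspace arms k1 */
    uint32_t cur = *mem;        /* load the current value */
    if (interrupted)
        kernel_transition(c);   /* kernel clobbers k1 */
    if (cur != oldval)
        return false;           /* comparison failed, nothing to store */
    if (c->k1 == MAGIC) {       /* branch taken: delay-slot store runs */
        *mem = newval;
        return true;
    }
    return false;               /* branch not taken: store nullified */
}
```

Any path back to userspace that could leave MAGIC in k1 would make the
interrupted case commit the store, which is the breakage the #ifdef is
meant to guard against.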

The patch is only for the VR41XX. I'm not sure what other CPUs
fall into the same category. If I read binutils/opcodes/mips-opc.c
correctly, then the TX39XX, while not being ISA2, has beql.

Please tell me if the patch is acceptable.

Possible options:
- don't mess with tlbex-r4k.S
- or unconditionally replace the 'nop's before 'eret's in tlbex-r4k.S with
  'move k1,zero' plus a comment
- drop the CONFIG_CPU_USERSPACE_LLSC_EMUL configuration option and
  always clear k1 in RESTORE_SP_AND_RET for the VR41XX


Regards,
Johannes

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-16 18:11         ` Johannes Stezenbach
@ 2002-10-16 18:23           ` Johannes Stezenbach
  2002-10-17 11:57           ` Maciej W. Rozycki
  1 sibling, 0 replies; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-16 18:23 UTC (permalink / raw)
  To: Maciej W. Rozycki, Kevin D. Kissell, linux-mips

[-- Attachment #1: Type: text/plain, Size: 133 bytes --]

On Wed, Oct 16, 2002 at 08:11:35PM +0200, Johannes Stezenbach wrote:
> Here's patch for the kernel.

Now with the patch...

Johannes

[-- Attachment #2: linux-mips-nollsc.patch --]
[-- Type: text/plain, Size: 5751 bytes --]

Index: Documentation/Configure.help
===================================================================
RCS file: /home/cvs/linux/Documentation/Attic/Configure.help,v
retrieving revision 1.109.2.9
diff -u -r1.109.2.9 Configure.help
--- Documentation/Configure.help	3 Oct 2002 01:27:58 -0000	1.109.2.9
+++ Documentation/Configure.help	16 Oct 2002 17:44:12 -0000
@@ -2349,6 +2349,20 @@
   for better performance, N if you don't know.  You must say Y here
   for multiprocessor machines.
 
+Support userspace ll/sc emulation
+CONFIG_CPU_USERSPACE_LLSC_EMUL
+  Say Y here if your CPU does not have the Load Linked (ll)
+  and Store Conditional (sc) instructions, but supports
+  the Branch Likely instructions, e.g. the NEC VR41xx CPUs.
+  Then the kernel guarantees that the k1 register is 0 after
+  any transition from kernel to userspace, which enables an
+  optimized system library to implement a userspace-only
+  ll/sc emulation.
+
+  If you don't run multithreaded software or don't have a
+  matching libpthread, you don't need it.
+  If in doubt, say N.
+
 lld and scd instructions available
 CONFIG_CPU_HAS_LLDSCD
   Say Y here if your CPU has the lld and scd instructions, the 64-bit
Index: arch/mips/config-shared.in
===================================================================
RCS file: /home/cvs/linux/arch/mips/config-shared.in,v
retrieving revision 1.1.2.23
diff -u -r1.1.2.23 config-shared.in
--- arch/mips/config-shared.in	6 Oct 2002 12:28:03 -0000	1.1.2.23
+++ arch/mips/config-shared.in	16 Oct 2002 17:44:12 -0000
@@ -547,6 +547,11 @@
 dep_bool 'Override CPU Options' CONFIG_CPU_ADVANCED $CONFIG_MIPS32
 if [ "$CONFIG_CPU_ADVANCED" = "y" ]; then
    bool '  ll/sc Instructions available' CONFIG_CPU_HAS_LLSC
+   if [ "$CONFIG_CPU_HAS_LLSC" = "n" ]; then
+      bool '    Support userspace atomic ops for MIPS2 CPUs' CONFIG_CPU_USERSPACE_LLSC_EMUL
+   else
+      define_bool CONFIG_CPU_USERSPACE_LLSC_EMUL n
+   fi
    bool '  lld/scd Instructions available' CONFIG_CPU_HAS_LLDSCD
    bool '  Writeback Buffer available' CONFIG_CPU_HAS_WB
 else
@@ -562,6 +567,11 @@
 	 define_bool CONFIG_CPU_HAS_LLDSCD n
 	 define_bool CONFIG_CPU_HAS_WB n
       fi
+      if [ "$CONFIG_CPU_VR41XX" = "y" ]; then
+         define_bool CONFIG_CPU_USERSPACE_LLSC_EMUL y
+      else
+         define_bool CONFIG_CPU_USERSPACE_LLSC_EMUL n
+      fi
    else
       if [ "$CONFIG_CPU_MIPS32" = "y" ]; then
 	 define_bool CONFIG_CPU_HAS_LLSC y
@@ -572,6 +582,7 @@
 	 define_bool CONFIG_CPU_HAS_LLDSCD y
 	 define_bool CONFIG_CPU_HAS_WB n
       fi
+      define_bool CONFIG_CPU_USERSPACE_LLSC_EMUL n
    fi
 fi
 if [ "$CONFIG_CPU_R3000" = "y" ]; then
Index: arch/mips/mm/tlbex-r4k.S
===================================================================
RCS file: /home/cvs/linux/arch/mips/mm/tlbex-r4k.S,v
retrieving revision 1.2.2.10
diff -u -r1.2.2.10 tlbex-r4k.S
--- arch/mips/mm/tlbex-r4k.S	2 Oct 2002 19:42:04 -0000	1.2.2.10
+++ arch/mips/mm/tlbex-r4k.S	16 Oct 2002 17:44:12 -0000
@@ -182,7 +182,11 @@
 	b	1f
 	tlbwr					# write random tlb entry
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	eret					# return from trap
 	END(except_vec0_r4000)
 
@@ -207,7 +211,11 @@
 	P_MTC0	k1, CP0_ENTRYLO1
 	nop
 	tlbwr
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	eret
 	END(except_vec0_r4600)
 
@@ -244,7 +252,11 @@
 	nop					# QED specified nops
 	nop
 	tlbwr					# write random tlb entry
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop					# traditional nop
+#endif
 	eret					# return from trap
 	END(except_vec0_nevada)
 
@@ -276,7 +288,12 @@
 	P_MTC0	k1, CP0_ENTRYLO1		# load it
 	b	1f
 	tlbwr					# write random tlb entry
-1:	nop
+1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
+	nop
+#endif
 2:	eret					# return from trap
 	END(except_vec0_sb1_m3)
 #endif /* BCM1250_M3_WAR */
@@ -308,7 +325,11 @@
 	bltzl	k0, 1f
 	tlbwr
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	eret
 	END(except_vec0_r45k_bvahwbug)
 
@@ -340,7 +361,11 @@
 	bltzl	k0, 1f
 	tlbwr
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	eret
 	END(except_vec0_r4k_mphwbug)
 #endif
@@ -371,7 +396,11 @@
 	b	1f
 	tlbwr
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	eret
 	END(except_vec0_r4k_250MHZhwbug)
 
@@ -405,7 +434,11 @@
 	bltzl	k0, 1f
 	tlbwr
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	eret
 	END(except_vec0_r4k_MP250MHZhwbug)
 #endif
@@ -456,7 +489,11 @@
 	b	1f
 	 tlbwi
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	.set	mips3
 	eret
 	.set	mips0
@@ -482,7 +519,11 @@
 	b	1f
 	 tlbwi
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	.set	mips3
 	eret
 	.set	mips0
@@ -513,7 +554,11 @@
 	b	1f
 	 tlbwi
 1:
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+	move k1, zero
+#else
 	nop
+#endif
 	.set	mips3
 	eret
 	.set	mips0
Index: include/asm-mips/stackframe.h
===================================================================
RCS file: /home/cvs/linux/include/asm-mips/stackframe.h,v
retrieving revision 1.18.2.2
diff -u -r1.18.2.2 stackframe.h
--- include/asm-mips/stackframe.h	5 Aug 2002 23:53:37 -0000	1.18.2.2
+++ include/asm-mips/stackframe.h	16 Oct 2002 17:44:13 -0000
@@ -201,8 +201,15 @@
 		lw	$3,  PT_R3(sp);                  \
 		lw	$2,  PT_R2(sp)
 
+#ifdef CONFIG_CPU_USERSPACE_LLSC_EMUL
+#define CLEAR_K1 move k1,$0;
+#else
+#define CLEAR_K1
+#endif
+
 #define RESTORE_SP_AND_RET                               \
 		lw	sp,  PT_R29(sp);                 \
+		CLEAR_K1                                 \
 		.set	mips3;				 \
 		eret;					 \
 		.set	mips0

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-16 16:30               ` Johannes Stezenbach
@ 2002-10-17  9:47                 ` Gleb O. Raiko
  2002-10-17 12:02                   ` Maciej W. Rozycki
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb O. Raiko @ 2002-10-17  9:47 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Maciej W. Rozycki, Kevin D. Kissell, linux-mips

Johannes Stezenbach wrote:
> 
> I wrote:
> 
> > sysmips is history with current glibc since the Linux kernel emulates
> > LL/SC for CPUs that don't have it. This emulation is actually faster than
> > sysmips. (You'd think it's slower because it's one syscall vs. two
> > emulated instructions. But with LL/SC glibc can use test-and-set
>                                                       ^^^^^^^^^^^^
> > which enables a more efficient linux-threads mutex implementation.)
> 
> Oops, I meant compare-and-swap.

Implement new sysmips then.

Regards,
Gleb.

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-16 18:11         ` Johannes Stezenbach
  2002-10-16 18:23           ` Johannes Stezenbach
@ 2002-10-17 11:57           ` Maciej W. Rozycki
  2002-10-17 13:25             ` Johannes Stezenbach
  1 sibling, 1 reply; 25+ messages in thread
From: Maciej W. Rozycki @ 2002-10-17 11:57 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips

On Wed, 16 Oct 2002, Johannes Stezenbach wrote:

> The patch is only for the VR41XX. I'm not shure what other CPUs
> fall into the same category. If I read binutils/opcodes/mips-opc.c
> correctly, then the TX39XX, while not being ISA2, has beql.

 I think I have TX39XX docs somewhere -- I may check if that's true.

> Please tell me if the patch is acceptable.
> 
> Possible options:
> - don't mess with tlbex-r4k.S
> - or unconditonally replace the 'nop's before 'eret's in tlbex-r4k.S with
>   'move k1,zero' plus a comment

 I'd go for that, so that VR41XX user binaries work fine on real MIPS II+
processors as well.  There is no performance nor space impact for
tlbex-r4k.S and for stackframe.h the single-instruction impact is not
critical, or I believe there is a single free slot in RESTORE_SOME that
may be reused (after a bit of restructuring to make sure
RESTORE_SP_AND_RET isn't used alone). 

> - drop the CONFIG_CPU_USERSPACE_LLSC_EMUL configuration option and
>   always clear k1 in RESTORE_SP_AND_RET for the VR41XX

 And this one as well.  There is no need for a separate config option --
lone comments in place should suffice.

 But you may ask Ralf before making further changes as he is the one to
decide if the patch goes in. 

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-17  9:47                 ` Gleb O. Raiko
@ 2002-10-17 12:02                   ` Maciej W. Rozycki
  2002-10-17 13:11                     ` Johannes Stezenbach
  0 siblings, 1 reply; 25+ messages in thread
From: Maciej W. Rozycki @ 2002-10-17 12:02 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Johannes Stezenbach, Kevin D. Kissell, linux-mips

On Thu, 17 Oct 2002, Gleb O. Raiko wrote:

> Implement new sysmips then.

 I'm not sure if that's a good idea.  Glibc alone uses test_and_set(),
exchange_and_add(), atomic_add() and compare_and_swap().  Do you want a
separate syscall for each of these functions?  I think the ll/sc emulation
may be the best solution after all.  At least it's most flexible and not
much slower if at all.
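
 To illustrate the flexibility argument, here is a sketch in which all
four operations are built from one retry primitive.  It uses C11
atomics as a stand-in for an ll/sc loop, and the signatures are
simplified for the example; they are not glibc's actual ones.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for an ll/sc sequence: store newval only if the word still
 * holds oldval. */
bool compare_and_swap(atomic_int *p, int oldval, int newval)
{
    assert(p != 0);
    return atomic_compare_exchange_strong(p, &oldval, newval);
}

/* Add val to the word and return the value it held before. */
int exchange_and_add(atomic_int *p, int val)
{
    int old;
    do {
        old = atomic_load(p);
    } while (!compare_and_swap(p, old, old + val));
    return old;
}

/* Same as above, with the previous value discarded. */
void atomic_add(atomic_int *p, int val)
{
    (void)exchange_and_add(p, val);
}

/* Set the word to 1 and return the value it held before. */
int test_and_set(atomic_int *p)
{
    int old;
    do {
        old = atomic_load(p);
    } while (!compare_and_swap(p, old, 1));
    return old;
}
```

 A dedicated syscall per function would have to duplicate this for
each operation on the kernel side.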

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-17 12:02                   ` Maciej W. Rozycki
@ 2002-10-17 13:11                     ` Johannes Stezenbach
  2002-10-17 13:32                       ` Gleb O. Raiko
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-17 13:11 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Gleb O. Raiko, Kevin D. Kissell, linux-mips

On Thu, Oct 17, 2002 at 02:02:35PM +0200, Maciej W. Rozycki wrote:
> On Thu, 17 Oct 2002, Gleb O. Raiko wrote:
> 
> > Implement new sysmips then.
> 
>  I'm not sure if that's a good idea.  Glibc alone uses test_and_set(),
> exchange_and_add(), atomic_add() and compare_and_swap().  Do you want a
> separate syscall for each of these functions?  I think the ll/sc emulation
> may be the best solution after all.  At least it's most flexible and not
> much slower if at all.

Depends on your usage pattern. E.g. we don't run software that uses
atomicity.h (i.e. no C++ code), but heavily use pthread_mutex_lock() etc.
The few uses of atomicity.h internal to glibc don't warrant
any optimizations. So, if the beql method did not exist, I would
consider implementing a new sysmips for compare_and_swap().


Regards,
Johannes

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-17 11:57           ` Maciej W. Rozycki
@ 2002-10-17 13:25             ` Johannes Stezenbach
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-17 13:25 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Maciej W. Rozycki, Kevin D. Kissell, linux-mips

Hello Ralf,

On Thu, Oct 17, 2002 at 01:57:29PM +0200, Maciej W. Rozycki wrote:
> On Wed, 16 Oct 2002, Johannes Stezenbach wrote:
> 
> > The patch is only for the VR41XX. I'm not shure what other CPUs
> > fall into the same category. If I read binutils/opcodes/mips-opc.c
> > correctly, then the TX39XX, while not being ISA2, has beql.
> 
>  I think I have TX39XX docs somewhere -- I may check if that's true.
> 
> > Please tell me if the patch is acceptable.
> > 
> > Possible options:
> > - don't mess with tlbex-r4k.S
> > - or unconditonally replace the 'nop's before 'eret's in tlbex-r4k.S with
> >   'move k1,zero' plus a comment
> 
>  I'd go for that, so that VR41XX user binaries work fine on real MIPS II+
> processors as well.  There is no performance nor space impact for
> tlbex-r4k.S and for stackframe.h the single-instruction impact is not
> critical, or I believe there is a single free slot in RESTORE_SOME that
> may be reused (after a bit of restructuring to make sure
> RESTORE_SP_AND_RET isn't used alone). 

This also would prevent me from shooting myself in the foot by accidentally
running VR41XX user binaries on a kernel with "clear k1" support disabled ;-)

> > - drop the CONFIG_CPU_USERSPACE_LLSC_EMUL configuration option and
> >   always clear k1 in RESTORE_SP_AND_RET for the VR41XX
> 
>  And this one as well.  There is no need for a separate config option --
> lone comments in place should suffice.
> 
>  But you may ask Ralf before making further changes as he is the one to
> decide if the patch goes in. 

If this is OK with you, I would prepare a patch that just
unconditionally clears k1 before every eret in tlbex-r4k.S
and stackframe.h according to Maciej's suggestions, and adds
a comment explaining its purpose.


Regards,
Johannes

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-17 13:11                     ` Johannes Stezenbach
@ 2002-10-17 13:32                       ` Gleb O. Raiko
  2002-10-17 14:13                         ` Johannes Stezenbach
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb O. Raiko @ 2002-10-17 13:32 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Maciej W. Rozycki, Kevin D. Kissell, linux-mips

Johannes Stezenbach wrote:
> 
> On Thu, Oct 17, 2002 at 02:02:35PM +0200, Maciej W. Rozycki wrote:
> > On Thu, 17 Oct 2002, Gleb O. Raiko wrote:
> >
> > > Implement new sysmips then.
> >
> >  I'm not sure if that's a good idea.  Glibc alone uses test_and_set(),
> > exchange_and_add(), atomic_add() and compare_and_swap().  Do you want a
> > separate syscall for each of these functions?  I think the ll/sc emulation
> > may be the best solution after all.  At least it's most flexible and not
> > much slower if at all.
> 
> Depends on your usage pattern. E.g. we don't run software that uses
> atomicity.h (i.e. no C++ code), but heavily use pthread_mutex_lock() etc.
> The few uses of atomicity.h internal to glibc don't warrant
> any optimizations. So, if the beql-Method would not exist, I would
> consider implementing a new sysmips for compare_and_swap().

I haven't looked at newer glibc sources (read: newer than 2.0.6), hence the
question: why is the difference between compare_and_swap and
test_and_set so huge that it eats an exception penalty? ;-)

Regards,
Gleb.

* Re: Once again: test_and_set for CPUs w/o LL/SC
  2002-10-17 13:32                       ` Gleb O. Raiko
@ 2002-10-17 14:13                         ` Johannes Stezenbach
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Stezenbach @ 2002-10-17 14:13 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Maciej W. Rozycki, Kevin D. Kissell, linux-mips

On Thu, Oct 17, 2002 at 05:32:03PM +0400, Gleb O. Raiko wrote:
> Johannes Stezenbach wrote:
> > 
> > On Thu, Oct 17, 2002 at 02:02:35PM +0200, Maciej W. Rozycki wrote:
> > > On Thu, 17 Oct 2002, Gleb O. Raiko wrote:
> > >
> > > > Implement new sysmips then.
> > >
> > >  I'm not sure if that's a good idea.  Glibc alone uses test_and_set(),
> > > exchange_and_add(), atomic_add() and compare_and_swap().  Do you want a
> > > separate syscall for each of these functions?  I think the ll/sc emulation
> > > may be the best solution after all.  At least it's most flexible and not
> > > much slower if at all.
> > 
> > Depends on your usage pattern. E.g. we don't run software that uses
> > atomicity.h (i.e. no C++ code), but heavily use pthread_mutex_lock() etc.
> > The few uses of atomicity.h internal to glibc don't warrant
> > any optimizations. So, if the beql-Method would not exist, I would
> > consider implementing a new sysmips for compare_and_swap().
> 
> I didn't look at newer glibc sources (read: greater than 2.0.6), so the
> question. Why  is the difference between compare_and_swap and
> test_and_set so huge that it eats an exception penalty? ;-)

It is not. I wrote:
  ... But with LL/SC glibc can use compare-and-swap
  which enables a more efficient linux-threads mutex implementation.

This is what makes the difference, at least for glibc-2.2.5. Just
grep for HAS_COMPARE_AND_SWAP in your linuxthreads sources.

Current glibc from CVS (both HEAD and the 2.2 branch) doesn't use sysmips
anymore; it relies on LL/SC (emulated or not).


Regards,
Johannes

end of thread, other threads:[~2002-10-17 14:13 UTC | newest]

Thread overview: 25+ messages
2002-09-16 16:40 Once again: test_and_set for CPUs w/o LL/SC Johannes Stezenbach
2002-10-07 14:47 ` Johannes Stezenbach
2002-10-07 16:21   ` Kevin D. Kissell
2002-10-07 16:21     ` Kevin D. Kissell
2002-10-07 18:43     ` Johannes Stezenbach
2002-10-07 18:51       ` Daniel Jacobowitz
2002-10-15 17:52         ` Johannes Stezenbach
2002-10-08  7:38       ` Kevin D. Kissell
2002-10-08  7:38         ` Kevin D. Kissell
2002-10-15 15:36       ` Maciej W. Rozycki
2002-10-15 17:21         ` Johannes Stezenbach
2002-10-16 12:20           ` Maciej W. Rozycki
2002-10-16 12:52             ` Johannes Stezenbach
2002-10-16 16:30               ` Johannes Stezenbach
2002-10-17  9:47                 ` Gleb O. Raiko
2002-10-17 12:02                   ` Maciej W. Rozycki
2002-10-17 13:11                     ` Johannes Stezenbach
2002-10-17 13:32                       ` Gleb O. Raiko
2002-10-17 14:13                         ` Johannes Stezenbach
2002-10-16 18:11         ` Johannes Stezenbach
2002-10-16 18:23           ` Johannes Stezenbach
2002-10-17 11:57           ` Maciej W. Rozycki
2002-10-17 13:25             ` Johannes Stezenbach
2002-10-15 15:17     ` Maciej W. Rozycki
2002-10-15 16:50       ` Johannes Stezenbach
