From: Kumba <kumba@gentoo.org>
To: libc-ports@sources.redhat.com
Cc: Daniel Jacobowitz <drow@false.org>,
Linux MIPS List <linux-mips@linux-mips.org>
Subject: Re: [PATCH]: R10000 Needs LL/SC Workaround in Glibc
Date: Sat, 01 Nov 2008 03:33:33 -0400 [thread overview]
Message-ID: <490C064D.8050409@gentoo.org> (raw)
In-Reply-To: <490A912A.8030901@gentoo.org>
[-- Attachment #1: Type: text/plain, Size: 1420 bytes --]
Kumba wrote:
>
> The attached patch adds a workaround for R10000 CPUs to use the branch
> likely (beqzl) instruction in atomic operations, because revisions of
> the CPU before 3.0 misbehave, while revisions 2.6 and earlier will
> deadlock. This issue has been noted on SGI IP28 (Indigo2 Impact R10000)
> systems and SGI IP27 Origin systems.
>
> I drafted it after some discussion with several people in the Linux/MIPS
> IRC Channel after discovering glibc didn't work quite right on my IP28
> machine. The patch is based on Debian bug #462112, viewable here:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462112
>
> Feedback would be welcome on any suggestions for improving this patch
> (please CC, as I'm not subscribed to the ML).
Had a typo in my original patch w/ some stray parenthesis. A fixed patch is
attached.
I've wondered this on the equivalent patch on the gcc-patches ML as well, on
whether this check should be strictly limited to when -march=r10000 is passed to
the compiler. I think -march=mips4 is probably better, but would having beqzl
used even when -march=mips2, which is the ISA level that added branch likely, be
even better?
--
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
"The past tempts us, the present confuses us, the future frightens us. And our
lives slip away, moment by moment, lost in that vast, terrible in-between."
--Emperor Turhan, Centauri Republic
[-- Attachment #2: glibc-trunk-r10k-beqzl.patch --]
[-- Type: text/plain, Size: 3764 bytes --]
diff -Naurp libc.orig/ports/sysdeps/mips/bits/atomic.h libc/ports/sysdeps/mips/bits/atomic.h
--- libc.orig/ports/sysdeps/mips/bits/atomic.h 2005-03-28 04:14:59.000000000 -0500
+++ libc/ports/sysdeps/mips/bits/atomic.h 2008-10-30 23:39:37.000000000 -0400
@@ -53,6 +53,31 @@ typedef uintmax_t uatomic_max_t;
#define MIPS_SYNC_STR_1(X) MIPS_SYNC_STR_2(X)
#define MIPS_SYNC_STR MIPS_SYNC_STR_1(MIPS_SYNC)
+/* Certain revisions of the R10000 Processor need an LL/SC Workaround
+ enabled. Revisions before 3.0 misbehave on atomic operations, and
+ Revs 2.6 and lower deadlock after several seconds due to other errata.
+
+ To quote the R10K Errata:
+ Workaround: The basic idea is to inhibit the four instructions
+ from simultaneously becoming active in R10000. Padding all
+ ll/sc sequences with nops or changing the looping branch in the
+ routines to a branch likely (which is always predicted taken
+ by R10000) will work. The nops should go after the loop, and the
+ number of them should be 28. This number could be decremented for
+ each additional instruction in the ll/sc loop such as the lock
+ modifier(s) between the ll and sc, the looping branch and its
+ delay slot. For typical short routines with one ll/sc loop, any
+ instructions after the loop could also count as a decrement. The
+ nop workaround pollutes the cache more but would be a few cycles
+ faster if all the code is in the cache and the looping branch
+ is predicted not taken. */
+
+#ifndef _MIPS_ARCH_R10000
+#define R10K_BEQZ_INSN "beqz %1,1b\n"
+#else
+#define R10K_BEQZ_INSN "beqzl %1,1b\n"
+#endif
+
/* Compare and exchange. For all of the "xxx" routines, we expect a
"__prev" and a "__cmp" variable to be provided by the enclosing scope,
in which values are returned. */
@@ -74,7 +99,7 @@ typedef uintmax_t uatomic_max_t;
"bne %0,%2,2f\n\t" \
"move %1,%3\n\t" \
"sc %1,%4\n\t" \
- "beqz %1,1b\n" \
+ R10K_BEQZ_INSN \
acq "\n\t" \
".set pop\n" \
"2:\n\t" \
@@ -98,7 +123,7 @@ typedef uintmax_t uatomic_max_t;
"bne %0,%2,2f\n\t" \
"move %1,%3\n\t" \
"scd %1,%4\n\t" \
- "beqz %1,1b\n" \
+ R10K_BEQZ_INSN \
acq "\n\t" \
".set pop\n" \
"2:\n\t" \
@@ -192,7 +217,7 @@ typedef uintmax_t uatomic_max_t;
"ll %0,%3\n\t" \
"move %1,%2\n\t" \
"sc %1,%3\n\t" \
- "beqz %1,1b\n" \
+ R10K_BEQZ_INSN \
acq "\n\t" \
".set pop\n" \
"2:\n\t" \
@@ -216,7 +241,7 @@ typedef uintmax_t uatomic_max_t;
"lld %0,%3\n\t" \
"move %1,%2\n\t" \
"scd %1,%3\n\t" \
- "beqz %1,1b\n" \
+ R10K_BEQZ_INSN \
acq "\n\t" \
".set pop\n" \
"2:\n\t" \
@@ -251,7 +276,7 @@ typedef uintmax_t uatomic_max_t;
"ll %0,%3\n\t" \
"addu %1,%0,%2\n\t" \
"sc %1,%3\n\t" \
- "beqz %1,1b\n" \
+ R10K_BEQZ_INSN \
acq "\n\t" \
".set pop\n" \
"2:\n\t" \
@@ -275,7 +300,7 @@ typedef uintmax_t uatomic_max_t;
"lld %0,%3\n\t" \
"daddu %1,%0,%2\n\t" \
"scd %1,%3\n\t" \
- "beqz %1,1b\n" \
+ R10K_BEQZ_INSN \
acq "\n\t" \
".set pop\n" \
"2:\n\t" \
next prev parent reply other threads:[~2008-11-01 8:43 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-31 5:01 [PATCH]: R10000 Needs LL/SC Workaround in Glibc Kumba
2008-11-01 7:33 ` Kumba [this message]
2008-11-01 11:26 ` Ralf Baechle
2008-11-04 7:16 ` Kumba
2008-11-01 17:23 ` James Perkins
2008-11-04 7:16 ` Kumba
2008-11-23 4:16 ` Kumba
2009-01-27 15:29 ` Daniel Jacobowitz
2009-01-27 16:13 ` Maciej W. Rozycki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=490C064D.8050409@gentoo.org \
--to=kumba@gentoo.org \
--cc=drow@false.org \
--cc=libc-ports@sources.redhat.com \
--cc=linux-mips@linux-mips.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.