* O2 RM7000 Issues
@ 2007-07-01 16:57 Kumba
2007-07-01 22:07 ` freshy98
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Kumba @ 2007-07-01 16:57 UTC (permalink / raw)
To: Linux MIPS List
So I finally managed to get my hands on one of those super-rare RM7000 CPUs for
the SGI O2, and, as expected, there are some problems.
For the most part, the system will boot into userland, but userland isn't at
all happy. Bash is the unhappiest customer so far (or rather, the only userland
program I've seen fail repeatedly). When the Gentoo init scripts run at startup,
several scripts terminate with a variety of messages, from Trace/breakpoint
traps to bus errors to illegal instructions. However, in the init scripts they
happen at specific points: usually when starting our network startup scripts
(net.eth0, net.lo), and usually on an exit() function in the script. Our emerge
process, while Python at heart, seems to fail sporadically in the bash sections
(parsing the ebuild code) as well.
I've got a feeling this is more likely a problem in the kernel than in
userland, but the question is how to go about determining which and where.
The RM7Ks are pretty rare, so I imagine there are probably a few
undiscovered quirks in the code (notably the SC code in arch/mips/mm/sc-rm7k.c).
Not to mention, we can't even use the 1MB tertiary cache these things have.
For reference, system info:
> hinv
System: IP32
Processor: 350 Mhz RM7000, with FPU
Primary I-cache size: 16 Kbytes
Primary D-cache size: 16 Kbytes
Secondary cache size: 256 Kbytes
Ternary cache size: 1024 Kbytes
Memory size: 512 Mbytes
Graphics: CRM, Rev C
Audio: A3 version 1
SCSI Disk: scsi(0)disk(2)
SCSI Disk: scsi(0)disk(3)
SCSI CDROM: scsi(0)cdrom(4)
# cat /proc/cpuinfo
system type : SGI O2
processor : 0
cpu model : RM7000 V3.3 FPU V2.0
BogoMIPS : 350.20
byteorder : big endian
wait instruction : yes
microsecond timers : yes
tlb_entries : 48
extra interrupt vector : no
hardware watchpoint : no
ASEs implemented :
VCED exceptions : not available
VCEI exceptions : not available
And errors (from various points in the execution and multiple reboots):
* Starting eth0
/sbin/runscript.sh: line 428: 2475 Illegal instruction ( function exit ()
* Starting lo
/sbin/runscript.sh: line 428: 1464 Illegal instruction ( function exit ()
* Starting eth0
/etc/init.d/net.eth0: line 248: 1650 Bus error ( u=0;
module_load_minimum "${MODULES[i]}" || u=1; if [[ ${u} == 0 ]]; then
/sbin/runscript.sh: line 428: 2779 Bus error ( function exit ()
{
* Stopping syslog-ng ... [ ok ]
/lib/rcscripts/sh/rc-services.sh: line 444: 4093 Illegal instruction (
"/etc/init.d/${service}" stop )
/lib/rcscripts/sh/rc-services.sh: line 384: 1095 Trace/breakpoint trap
"/etc/init.d/${service}" start
So if anyone's got some old rm7k patches sitting around they want tested, or
spots where to look/debug options to turn on, let me know. I'll try switching
back to an RM5200, rebuild bash with -g, make sure gdb is installed, then
change back to the RM7000 to try and capture some asm call or something that's
causing these exit() failures in bash (which seem to be the primary symptom).
Cheers,
--Kumba
--
Gentoo/MIPS Team Lead
"Such is oft the course of deeds that move the wheels of the world: small hands
do them because they must, while the eyes of the great are elsewhere." --Elrond
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-01 16:57 O2 RM7000 Issues Kumba
@ 2007-07-01 22:07 ` freshy98
2007-07-02 13:08 ` sknauert
2007-07-02 13:08 ` sknauert
2007-07-02 14:34 ` Maciej W. Rozycki
2007-09-21 6:27 ` Sagar Borikar
2 siblings, 2 replies; 26+ messages in thread
From: freshy98 @ 2007-07-01 22:07 UTC (permalink / raw)
To: Kumba; +Cc: Linux MIPS List
Ah, finally someone else got hold of one of these things :-)
Mine is still completely unused.
I haven't done much Linux at all for the past few years, but I'm always interested
to see how things might work out for either R5K or RM7K O2s.
Tom
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-01 22:07 ` freshy98
@ 2007-07-02 13:08 ` sknauert
2007-07-02 13:08 ` sknauert
1 sibling, 0 replies; 26+ messages in thread
From: sknauert @ 2007-07-02 13:08 UTC (permalink / raw)
To: Linux MIPS List
I have one of the 600 MHz RM7000s, i.e. no tertiary cache, since the module
was originally a 300 MHz RM5200. However, mine hasn't given any problems
with Debian or Gentoo.
What kernel and target are you compiling for? I'm using 2.6.21.3 compiled
for R5K. All my userspace is compiled for R5K too. I'll compile a new
kernel for RM7000 and see if I have any issues, then poke around to see what
kernel code gets changed. I'm not 100% sure, but I didn't think it was
that much, so my gut reaction is that this might be a gcc issue, since the RM7000
isn't a common processor.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-01 16:57 O2 RM7000 Issues Kumba
2007-07-01 22:07 ` freshy98
@ 2007-07-02 14:34 ` Maciej W. Rozycki
2007-09-21 6:27 ` Sagar Borikar
2 siblings, 0 replies; 26+ messages in thread
From: Maciej W. Rozycki @ 2007-07-02 14:34 UTC (permalink / raw)
To: Kumba; +Cc: Linux MIPS List
On Sun, 1 Jul 2007, Kumba wrote:
> I've got a feeling this is likely a problem in the kernel more than it is a
> problem in the userland, but the question is how to go about determining which
> and where. The RM7K's are pretty rare, so I imagine there's probably a few
> undiscovered quirks in the code (notably the SC code in
> arch/mips/mm/sc-rm7k.c).
FYI, I had problems with the secondary cache of this CPU the last time I
tried it with a Malta too -- random hangs of user processes. So far I
have had no time to dig into it unfortunately.
Maciej
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-02 13:08 ` sknauert
@ 2007-07-04 15:27 ` Ralf Baechle
2007-07-04 19:22 ` Ralf Baechle
0 siblings, 1 reply; 26+ messages in thread
From: Ralf Baechle @ 2007-07-04 15:27 UTC (permalink / raw)
To: sknauert; +Cc: Linux MIPS List
On Mon, Jul 02, 2007 at 09:08:43AM -0400, sknauert@wesleyan.edu wrote:
> I have one of the 600 Mhz RM7000s, i.e. no tertiary cache since the module
> was originally a 300 Mhz RM5200. However, mine hasn't given any problems
> with Debian or Gentoo.
>
> What kernel and target are you compiling for? I'm using 2.6.21.3 compiled
> for R5K. All my userspace is compiled for R5K too. I'll compile a new
> kernel for RM7000 and see if I have any issue then poke around to see what
> kernel code gets changed. I'm not 100% sure, but I didn't think it was
> that much so my gut reaction is this might be a gcc issue since the RM7000
> isn't a common processor.
R5000, RM5200 and RM7000 are all MIPS IV processors so have the same
instruction set. That leaves the usual suspects - pipeline hazards,
cache problems and CPU bugs to research.
Ralf
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-04 15:27 ` Ralf Baechle
@ 2007-07-04 19:22 ` Ralf Baechle
2007-07-16 11:53 ` Sergey Rogozhkin
2007-07-17 9:04 ` Sergey Rogozhkin
0 siblings, 2 replies; 26+ messages in thread
From: Ralf Baechle @ 2007-07-04 19:22 UTC (permalink / raw)
To: sknauert; +Cc: Linux MIPS List
On Wed, Jul 04, 2007 at 04:27:29PM +0100, Ralf Baechle wrote:
> R5000, RM5200 and RM7000 are all MIPS IV processors so have the same
> instruction set. That leaves the usual suspects - pipeline hazards,
> cache problems and CPU bugs to research.
Big loud bell began ringing. The RM7000 fetches and decodes multiple
instructions in one go. And just like the E9000 cores it does
throw an exception if it doesn't like one of the opcodes even if that
doesn't actually get executed. The kernel has a workaround for this
PMC-Sierra peculiarity (I call it a bug) but it's only being activated
for E9000 platforms.
Ralf
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-04 19:22 ` Ralf Baechle
@ 2007-07-16 11:53 ` Sergey Rogozhkin
2007-07-16 12:33 ` Ralf Baechle
2007-07-17 9:04 ` Sergey Rogozhkin
1 sibling, 1 reply; 26+ messages in thread
From: Sergey Rogozhkin @ 2007-07-16 11:53 UTC (permalink / raw)
To: Ralf Baechle, kumba; +Cc: linux-mips
>
> Big loud bell began ringing. The RM7000 fetches and decodes multiple
> instructions in one go. And just like the E9000 cores it does
> throw an exception if it doesn't like one of the opcodes even if that
> doesn't actually get executed. The kernel has a workaround for this
> PMC-Sierra peculiarity (I call it a bug) but it's only being activated
> for E9000 platforms.
We have had similar problems with the shell on an RM7000-based system. It
seems the reason listed above is only half of the problem. The other half is
that Linux handles the RM7000 cache hierarchy incorrectly. One visible effect
is errors in userspace on signal delivery trampolines.
Let's imagine we deliver a signal to an application: we write the signal
trampoline instructions to the stack, write back (and invalidate) the
corresponding dcache line, and invalidate the corresponding icache line. That's
all, and we think we can safely execute the trampoline, but this is wrong on
the RM7000! Our trampoline is now in the scache and everything seems to be OK,
but after some number of loads/stores the corresponding scache line can be
moved to the dcache, replaced in the scache by other data, and not written to
memory. This is a feature of the RM7000 caches: the dcache is not a subset of
the scache. A possible scenario of a similar (but not identical) cache line
transfer is described in the RM7000 manual (7.1.5 Orphaned Cache Lines). After
that, it is possible that on signal trampoline execution the icache fetches the
old memory content instead of the instructions written. If we want to execute
instructions written by the CPU, we must not only write back the corresponding
dcache lines but also write back the corresponding scache lines afterwards. The
error is very sensitive to kernel/user code and data arrangement; it can be
visible with one kernel configuration and irreproducible with another.
The problem affects not only the signal trampoline flush to memory, but most
cases of icache invalidation in the kernel.
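To make the sequence above concrete, here is a toy single-line model in C. It is only a sketch: the struct, the toy_* helpers and the "orphan" step are invented for illustration and do not correspond to real kernel functions or to the exact RM7000 behaviour. It shows that, under these assumptions, skipping the scache writeback lets the instruction fetch see stale memory, while adding it does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one cache line per level, all names hypothetical. */
struct line { bool valid, dirty; uint32_t data; };

struct toy {
    uint32_t mem;        /* main memory word */
    struct line dc, sc;  /* primary data cache, secondary cache */
    struct line ic;      /* primary instruction cache */
};

/* CPU store: lands dirty in the dcache only. */
static void toy_store(struct toy *t, uint32_t v)
{
    t->dc = (struct line){ true, true, v };
}

/* Write back and invalidate the dcache line; the data moves to the scache. */
static void toy_dcache_wback_inv(struct toy *t)
{
    if (t->dc.valid && t->dc.dirty)
        t->sc = t->dc;
    t->dc.valid = false;
}

/* Write the scache line back to memory; it stays valid but clean. */
static void toy_scache_wback(struct toy *t)
{
    if (t->sc.valid && t->sc.dirty) {
        t->mem = t->sc.data;
        t->sc.dirty = false;
    }
}

static void toy_icache_inv(struct toy *t)
{
    t->ic.valid = false;
}

/* Assumed "orphaning": the line migrates scache -> dcache and its scache
 * slot is reused, without the data being written to memory. */
static void toy_orphan(struct toy *t)
{
    if (t->sc.valid) {
        t->dc = t->sc;
        t->sc.valid = false;
    }
}

/* Instruction fetch: refills from scache, else memory, never from dcache. */
static uint32_t toy_ifetch(struct toy *t)
{
    if (!t->ic.valid) {
        uint32_t v = t->sc.valid ? t->sc.data : t->mem;
        t->ic = (struct line){ true, false, v };
    }
    return t->ic.data;
}

/* Write "insn" over stale memory, flush, orphan the line, fetch it back. */
uint32_t toy_run(uint32_t stale, uint32_t insn, bool flush_scache)
{
    struct toy t = { .mem = stale };

    toy_store(&t, insn);
    toy_dcache_wback_inv(&t);
    if (flush_scache)
        toy_scache_wback(&t);  /* the extra step argued for above */
    toy_icache_inv(&t);
    toy_orphan(&t);
    return toy_ifetch(&t);
}
```

In this model, toy_run(old, insn, false) hands back the stale word, and toy_run(old, insn, true) hands back the newly written instruction.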
Sergey Rogozhkin.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-16 11:53 ` Sergey Rogozhkin
@ 2007-07-16 12:33 ` Ralf Baechle
2007-07-16 17:38 ` Andrew Sharp
2007-07-17 7:54 ` Gleb O. Raiko
0 siblings, 2 replies; 26+ messages in thread
From: Ralf Baechle @ 2007-07-16 12:33 UTC (permalink / raw)
To: Sergey Rogozhkin; +Cc: kumba, linux-mips
On Mon, Jul 16, 2007 at 03:53:18PM +0400, Sergey Rogozhkin wrote:
> >Big loud bell began ringing. The RM7000 fetches and decodes multiple
> >instructions in one go. And just like the E9000 cores it does
> >throw an exception if it doesn't like one of the opcodes even if that
> >doesn't actually get executed. The kernel has a workaround for this
> >PMC-Sierra peculiarity (I call it a bug) but it's only being activated
> >for E9000 platforms.
>
> We have had a similar problems with shell on RM7000 based system. It
> seems, the reason listed above is only half of the problem, another is:
> linux works incorrectly with RM7000 caches hierarchy. One visible effect
> is errors in userspace on signal delivery trampolines.
> Lets imagine we deliver a signal to application: we write signal
> trampoline instructions to stack, writeback (and invalidate)
> corresponding dcache line, invalidate corresponding icache line. Thats
> all, and we think that we can safely execute the trampoline, but this is
> wrong on RM7000! Our trampoline is now in scache, and everything seems
> to be ok, but after some number of load/stores corresponding scache line
> can be moved to dcache, replaced in scache by another data and not
> written to memory (this is a feature of RM7000 caches, its dcache is not
> a subset of scache, you can find a possible scenario of similar (but not
> the same) cache line transference in RM7000 manual (7.1.5 Orphaned Cache
> Lines)). After that it is possible that on signal trampoline execution
> icache fetch old memory content instead of instruction written. If we
> want to execute instruction written by cpu, we must not only writeback
> corresponding dcache lines, but also writeback corresponding scache
> lines after it. The error is very sensitively to kernel/user code and
> data arrangement, it can be visible with one kernel configuration and
> irreproducible with another.
> The problem affects not only signal trampoline flush to memory, but most
> cases of icache invalidation in kernel.
Hmm... Makes sense. I guess I can cook up a patch based on that analysis.
Thanks!
Ralf
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-16 12:33 ` Ralf Baechle
@ 2007-07-16 17:38 ` Andrew Sharp
2007-07-17 14:01 ` Kumba
2007-07-17 7:54 ` Gleb O. Raiko
1 sibling, 1 reply; 26+ messages in thread
From: Andrew Sharp @ 2007-07-16 17:38 UTC (permalink / raw)
To: linux-mips
On Mon, 16 Jul 2007 13:33:44 +0100 Ralf Baechle <ralf@linux-mips.org>
wrote:
> On Mon, Jul 16, 2007 at 03:53:18PM +0400, Sergey Rogozhkin wrote:
>
> > >Big loud bell began ringing. The RM7000 fetches and decodes
> > >multiple instructions in one go. And just like the E9000 cores it
> > >does throw an exception if it doesn't like one of the opcodes even
> > >if that doesn't actually get executed. The kernel has a
> > >workaround for this PMC-Sierra peculiarity (I call it a bug) but
> > >it's only being activated for E9000 platforms.
> >
> > We have had a similar problems with shell on RM7000 based system.
> > It seems, the reason listed above is only half of the problem,
> > another is: linux works incorrectly with RM7000 caches hierarchy.
> > One visible effect is errors in userspace on signal delivery
> > trampolines. Lets imagine we deliver a signal to application: we
> > write signal trampoline instructions to stack, writeback (and
> > invalidate) corresponding dcache line, invalidate corresponding
> > icache line. Thats all, and we think that we can safely execute the
> > trampoline, but this is wrong on RM7000! Our trampoline is now in
> > scache, and everything seems to be ok, but after some number of
> > load/stores corresponding scache line can be moved to dcache,
> > replaced in scache by another data and not written to memory (this
> > is a feature of RM7000 caches, its dcache is not a subset of
> > scache, you can find a possible scenario of similar (but not the
> > same) cache line transference in RM7000 manual (7.1.5 Orphaned
> > Cache Lines)). After that it is possible that on signal trampoline
> > execution icache fetch old memory content instead of instruction
> > written. If we want to execute instruction written by cpu, we must
> > not only writeback corresponding dcache lines, but also writeback
> > corresponding scache lines after it. The error is very sensitively
> > to kernel/user code and data arrangement, it can be visible with
> > one kernel configuration and irreproducible with another. The
> > problem affects not only signal trampoline flush to memory, but
> > most cases of icache invalidation in kernel.
>
> Hmm... Makes sense. I guess I can cook up a patch based on that
> analysis.
I hungrily await said patch, as I believe this is a problem on RM9000
processors as well. I'm seeing "random" SIGILLs on user processes,
particularly large complicated shell scripts like configure on an RM9k
platform.
Cheers,
a
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-16 12:33 ` Ralf Baechle
2007-07-16 17:38 ` Andrew Sharp
@ 2007-07-17 7:54 ` Gleb O. Raiko
1 sibling, 0 replies; 26+ messages in thread
From: Gleb O. Raiko @ 2007-07-17 7:54 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Sergey Rogozhkin, kumba, linux-mips
Ralf,
Considering the RM7k, the latest kernel sets some hazards improperly. At
least mtc0_tlbw_hazard and tlbw_use_hazard should contain 4 nops, not 2.
Also, there should be 10 nops after a modification of the K0 field of the
Config register. The suspicious place I see is in
arch/mips/mm/c-r4k.c:coherency_setup():
change_c0_config(CONF_CM_CMASK, CONF_CM_DEFAULT);
If the K0 field has a value different from CONF_CM_DEFAULT, we
definitely need nops here and, I'm afraid, the line may even have to be
executed uncached.
Strictly speaking, the manual doesn't clearly define the term
"modification". I expect that writing the same value to the K0 field is
not considered a "modification".
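A small userspace model of the read-modify-write in question, as a sketch only. fake_c0_config stands in for the CP0 Config register, the CONF_CM_* values are quoted from memory of include/asm-mips/mipsregs.h and should be treated as assumptions, and model_change_c0_config() and model_coherency_setup() are hypothetical names. It reports whether the K0 field really changed, which is the case where the extra nops would matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, recalled from include/asm-mips/mipsregs.h. */
#define CONF_CM_UNCACHED             2u
#define CONF_CM_CACHABLE_NONCOHERENT 3u
#define CONF_CM_CMASK                7u  /* K0 field: Config bits 2..0 */

/* Stand-in for the CP0 Config register (a plain variable, not hardware). */
uint32_t fake_c0_config;

/* Same read-modify-write shape as change_c0_config(); returns the old value. */
static uint32_t model_change_c0_config(uint32_t mask, uint32_t val)
{
    uint32_t old = fake_c0_config;

    assert((val & ~mask) == 0);  /* new bits must lie inside the mask */
    fake_c0_config = (old & ~mask) | val;
    return old;
}

/* Returns true only when the write really altered the K0 field, i.e. when
 * hazard padding (and perhaps uncached execution) would be needed. */
bool model_coherency_setup(uint32_t cca)
{
    uint32_t old = model_change_c0_config(CONF_CM_CMASK, cca);

    return (old & CONF_CM_CMASK) != cca;
}
```

Starting from an uncached K0, the first call with CONF_CM_CACHABLE_NONCOHERENT returns true; repeating the same call returns false, which matches the reading that rewriting the same value is not a "modification".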
And I guess all boards with an RM7k select DMA_NONCOHERENT. Otherwise,
CONF_CM_DEFAULT will contain garbage in the case of the RM7k. Perhaps it's worth
selecting DMA_NONCOHERENT inside the "config CPU_RM7000" block.
Regards,
Gleb.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-04 19:22 ` Ralf Baechle
2007-07-16 11:53 ` Sergey Rogozhkin
@ 2007-07-17 9:04 ` Sergey Rogozhkin
2007-07-17 10:14 ` Ralf Baechle
2007-07-17 12:27 ` Ralf Baechle
1 sibling, 2 replies; 26+ messages in thread
From: Sergey Rogozhkin @ 2007-07-17 9:04 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Linux MIPS List, Gleb O. Raiko, Kumba
> Big loud bell began ringing. The RM7000 fetches and decodes multiple
> instructions in one go. And just like the E9000 cores it does
> throw an exception if it doesn't like one of the opcodes even if that
> doesn't actually get executed. The kernel has a workaround for this
> PMC-Sierra peculiarity (I call it a bug) but it's only being activated
> for E9000 platforms.
Are you really sure the RM7000 has this bug? The workaround mentioned above
breaks the gcc signal frame unwinding mechanism: it searches for the sigcontext
struct at a fixed offset from the signal trampoline.
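A rough illustration of why the offset changes, as a sketch under stated assumptions. The two structs below are simplified stand-ins loosely modelled on my recollection of the sigframe layouts in arch/mips/kernel/signal.c; fake_sigcontext, frame_plain, frame_war and the two helper functions are hypothetical names, and the real structures have more fields and alignment attributes.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct sigcontext; only its presence and size matter here. */
struct fake_sigcontext { uint64_t regs[32]; };

/* Assumed layout without the workaround: the trampoline sits directly
 * in front of the saved context. */
struct frame_plain {
    uint32_t sf_ass[4];   /* argument save area */
    uint32_t sf_code[2];  /* signal trampoline */
    struct fake_sigcontext sf_sc;
};

/* Assumed layout with the workaround: padding where the trampoline was,
 * and the trampoline moved behind the context. */
struct frame_war {
    uint32_t sf_ass[4];
    uint32_t sf_pad[2];
    struct fake_sigcontext sf_sc;
    uint32_t sf_code[8];  /* trampoline plus padding words */
};

/* Byte distance from the trampoline to the context, i.e. what an unwinder
 * keyed on the trampoline address would have to add. */
long plain_delta(void)
{
    return (long)offsetof(struct frame_plain, sf_sc) -
           (long)offsetof(struct frame_plain, sf_code);
}

long war_delta(void)
{
    return (long)offsetof(struct frame_war, sf_sc) -
           (long)offsetof(struct frame_war, sf_code);
}
```

With these stand-in layouts the context follows the trampoline in one case and precedes it in the other, so an unwinder with a single hard-coded offset can only be right for one of them.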
And another known RM7000 bug, maybe not taken into account by Linux:
erratum 38. r4k_wait is not suitable for the RM7000 on some systems. I don't
know if the O2 is affected.
Sergey Rogozhkin
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-17 9:04 ` Sergey Rogozhkin
@ 2007-07-17 10:14 ` Ralf Baechle
2007-07-17 12:27 ` Ralf Baechle
1 sibling, 0 replies; 26+ messages in thread
From: Ralf Baechle @ 2007-07-17 10:14 UTC (permalink / raw)
To: Sergey Rogozhkin; +Cc: Linux MIPS List, Gleb O. Raiko, Kumba
On Tue, Jul 17, 2007 at 01:04:00PM +0400, Sergey Rogozhkin wrote:
> Are you really sure RM7000 has this bug? Workaround mentioned above
> breaks gcc signal frame unwinding mechanism: it search for sigcontext
> struct at fixed offset from signal trampoline.
Sigh. Yes, I am certain - this is information right from the CPU designers.
When I modified the signal frame for PMC's E9000 core I knew some
software such as debuggers was likely to break, so I was careful to only
use the mechanism if absolutely necessary, that is, on E9000 cores. The
problem seemed to strike rather frequently on E9000, but there had been no
reports of application crashes matching the problem's fingerprint on RM7000,
so the issue felt rather theoretical on RM7000. So I chose
not to enable the workaround for RM7000 until recently.
Ralf
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-07-17 9:04 ` Sergey Rogozhkin
2007-07-17 10:14 ` Ralf Baechle
@ 2007-07-17 12:27 ` Ralf Baechle
2007-09-17 23:04 ` Steve Graham
2007-09-17 23:20 ` David Daney
1 sibling, 2 replies; 26+ messages in thread
From: Ralf Baechle @ 2007-07-17 12:27 UTC (permalink / raw)
To: Sergey Rogozhkin; +Cc: Linux MIPS List, Gleb O. Raiko, Kumba
On Tue, Jul 17, 2007 at 01:04:00PM +0400, Sergey Rogozhkin wrote:
> >for E9000 platforms.
>
> Are you really sure RM7000 has this bug? Workaround mentioned above
> breaks gcc signal frame unwinding mechanism: it search for sigcontext
> struct at fixed offset from signal trampoline.
>
> And one another known RM7000 bug, maybe not taken into account by linux:
> errata 38. r4k_wait is not suitable for RM7000 on some systems. I don't
> know if "O2" is affected.
The fingerprint of this bug would be write data getting corrupted to
contain its physical address instead. I haven't ever seen such bug reports,
but adding a handful of cycles of latency to the idle loop sounds like the
safe thing. Untested fix below.
What's really astonishing about this is that it affects basically the entire
QED family of processors - R4600, R4700, R4640, R5000, RM52xx and RM7000.
Which also is yet again empirical proof that the WAIT instruction is
hard to get right ...
Ralf
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index f599e79..7ee0cb0 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -75,6 +75,26 @@ static void r4k_wait_irqoff(void)
 	local_irq_enable();
 }
 
+/*
+ * The RM7000 variant has to handle erratum 38. The workaround is to not
+ * have any pending stores when the WAIT instruction is executed.
+ */
+static void rm7k_wait_irqoff(void)
+{
+	local_irq_disable();
+	if (!need_resched())
+		__asm__(
+		" .set push \n"
+		" .set mips3 \n"
+		" .set noat \n"
+		" mfc0 $1, $12 \n"
+		" sync \n"
+		" mtc0 $1, $12 \n"
+		" wait \n"
+		" .set pop \n");
+	local_irq_enable();
+}
+
 /* The Au1xxx wait is available only if using 32khz counter or
  * external timer source, but specifically not CP0 Counter. */
 int allow_au1k_wait;
@@ -132,7 +152,6 @@ static inline void check_wait(void)
 	case CPU_R4700:
 	case CPU_R5000:
 	case CPU_NEVADA:
-	case CPU_RM7000:
 	case CPU_4KC:
 	case CPU_4KEC:
 	case CPU_4KSC:
@@ -142,6 +161,10 @@ static inline void check_wait(void)
 		cpu_wait = r4k_wait;
 		break;
 
+	case CPU_RM7000:
+		cpu_wait = rm7k_wait_irqoff;
+		break;
+
 	case CPU_24K:
 	case CPU_34K:
 		cpu_wait = r4k_wait;
* Re: O2 RM7000 Issues
2007-07-16 17:38 ` Andrew Sharp
@ 2007-07-17 14:01 ` Kumba
2007-07-19 18:58 ` Andrew Sharp
0 siblings, 1 reply; 26+ messages in thread
From: Kumba @ 2007-07-17 14:01 UTC (permalink / raw)
To: Andrew Sharp; +Cc: linux-mips
Andrew Sharp wrote:
>
> I hungrily await said patch, as I believe this is a problem on RM9000
> processors as well. I'm seeing "random" SIGILLs on user processes,
> particularly large complicated shell scripts like configure on an RM9k
> platform.
This was more or less exactly what I was seeing on an O2 RM7000 setup until the
fix for errata #28 was put in (which should already be enabled for RM9000 systems).
Check include/asm-mips/war.h and make sure your machine is included in the list
that defines ICACHE_REFILLS_WORKAROUND_WAR. If not, add it, test, and fire
off a patch. That should fix the issue (especially if bash is the only userland
process dying while complex g++ compiles behave fine).
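
To make the advice above concrete, here is a minimal sketch of what opting a board into the workaround could look like. CONFIG_MY_RM7K_BOARD and icache_refills_war_enabled() are hypothetical names used only for illustration; the real list of platforms in include/asm-mips/war.h will differ, so treat this as an assumption rather than the actual header contents.

```c
/* Hypothetical board symbol; in a real tree this would come from Kconfig. */
#define CONFIG_MY_RM7K_BOARD 1

/* Boards known to need the workaround opt in here (illustrative list). */
#if defined(CONFIG_MY_RM7K_BOARD)
#define ICACHE_REFILLS_WORKAROUND_WAR 1
#endif

/* Every other board gets an explicit 0 so the macro is always defined. */
#ifndef ICACHE_REFILLS_WORKAROUND_WAR
#define ICACHE_REFILLS_WORKAROUND_WAR 0
#endif

/* Hypothetical helper: lets C code test the setting without #ifdef. */
int icache_refills_war_enabled(void)
{
	return ICACHE_REFILLS_WORKAROUND_WAR;
}
```

Defining the macro to 0 or 1 for every platform means callers can use a plain if () and let the compiler drop the dead branch.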
--Kumba
--
Gentoo/MIPS Team Lead
"Such is oft the course of deeds that move the wheels of the world: small hands
do them because they must, while the eyes of the great are elsewhere." --Elrond
* Re: O2 RM7000 Issues
2007-07-17 14:01 ` Kumba
@ 2007-07-19 18:58 ` Andrew Sharp
2007-07-19 22:26 ` Shane McDonald
0 siblings, 1 reply; 26+ messages in thread
From: Andrew Sharp @ 2007-07-19 18:58 UTC (permalink / raw)
To: Kumba; +Cc: linux-mips
On Tue, 17 Jul 2007 10:01:24 -0400 Kumba <kumba@gentoo.org> wrote:
> Andrew Sharp wrote:
> >
> > I hungrily await said patch, as I believe this is a problem on
> > RM9000 processors as well. I'm seeing "random" SIGILLs on user
> > processes, particularly large complicated shell scripts like
> > configure on an RM9k platform.
>
> This was more or less exactly what I was seeing on an O2 RM7000 setup
> until the fix for errata #28 was put in (which should already be
> enabled for RM9000 systems).
>
> Check include/asm-mips/war.h and make sure your machine is included
> in the list that define ICACHE_REFILLS_WORKAROUND_WAR. If not, add
> it and test; and fire off a patch. Should fix that issue (especially
> if bash is the only userland process dying while complex g++ compiles
> behave fine)
Thanks, I had added this about a month ago, but the l-users were
reporting that the problem persisted. Now that I've had a chance to
examine it myself, it appears they were confused. There's a first time
for everything.
I will be sending some patches to be sure, once I get all the bugs
worked out. This architecture, a bifurcated RM9000x2 together with a
marvell south bridge, is a searing pain I have to deal with daily.
Cheers,
a
* Re: O2 RM7000 Issues
2007-07-19 18:58 ` Andrew Sharp
@ 2007-07-19 22:26 ` Shane McDonald
0 siblings, 0 replies; 26+ messages in thread
From: Shane McDonald @ 2007-07-19 22:26 UTC (permalink / raw)
To: Andrew Sharp; +Cc: Kumba, linux-mips
I have been having similar problems to Andrew and Kumba on my setup -- a
PMC-Sierra Xiao Hu thin client computer (RM7035C based) running Debian etch
with a PMC 2.6.18 kernel. Running large complicated shell scripts, such as
inetutils' configure script, consistently dies with (usually) an illegal
instruction, but always in a different place. I've just added my machine to
the ICACHE_REFILLS_WORKAROUND_WAR list, and that seems to have fixed it.
I also tried adding in Ralf's rm7k_wait_irqoff() patch, but it didn't
improve things, although it didn't appear to break anything, either. Is
there some behaviour I should be looking for to notice if WAIT is / isn't
working on my platform?
Shane
On 7/19/07, Andrew Sharp <andy.sharp@onstor.com> wrote:
>
> On Tue, 17 Jul 2007 10:01:24 -0400 Kumba <kumba@gentoo.org> wrote:
>
> > Andrew Sharp wrote:
> > >
> > > I hungrily await said patch, as I believe this is a problem on
> > > RM9000 processors as well. I'm seeing "random" SIGILLs on user
> > > processes, particularly large complicated shell scripts like
> > > configure on an RM9k platform.
> >
> > This was more or less exactly what I was seeing on an O2 RM7000 setup
> > until the fix for errata #28 was put in (which should already be
> > enabled for RM9000 systems).
> >
> > Check include/asm-mips/war.h and make sure your machine is included
> > in the list that define ICACHE_REFILLS_WORKAROUND_WAR. If not, add
> > it and test; and fire off a patch. Should fix that issue (especially
> > if bash is the only userland process dying while complex g++ compiles
> > behave fine)
>
> Thanks, I had added this about a month ago, but the l-users were
> reporting that the problem persisted. Now that I've had a chance to
> examine it myself, it appears they were confused. There's a first time
> for everything.
>
> I will be sending some patches to be sure, once I get all the bugs
> worked out. This architecture, a bifurcated RM9000x2 together with a
> marvell south bridge, is a searing pain I have to deal with daily.
>
> Cheers,
>
> a
>
>
* Re: O2 RM7000 Issues
2007-07-17 12:27 ` Ralf Baechle
@ 2007-09-17 23:04 ` Steve Graham
2007-09-18 8:52 ` Ralf Baechle
2007-09-17 23:20 ` David Daney
1 sibling, 1 reply; 26+ messages in thread
From: Steve Graham @ 2007-09-17 23:04 UTC (permalink / raw)
To: linux-mips
I am having a similar problem where complicated bash scripts on boot randomly
throw SIGILL. I am running on a PMC MSP8510 platform - E9000 core. I have
applied the patch to "war.h" mentioned in this thread and that did greatly
reduce the number of occurrences of this problem but has not fixed it. I was
getting at least 2 illegal instructions every boot and now I can boot
without any problems about 90% of the time.
Does the patch you mention below apply to the E9000 core as well?
Ralf Baechle DL5RB wrote:
>
> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
>
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index f599e79..7ee0cb0 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -75,6 +75,26 @@ static void r4k_wait_irqoff(void)
> local_irq_enable();
> }
>
> +/*
> + * The RM7000 variant has to handle erratum 38. The workaround is to not
> + * have any pending stores when the WAIT instruction is executed.
> + */
> +static void rm7k_wait_irqoff(void)
> +{
> + local_irq_disable();
> + if (!need_resched())
> + __asm__(
> + " .set push \n"
> + " .set mips3 \n"
> + " .set noat \n"
> + " mfc0 $1, $12 \n"
> + " sync \n"
> + " mtc0 $1, $12 \n"
> + " wait \n"
> + " .set pop \n");
> + local_irq_enable();
> +}
> +
> /* The Au1xxx wait is available only if using 32khz counter or
> * external timer source, but specifically not CP0 Counter. */
> int allow_au1k_wait;
> @@ -132,7 +152,6 @@ static inline void check_wait(void)
> case CPU_R4700:
> case CPU_R5000:
> case CPU_NEVADA:
> - case CPU_RM7000:
> case CPU_4KC:
> case CPU_4KEC:
> case CPU_4KSC:
> @@ -142,6 +161,10 @@ static inline void check_wait(void)
> cpu_wait = r4k_wait;
> break;
>
> + case CPU_RM7000:
> + cpu_wait = rm7k_wait_irqoff;
> + break;
> +
> case CPU_24K:
> case CPU_34K:
> cpu_wait = r4k_wait;
>
>
>
* Re: O2 RM7000 Issues
2007-07-17 12:27 ` Ralf Baechle
2007-09-17 23:04 ` Steve Graham
@ 2007-09-17 23:20 ` David Daney
2007-09-18 8:47 ` Ralf Baechle
1 sibling, 1 reply; 26+ messages in thread
From: David Daney @ 2007-09-17 23:20 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Sergey Rogozhkin, Linux MIPS List, Gleb O. Raiko, Kumba
Ralf Baechle wrote:
> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
>
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index f599e79..7ee0cb0 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -75,6 +75,26 @@ static void r4k_wait_irqoff(void)
> local_irq_enable();
> }
>
> +/*
> + * The RM7000 variant has to handle erratum 38. The workaround is to not
> + * have any pending stores when the WAIT instruction is executed.
> + */
> +static void rm7k_wait_irqoff(void)
> +{
> + local_irq_disable();
> + if (!need_resched())
> + __asm__(
> + " .set push \n"
> + " .set mips3 \n"
> + " .set noat \n"
> + " mfc0 $1, $12 \n"
> + " sync \n"
> + " mtc0 $1, $12 \n"
> + " wait \n"
> + " .set pop \n");
> + local_irq_enable();
> +}
> +
Technically, shouldn't that __asm__ be volatile?
David Daney
* Re: O2 RM7000 Issues
2007-09-17 23:20 ` David Daney
@ 2007-09-18 8:47 ` Ralf Baechle
0 siblings, 0 replies; 26+ messages in thread
From: Ralf Baechle @ 2007-09-18 8:47 UTC (permalink / raw)
To: David Daney; +Cc: Sergey Rogozhkin, Linux MIPS List, Gleb O. Raiko, Kumba
On Mon, Sep 17, 2007 at 04:20:28PM -0700, David Daney wrote:
Hi David,
> Ralf Baechle wrote:
>
> >Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
> >
> >diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> >index f599e79..7ee0cb0 100644
> >--- a/arch/mips/kernel/cpu-probe.c
> >+++ b/arch/mips/kernel/cpu-probe.c
> >@@ -75,6 +75,26 @@ static void r4k_wait_irqoff(void)
> > local_irq_enable();
> > }
> >
> >+/*
> >+ * The RM7000 variant has to handle erratum 38. The workaround is to not
> >+ * have any pending stores when the WAIT instruction is executed.
> >+ */
> >+static void rm7k_wait_irqoff(void)
> >+{
> >+ local_irq_disable();
> >+ if (!need_resched())
> >+ __asm__(
> >+ " .set push \n"
> >+ " .set mips3 \n"
> >+ " .set noat \n"
> >+ " mfc0 $1, $12 \n"
> >+ " sync \n"
> >+ " mtc0 $1, $12 \n"
> >+ " wait \n"
> >+ " .set pop \n");
> >+ local_irq_enable();
> >+}
> >+
>
> Technically, shouldn't that __asm__ be volatile?
Gcc won't delete this asm because it has no outputs; that is, it will
treat it like a volatile asm.
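
As a small illustration of that point: GCC documents that an asm statement with no output operands is implicitly volatile, so the two helpers below should be treated the same way by the compiler. This is only a sketch; it uses an empty asm body with a "memory" clobber so that it builds on any GCC target, whereas the patch above uses real MIPS instructions. The function names are made up for the example.

```c
/* No output operands: GCC treats this asm as implicitly volatile. */
static inline void barrier_implicit(void)
{
	__asm__("" : : : "memory");
}

/* The same statement with the qualifier spelled out for the reader. */
static inline void barrier_explicit(void)
{
	__asm__ __volatile__("" : : : "memory");
}

/* Store a value, place both barriers after it, then read it back. */
int store_then_barrier(int *slot, int value)
{
	*slot = value;
	barrier_implicit();
	barrier_explicit();
	return *slot;
}
```

Writing __volatile__ explicitly costs nothing and documents the intent, which is presumably why the question comes up.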
Ralf
* Re: O2 RM7000 Issues
2007-09-17 23:04 ` Steve Graham
@ 2007-09-18 8:52 ` Ralf Baechle
0 siblings, 0 replies; 26+ messages in thread
From: Ralf Baechle @ 2007-09-18 8:52 UTC (permalink / raw)
To: Steve Graham; +Cc: linux-mips
On Mon, Sep 17, 2007 at 04:04:52PM -0700, Steve Graham wrote:
> I am having a similar problem where complicated bash scripts on boot randomly
> throw SIGILL. I am running on a PMC MSP8510 platform - E9000 core. I have
> applied the patch to "war.h" mentioned in this thread and that did greatly
> reduce the number of occurrences of this problem but has not fixed it. I was
> getting at least 2 illegal instructions every boot and now I can boot
> without any problems about 90% of the time.
>
> Does the patch you mention below apply to the E9000 core as well?
Not to my knowledge - but I'm lacking any halfway recent errata information
for the RM9000 series; maybe somebody from PMC can jump in?
Ralf
* RE: O2 RM7000 Issues
@ 2007-09-21 6:27 ` Sagar Borikar
2007-09-21 13:47 ` Ralf Baechle
0 siblings, 1 reply; 26+ messages in thread
From: Sagar Borikar @ 2007-09-21 6:27 UTC (permalink / raw)
To: 'Ralf Baechle', Steve Graham; +Cc: linux-mips
The rm7k_wait_irqoff patch doesn't seem to improve things for illegal instructions on the E9k core.
But many customers have reported that they do get illegal instructions when ICACHE_REFILLS_WORKAROUND_WAR is not enabled.
Thanks
Sagar
-----Original Message-----
From: linux-mips-bounce@linux-mips.org [mailto:linux-mips-bounce@linux-mips.org] On Behalf Of Ralf Baechle
Sent: Tuesday, September 18, 2007 2:23 PM
To: Steve Graham
Cc: linux-mips@linux-mips.org
Subject: Re: O2 RM7000 Issues
On Mon, Sep 17, 2007 at 04:04:52PM -0700, Steve Graham wrote:
> I am having a similar problem where complicated bash scripts on boot
> randomly throw SIGILL. I am running on a PMC MSP8510 platform - E9000
> core. I have applied the patch to "war.h" mentioned in this thread
> and that did greatly reduce the number of occurrences of this problem
> but has not fixed it. I was getting at least 2 illegal instructions
> every boot and now I can boot without any problems about 90% of the time.
>
> Does the patch you mention below apply to the E9000 core as well?
Not to my knowledge - but I'm lacking any halfway recent errata information for the RM9000 series; maybe somebody from PMC can jump in?
Ralf
* Re: O2 RM7000 Issues
2007-09-21 6:27 ` Sagar Borikar
@ 2007-09-21 13:47 ` Ralf Baechle
2007-09-22 3:20 ` Steve Graham
0 siblings, 1 reply; 26+ messages in thread
From: Ralf Baechle @ 2007-09-21 13:47 UTC (permalink / raw)
To: Sagar Borikar; +Cc: Steve Graham, linux-mips
On Thu, Sep 20, 2007 at 11:27:18PM -0700, Sagar Borikar wrote:
> rm7k_wait_irqoff patch doesn't seem to improve the things for illegal instructions for E9k core.
> But many customers have reported that they do get illegal instruction when ICACHE_REFILLS_WORKAROUND_WAR is not enabled.
This is due to a very unfortunate design issue in the RM7000 / RM9000. It
wasn't done by accident but with full intention, but I call it a bug
anyway. This also means all revs of these cores are affected, and if
the issue is not being hit for a particular workload where
ICACHE_REFILLS_WORKAROUND_WAR is disabled then that's by coincidence,
not engineering.
Ralf
* Re: O2 RM7000 Issues
2007-09-21 13:47 ` Ralf Baechle
@ 2007-09-22 3:20 ` Steve Graham
2007-09-24 11:58 ` Ralf Baechle
0 siblings, 1 reply; 26+ messages in thread
From: Steve Graham @ 2007-09-22 3:20 UTC (permalink / raw)
To: linux-mips
I've just recently fixed this problem on my E9000 core which is a MSP85XX. I
did some digging and found that the problem started to occur in 2.6.16 and
is not there in 2.6.15. I looked into the deltas and found the specific
change that broke me. The file is c-r4k.c.
In the function "local_r4k_flush_cache_sigtramp" there is a conditional:
if (!cpu_icache_snoops_remote_store && scache_size)
protected_writeback_scache_line(addr & ~(sc_lsize - 1));
This additional "scache_size" has been added to this conditional. On my
platform, "scache_size" is set to zero so the
"protected_writeback_scache_line" is now not being called. I took out the
"scache_size" from the conditional and now I boot without any illegal
instructions.
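
To spell out the two pieces of that conditional, here is a rough user-space model. It is only a sketch under stated assumptions: line_base(), writeback_old() and writeback_new() are hypothetical helpers, not kernel functions, and the "old" form is inferred from the description above (the same test without the scache_size term).

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the start of its cache line, as in
 * "addr & ~(sc_lsize - 1)". line_size must be a power of two. */
uintptr_t line_base(uintptr_t addr, uintptr_t line_size)
{
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	return addr & ~(line_size - 1);
}

/* Conditional as described for 2.6.15: no scache_size term. */
int writeback_old(int icache_snoops_remote_store)
{
	return !icache_snoops_remote_store;
}

/* Conditional as quoted for 2.6.16: skipped when scache_size is 0. */
int writeback_new(int icache_snoops_remote_store, unsigned long scache_size)
{
	return !icache_snoops_remote_store && scache_size != 0;
}
```

With scache_size equal to 0 the newer form never requests the writeback, which matches the behaviour reported here.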
As a side note, I also took out the workaround in "war.h". This workaround
only hid the problem, it didn't fix it. Before I changed the conditional, I
would crash on every boot without the workaround. The workaround reduced
the crashes to maybe 1 in 3. Now, without the workaround, and with the
change in the conditional, I haven't experienced any problems.
I'm sure this change was made for a reason in 2.6.16 so I'm not sure what
the official fix needs to be but that solved my issues on my platform.
Let me know if there is anything anyone wants me to try on my platform to
help come to an official fix for this problem.
Steve...
--
View this message in context: http://www.nabble.com/O2-RM7000-Issues-tf4008392.html#a12833079
Sent from the linux-mips main mailing list archive at Nabble.com.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: O2 RM7000 Issues
2007-09-22 3:20 ` Steve Graham
@ 2007-09-24 11:58 ` Ralf Baechle
2007-09-26 17:06 ` Steve Graham
0 siblings, 1 reply; 26+ messages in thread
From: Ralf Baechle @ 2007-09-24 11:58 UTC (permalink / raw)
To: Steve Graham; +Cc: linux-mips
On Fri, Sep 21, 2007 at 08:20:13PM -0700, Steve Graham wrote:
> I've just recently fixed this problem on my E9000 core which is a MSP85XX. I
> did some digging and found that the problem started to occur in 2.6.16 and
> is not there in 2.6.15. I looked into the deltas and found the specific
> change that broke me. The file is c-r4k.c.
>
> In the function "local_r4k_flush_cache_sigtramp" there is a conditional:
>
> if (!cpu_icache_snoops_remote_store && scache_size)
> protected_writeback_scache_line(addr & ~(sc_lsize - 1));
>
> This additional "scache_size" has been added to this conditional. On my
> platform, "scache_size" is set to zero so the
> "protected_writeback_scache_line" is now not being called. I took out the
> "scache_size" from the conditional and now I boot without any illegal
> instructions.
In this case the question is, why is scache_size 0 on your platform? I
suppose that's because sc-rm7k.c has its own scache_size, so c-r4k.c never
gets to see the right value. Maybe the sanest fix would be to move
sc-rm7k.c into c-r4k.c.
> As a side note, I also took out the workaround in "war.h". This workaround
> only hid the problem, it didn't fix it. Before I changed the conditional, I
> would crash on every boot without the workaround. The workaround reduced
> the crashes to maybe 1 in 3. Now, without the workaround, and with the
> change in the conditional, I haven't experienced any problems.
>
> I'm sure this change was made for a reason in 2.6.16 so I'm not sure what
> the official fix needs to be but that solved my issues on my platform.
ICACHE_REFILLS_WORKAROUND_WAR is a separate issue - you need to enable it
for all RM7000 and also, unless PMC changed its mind, all E9000 cores. So
while I can understand disabling this for testing a fix for the real
issue, you definitely should reenable it once you're done.
> Let me know if there is anything anyone wants me to try on my platform to
> help come to an official fix for this problem.
I wrote most of that stuff anyway ...
Ralf
* Re: O2 RM7000 Issues
2007-09-24 11:58 ` Ralf Baechle
@ 2007-09-26 17:06 ` Steve Graham
0 siblings, 0 replies; 26+ messages in thread
From: Steve Graham @ 2007-09-26 17:06 UTC (permalink / raw)
To: linux-mips
Yes, the reason my platform has scache_size=0 is because sc-rm7k.c
handles this. As a result, I found a few more issues in the current c-r4k.c
file. In "r4k_blast_scache_page_setup",
"r4k_blast_scache_page_indexed_setup", "r4k_blast_scache_setup", and
"local_r4k_flush_icache_range", there are similar checks on "scache_size"
that need to be removed for my platform. I was still getting the odd "seg
fault" during boots and this seems to have fixed it.
One question I have regarding this is, does anyone have a tool that I can
run to test this cache code and really exercise the cache? The problems are
so random and infrequent that it's difficult to know if the problem is gone
or just more hidden. Ralf, I imagine since you are the one with intimate
knowledge of this code that you may have developed a tool or have used a
tool to really test this code.
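
No such tool is named in this thread. As one hedged starting point, a small user-space loop can at least drive signal delivery repeatedly, which is the path where local_r4k_flush_cache_sigtramp would be involved, assuming the kernel in question sets up a signal trampoline on delivery. The sketch below does not verify cache contents; hammer_signals() is a made-up name, and the count should stay well below the range of sig_atomic_t.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Number of times the handler has run since the last reset. */
static volatile sig_atomic_t handled;

static void on_usr1(int sig)
{
	(void)sig;
	handled++;
}

/* Send ourselves `rounds` SIGUSR1 signals. Returns the number of
 * handler invocations observed, or -1 if setup or raise() failed. */
long hammer_signals(long rounds)
{
	struct sigaction sa;
	long i;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	handled = 0;
	for (i = 0; i < rounds; i++)
		if (raise(SIGUSR1) != 0)
			return -1;
	return (long)handled;
}
```

A mismatch between the requested and observed counts, or the process dying with SIGILL or SIGSEGV partway through, would be the thing to look for; running it in a loop alongside a heavy shell workload may make rare failures more likely to show up.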
--
View this message in context: http://www.nabble.com/O2-RM7000-Issues-tf4008392.html#a12905373
Sent from the linux-mips main mailing list archive at Nabble.com.
^ permalink raw reply [flat|nested] 26+ messages in thread
Thread overview: 26+ messages
2007-07-01 16:57 O2 RM7000 Issues Kumba
2007-07-01 22:07 ` freshy98
2007-07-02 13:08 ` sknauert
2007-07-02 13:08 ` sknauert
2007-07-04 15:27 ` Ralf Baechle
2007-07-04 19:22 ` Ralf Baechle
2007-07-16 11:53 ` Sergey Rogozhkin
2007-07-16 12:33 ` Ralf Baechle
2007-07-16 17:38 ` Andrew Sharp
2007-07-17 14:01 ` Kumba
2007-07-19 18:58 ` Andrew Sharp
2007-07-19 22:26 ` Shane McDonald
2007-07-17 7:54 ` Gleb O. Raiko
2007-07-17 9:04 ` Sergey Rogozhkin
2007-07-17 10:14 ` Ralf Baechle
2007-07-17 12:27 ` Ralf Baechle
2007-09-17 23:04 ` Steve Graham
2007-09-18 8:52 ` Ralf Baechle
2007-09-17 23:20 ` David Daney
2007-09-18 8:47 ` Ralf Baechle
2007-07-02 14:34 ` Maciej W. Rozycki
2007-09-21 6:27 ` Sagar Borikar
2007-09-21 13:47 ` Ralf Baechle
2007-09-22 3:20 ` Steve Graham
2007-09-24 11:58 ` Ralf Baechle
2007-09-26 17:06 ` Steve Graham