linux-kernel.vger.kernel.org archive mirror
* VIA Ezra CentaurHauls
@ 2003-06-18 14:18 Guennadi Liakhovetski
  2003-06-18 14:42 ` P
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Guennadi Liakhovetski @ 2003-06-18 14:18 UTC (permalink / raw)
  To: linux-kernel, debian-glibc

Hello

We have a platform with the above processor, and we happened to have 2
revisions thereof: stepping 8 and 10. With stepping 8 we are getting
"random" application crashes (segfaults), sometimes with kernel-Oopses.
The distribution is Debian-Woody. I saw some messages on the Debian
mailing list about problems with exactly this CPU, however, it was not
related to different revisions (stepping), perhaps, the author only had
 / tried stepping 8. The fix was to upgrade libc. I've done this (to
version libc6_2.3.1-16), but it didn't help. Any ideas?

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany




* Re: VIA Ezra CentaurHauls
  2003-06-18 14:18 VIA Ezra CentaurHauls Guennadi Liakhovetski
@ 2003-06-18 14:42 ` P
  2003-06-18 16:15   ` Guennadi Liakhovetski
  2003-06-18 17:17   ` Guennadi Liakhovetski
  2003-06-20 11:11 ` Daniel Egger
  2003-06-27  6:18 ` Alex Belits
  2 siblings, 2 replies; 8+ messages in thread
From: P @ 2003-06-18 14:42 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: linux-kernel, debian-glibc

Guennadi Liakhovetski wrote:
> Hello
> 
> We have a platform with the above processor, and we happened to have 2
> revisions thereof: stepping 8 and 10. With stepping 8 we are getting
> "random" application crashes (segfaults), sometimes with kernel-Oopses.
> The distribution is Debian-Woody.

Interesting, so stepping 10 is OK?

> I saw some messages on the Debian
> mailing list about problems with exactly this CPU, however, it was not
> related to different revisions (stepping), perhaps, the author only had
>  / tried stepping 8. The fix was to upgrade libc.

so is it a glibc bug or CPU bug?

> I've done this (to
> version libc6_2.3.1-16), but it didn't help. Any ideas?

You could search your system for CMOV instructions,
which could cause weirdness, like:

find / -perm /111 -type f |
while read -r bin; do
     objdump --disassemble "$bin" 2>/dev/null |
     grep -q cmov && echo "$bin has cmov"
done

Note that the C3 Nehemiah does have CMOV (but no 3dnow).

Pádraig.



* Re: VIA Ezra CentaurHauls
  2003-06-18 14:42 ` P
@ 2003-06-18 16:15   ` Guennadi Liakhovetski
  2003-06-18 17:17   ` Guennadi Liakhovetski
  1 sibling, 0 replies; 8+ messages in thread
From: Guennadi Liakhovetski @ 2003-06-18 16:15 UTC (permalink / raw)
  To: P; +Cc: linux-kernel, debian-glibc

On Wed, 18 Jun 2003 P@draigBrady.com wrote:
> Guennadi Liakhovetski wrote:
> > We have a platform with the above processor, and we happened to have 2
> > revisions thereof: stepping 8 and 10. With stepping 8 we are getting
> > "random" application crashes (segfaults), sometimes with kernel-Oopses.
> > The distribution is Debian-Woody.
>
> Interesting, so stepping 10 is OK?

Looks so.

> > I saw some messages on the Debian
> > mailing list about problems with exactly this CPU, however, it was not
> > related to different revisions (stepping), perhaps, the author only had
> >  / tried stepping 8. The fix was to upgrade libc.
>
> so is it a glibc bug or CPU bug?

Good question...

> > I've done this (to version libc6_2.3.1-16), but it didn't help. Any ideas?
>
> You could search your system for CMOV instructions,
> which could cause weirdness, like:
>
> find / -perm /111 -type f |
> while read -r bin; do
>      objdump --disassemble "$bin" 2>/dev/null |
>      grep -q cmov && echo "$bin has cmov"
> done

Yeah, will try. Plus libraries...

> Note that the C3 Nehemiah does have CMOV (but no 3dnow).

Meanwhile, I've written a tiny program with inline assembly that uses cmov.
I don't know ix86 assembly, so I've just done:

#include <stdlib.h>

int main(void)
{
	int x = 0, y = 1;

	/* AT&T order is "src, dst": copy y into x if y is non-zero */
	__asm__(
	"testl	%1, %1\n"
"	cmovnz	%1, %0" : "+r" (x) : "r" (y) : "cc");
	exit(x);
}

On "10" the exit code is 1, which is correct (?); on "8" the exit code is
76. Funnily enough, strace on "8" also produces
semget(2, 1074927648, 0)                = -1 ENOSYS (Function not implemented)
but this most probably comes from the new libc6 that I installed there.

Guennadi



* Re: VIA Ezra CentaurHauls
  2003-06-18 14:42 ` P
  2003-06-18 16:15   ` Guennadi Liakhovetski
@ 2003-06-18 17:17   ` Guennadi Liakhovetski
  2003-06-20 16:57     ` GOTO Masanori
  1 sibling, 1 reply; 8+ messages in thread
From: Guennadi Liakhovetski @ 2003-06-18 17:17 UTC (permalink / raw)
  To: P; +Cc: linux-kernel, debian-glibc

> find / -perm /111 -type f |
> while read -r bin; do
>      objdump --disassemble "$bin" 2>/dev/null |
>      grep -q cmov && echo "$bin has cmov"
> done

So, using the above for libraries, I found 3 libraries on the system that
use cmov:

libldap.so.2.0.15
libcrypto.so.0.9.6
libqt-mt.so.3.0.5

These libraries have nothing to do with the kernel, but the Debian guys
might want to take note of them (not glibc, but still...). What I do find
interesting and noteworthy is that this problem is specific to only some
revisions of this CPU, which might be of interest to all.

Thanks for the tip with the script!
Guennadi



* Re: VIA Ezra CentaurHauls
  2003-06-18 14:18 VIA Ezra CentaurHauls Guennadi Liakhovetski
  2003-06-18 14:42 ` P
@ 2003-06-20 11:11 ` Daniel Egger
  2003-06-27  6:18 ` Alex Belits
  2 siblings, 0 replies; 8+ messages in thread
From: Daniel Egger @ 2003-06-20 11:11 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: linux-kernel


On Wed, 2003-06-18 at 16:18, Guennadi Liakhovetski wrote:

>  / tried stepping 8. The fix was to upgrade libc. I've done this (to
> version libc6_2.3.1-16), but it didn't help. Any ideas?

IIRC there were some versions of glibc in Debian which activated the 686
and higher optimized versions for the cmov-less Ezra. A workaround is to
(re)move /usr/lib/686.

-- 
Servus,
       Daniel



* Re: VIA Ezra CentaurHauls
  2003-06-18 17:17   ` Guennadi Liakhovetski
@ 2003-06-20 16:57     ` GOTO Masanori
  0 siblings, 0 replies; 8+ messages in thread
From: GOTO Masanori @ 2003-06-20 16:57 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: P, linux-kernel, debian-glibc

At Wed, 18 Jun 2003 19:17:11 +0200 (CEST),
Guennadi Liakhovetski wrote:
> So, using the above for libraries, I found 3 libraries on the system that
> use cmov:
> 
> libldap.so.2.0.15
> libcrypto.so.0.9.6
> libqt-mt.so.3.0.5
> 
> These libraries have nothing to do with the kernel, but the Debian guys
> might want to take note of them (not glibc, but still...). What I do find
> interesting and noteworthy is that this problem is specific to only some
> revisions of this CPU, which might be of interest to all.

I don't think it's a Debian glibc problem.  If you hit the cmov problem,
your application reports "illegal instruction".  At least Debian glibc
2.3.1-16 has a trick to avoid loading dynamic libraries that contain cmov.
All the libraries you pointed out are placed under /.../lib/.../cmov/* in
Debian sid.  I guess it's a CPU or thermal issue.

Regards,
-- gotom


* Re: VIA Ezra CentaurHauls
  2003-06-18 14:18 VIA Ezra CentaurHauls Guennadi Liakhovetski
  2003-06-18 14:42 ` P
  2003-06-20 11:11 ` Daniel Egger
@ 2003-06-27  6:18 ` Alex Belits
  2 siblings, 0 replies; 8+ messages in thread
From: Alex Belits @ 2003-06-27  6:18 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: linux-kernel, debian-glibc

On Wed, 18 Jun 2003, Guennadi Liakhovetski wrote:

> Hello
>
> We have a platform with the above processor, and we happened to have 2
> revisions thereof: stepping 8 and 10. With stepping 8 we are getting
> "random" application crashes (segfaults), sometimes with kernel-Oopses.
> The distribution is Debian-Woody. I saw some messages on the Debian
> mailing list about problems with exactly this CPU, however, it was not
> related to different revisions (stepping), perhaps, the author only had
>  / tried stepping 8. The fix was to upgrade libc. I've done this (to
> version libc6_2.3.1-16), but it didn't help. Any ideas?

  I have two EPIA 800 motherboards with different CPUs:

1. The board has "Revision B" printed on it.
CPU is:
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 7
model name      : VIA Ezra
stepping        : 8
cpu MHz         : 800.047
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1595.80

2. The board is "Revision D", but otherwise looks exactly the same.
CPU is:
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 7
model name      : VIA Samuel 2
stepping        : 3
cpu MHz         : 800.047
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1595.80

  Both boards were used in exactly the same environment, as a replacement
for a failed FV24 motherboard in an SV24 box that had Debian Woody already
installed. Obviously, to make it boot I had to recompile the kernel, and
I had to recompile mplayer, which I had previously built for i686
(everything was done with gcc 2.95.4).

  The first board worked perfectly until I started (as a regular user)
RealPlayer 8. After that the box became unstable, and other applications
(mozilla, mplayer, gcc) started to crash randomly with SEGV. However, I
have not seen a kernel crash.

  I have tried different X drivers (trident from 4.3.0, trident from
current, vesa), different memory, one or two sticks, different power
supplies (including a known-good ATX power supply just in case), different
CMOS settings (including "safe" defaults, all caches off, etc.), and the
result was the same -- no problems without RealPlayer, application crashes
after it was started. TRplayer, which uses RealPlayer's libraries, has the
same effect; however, mplayer (tested only with non-Realmedia sources)
worked perfectly, and even showed impressive performance by playing SVCD
in vidix mode with no dropped frames (as long as I was not doing anything
else at the same time). Once I started RealPlayer, programs started
to crash randomly, and whatever the problem was, it was not confined to
the userid that RealPlayer was running as -- mplayer and gcc were running
as root when they crashed. Usually things crashed with a segmentation
fault, but once RealPlayer crashed with a floating point exception.
Puzzled, I ran memtest86, and all memory that was ever in that box passed
all basic tests (I had no patience for anything more than that).

  When I installed the second motherboard (obviously, with no other
modifications), all problems disappeared. I have also checked the
RealPlayer binaries, and objdump showed no cmov.

  I don't know what exactly happens, but it looks very strange to me that
a single piece of code causes all this havoc, and that over all that time
apparently no SIGILLs happened. I can only speculate that "something"
leaves some piece of state in the CPU (or maybe in the cache) that survives
a context switch and messes up the state of other processes (registers or
maybe memory). And whatever it is, "VIA Samuel 2, stepping 3" does not
have it.

-- 
Alex


* Re: VIA Ezra CentaurHauls
@ 2003-06-27  9:18 Miklos Szeredi
  0 siblings, 0 replies; 8+ messages in thread
From: Miklos Szeredi @ 2003-06-27  9:18 UTC (permalink / raw)
  To: Alex Belits; +Cc: linux-kernel


>   The first board worked perfectly until I started (as a regular user)
> RealPlayer 8. After that the box became unstable, and other applications
> (mozilla, mplayer, gcc) started to crash randomly with SEGV. However, I
> have not seen a kernel crash.

I had very similar experiences with the same CPU: VIA Ezra Stepping 8
(see: http://marc.theaimsgroup.com/?t=104262312700003&r=1&w=2).  And
that was not with an EPIA, but an ASUS CUV4X-C MB.

I had the processor replaced, because I had narrowed the problem down to
that, but it didn't help.

The more of these posts I see, the stronger my feeling that it is really
this model that is buggy.  If that is true, then VIA should either
replace these CPUs with non-buggy ones, or find a workaround for
whatever operating systems are affected.

BTW, I could reliably cure this problem by turning off the L2 cache in
the BIOS.  Maybe it is some memory interaction problem, but I'm not an
expert on this subject.

Miklos

