Duron kernel crash (i686 works)

public inbox for linux-kernel@vger.kernel.org

* Duron kernel crash (i686 works)
@ 2001-09-11  1:11 Roberto Jung Drebes
  2001-09-11  3:30 ` Roberto Jung Drebes
  2001-09-11 14:47 ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: Roberto Jung Drebes @ 2001-09-11  1:11 UTC (permalink / raw)
  To: linux-kernel

Hi,

Today I updated the BIOS of my motherboard, an ABIT KT7A (VIA Apollo KT133A
chipset). The kernel I had (2.4.9) started crashing on boot with an
invalid page fault, usually right after starting init. I tried an i686
kernel and noticed it works OK, so I recompiled my crashy kernel, only
switching the processor type, and it also worked. I changed it back to
Athlon/K7/Duron and it started crashing again.

Anyone else experiencing this?

TIA,

--
Roberto Jung Drebes <drebes@inf.ufrgs.br>
Porto Alegre, RS - Brasil
http://www.inf.ufrgs.br/~drebes/


* Re: Duron kernel crash (i686 works)
  2001-09-11  1:11 Duron kernel crash (i686 works) Roberto Jung Drebes
@ 2001-09-11  3:30 ` Roberto Jung Drebes
  2001-09-11 14:47 ` Alan Cox
  1 sibling, 0 replies; 13+ messages in thread
From: Roberto Jung Drebes @ 2001-09-11  3:30 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1468 bytes --]

On Mon, 10 Sep 2001, Roberto Jung Drebes wrote:

> 
> Hi,
> 
> Today I updated the BIOS of my motherboard, a ABIT KT7A (VIA Apollo KT133A
> chipset). The kernel I had (2.4.9) started crashing on boot with an
> invalid page fault, usually right after starting init. I tryed a i686
> kernel and noticed it works OK, so I recompiled my crashy kernel only
> switching the processor type and it also worked. changed it back to
> Athlon/K7/Duron and it starts crashing.
> 
> Anyone else experiencing this?

I captured the log via serial console, and the problem does not always
happen at exactly the same point. When I captured this log, it happened
right after init mounted /proc. So I passed init=/bin/sh to the kernel
and brought the system up "manually". At first things seemed to be OK. I
then mounted /proc, no problems, ran swapon -a, OK, then started just
playing around with ls, etc. Soon the loader complained that there was no
memory to allocate to run the programs. After a while, the kernel
complained about trying to free a page that was not allocated, or
something like that. It seems to be related to the VM subsystem.

Anyway, here is the decoded trace from the captured log.

Later, I found in the archives that some people solved instability
problems on the KT133A by disabling 3DNow!. I changed it to n in
arch/i386/config.in, but it seems to have no effect.

TIA,

--
Roberto Jung Drebes <drebes@inf.ufrgs.br>
Porto Alegre, RS - Brasil
http://www.inf.ufrgs.br/~drebes/

[-- Attachment #2: Type: TEXT/PLAIN, Size: 16651 bytes --]

ksymoops 2.3.4 on i686 2.4.9.  Options used
     -v /usr/src/linux/vmlinux (specified)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.9/ (specified)
     -m /usr/src/linux/System.map (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c11012dc   edi: c02481f8   ebp: 00000000   esp: c3ccfe4c
ds: 0018   es: 0018   ss: 0018
Process logger (pid: 33, stackpage=c3ccf000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 c3c8a134 00002c83 
       00000282 c0248204 00000000 c02481d4 c0129fb3 000000d2 c116f500 c116dc40 
       c3c8a134 00000001 c024835c 000000d2 c0129f36 00000000 c01208f2 00000000 
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01208f2>] [<c012098f>] [<c0120a9b>] 
   [<c0110980>] [<c0110ae3>] [<c0110980>] [<c01234c3>] [<c0123400>] [<c012f593>] 
   [<c0106f48>] 
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f 

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01208f2 <do_anonymous_page+32/a0>
Trace; c012098f <do_no_page+2f/e0>
Trace; c0120a9b <handle_mm_fault+5b/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01234c3 <generic_file_read+63/80>
Trace; c0123400 <file_read_actor+0/60>
Trace; c012f593 <sys_read+c3/d0>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop    
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129942>]
EFLAGS: 00010286
eax: 0000001f   ebx: c11012dc   ecx: 00000001   edx: c0246ea8
esi: c11012dc   edi: 00000001   ebp: 00000000   esp: c3c77b4c
ds: 0018   es: 0018   ss: 0018
Process minilogd (pid: 31, stackpage=c3c77000)
Stack: c020a34c c020a440 00000049 c1105478 c11012dc 00000001 00000001 c11012dc 
       c01faf19 c3d7a000 c3c83000 c01239b3 c012a22a c01239c6 c0247eb0 c116dd00 
       c116f440 00000001 00000002 c3fd1ff4 c3c310e4 c3c31040 c3cbe3c0 c01209b2 
Call Trace: [<c01faf19>] [<c01239b3>] [<c012a22a>] [<c01239c6>] [<c01209b2>] 
   [<c0120a9b>] [<c0110980>] [<c0110ae3>] [<c0110980>] [<c01233f2>] [<c01218d6>] 
   [<c0106f48>] [<c0106f48>] [<c0106f48>] [<c01faace>] [<c01462ec>] [<c01472b8>] 
   [<c01469a0>] [<c0137f57>] [<c01381dc>] [<c01381f3>] [<c0105b3f>] [<c0106e57>] 
Code: 0f 0b 83 c4 0c 83 7b 08 00 74 16 6a 4b 68 40 a4 20 c0 68 4c 

>>EIP; c0129942 <__free_pages_ok+22/300>   <=====
Trace; c01faf19 <mmx_copy_page+29/30>
Trace; c01239b3 <filemap_nopage+103/3e0>
Trace; c012a22a <__free_pages+1a/20>
Trace; c01239c6 <filemap_nopage+116/3e0>
Trace; c01209b2 <do_no_page+52/e0>
Trace; c0120a9b <handle_mm_fault+5b/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01233f2 <do_generic_file_read+502/510>
Trace; c01218d6 <do_munmap+56/250>
Trace; c0106f48 <error_code+34/3c>
Trace; c0106f48 <error_code+34/3c>
Trace; c0106f48 <error_code+34/3c>
Trace; c01faace <clear_user+2e/40>
Trace; c01462ec <padzero+1c/20>
Trace; c01472b8 <load_elf_binary+918/a80>
Trace; c01469a0 <load_elf_binary+0/a80>
Trace; c0137f57 <search_binary_handler+67/170>
Trace; c01381dc <do_execve+17c/1f0>
Trace; c01381f3 <do_execve+193/1f0>
Trace; c0105b3f <sys_execve+2f/60>
Trace; c0106e57 <system_call+33/38>
Code;  c0129942 <__free_pages_ok+22/300>
00000000 <_EIP>:
Code;  c0129942 <__free_pages_ok+22/300>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129944 <__free_pages_ok+24/300>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129947 <__free_pages_ok+27/300>
   5:   83 7b 08 00               cmpl   $0x0,0x8(%ebx)
Code;  c012994b <__free_pages_ok+2b/300>
   9:   74 16                     je     21 <_EIP+0x21> c0129963 <__free_pages_ok+43/300>
Code;  c012994d <__free_pages_ok+2d/300>
   b:   6a 4b                     push   $0x4b
Code;  c012994f <__free_pages_ok+2f/300>
   d:   68 40 a4 20 c0            push   $0xc020a440
Code;  c0129954 <__free_pages_ok+34/300>
  12:   68 4c 00 00 00            push   $0x4c

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c11012dc   edi: c02481f8   ebp: 00000000   esp: c3ed9e74
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 44, stackpage=c3ed9000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 03e63065 00002c83 
       00000286 c0248204 00000000 c02481d4 c0129fb3 000000d2 c110925c ffffffff 
       03e63065 00000001 c024835c 000000d2 c0129f36 080c4f04 c01203b7 080c4f04 
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] 
   [<c0110ae3>] [<c0110980>] [<c01faca3>] [<c01058a8>] [<c01132f7>] [<c0105ab4>] 
   [<c0106f48>] 
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f 

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01faca3 <_mmx_memcpy+53/100>
Trace; c01058a8 <copy_thread+88/a0>
Trace; c01132f7 <do_fork+5f7/6b0>
Trace; c0105ab4 <sys_fork+14/20>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop    
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c1101298   edi: c02481f8   ebp: 00000000   esp: c3d35e74
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 45, stackpage=c3d35000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 03f5e065 00002c82 
       00000286 c02481f8 00000000 c02481d4 c0129fb3 000000d2 c110d508 ffffffff 
       03f5e065 00000001 c024835c 000000d2 c0129f36 40016b68 c01203b7 40016b68 
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] 
   [<c0110ae3>] [<c0110980>] [<c01391da>] [<c013a3dc>] [<c012ee3d>] [<c012ed72>] 
   [<c012f16c>] [<c012f1c3>] [<c0106f48>] 
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f 

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01391da <permission+2a/30>
Trace; c013a3dc <open_namei+32c/5b0>
Trace; c012ee3d <dentry_open+bd/140>
Trace; c012ed72 <filp_open+52/60>
Trace; c012f16c <filp_close+5c/70>
Trace; c012f1c3 <sys_close+43/60>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop    
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129942>]
EFLAGS: 00010286
eax: 0000001f   ebx: c1101298   ecx: 00000001   edx: c0246ea8
esi: c1101298   edi: c3d4b084   ebp: 00000000   esp: c3d35cd8
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 45, stackpage=c3d35000)
Stack: c020a34c c020a440 00000049 c1101298 c1101298 c3d4b084 c3d4a16c c02a1640 
       00000000 c0299240 00000000 00000046 c012a22a c012a76a c1101298 00000069 
       c011f739 c1101298 c116ddc0 c116f140 08048000 0007c000 00448000 08448000 
Call Trace: [<c012a22a>] [<c012a76a>] [<c011f739>] [<c0121de5>] [<c0112606>] 
   [<c01162dd>] [<c01075d0>] [<c0107392>] [<c010764f>] [<c0129e5a>] [<c011ac0f>] 
   [<c011773a>] [<c0117659>] [<c011743a>] [<c01082ed>] [<c0106ee0>] [<c0106f48>] 
   [<c0129e5a>] [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] 
   [<c0110ae3>] [<c0110980>] [<c01391da>] [<c013a3dc>] [<c012ee3d>] [<c012ed72>] 
   [<c012f16c>] [<c012f1c3>] [<c0106f48>] 
Code: 0f 0b 83 c4 0c 83 7b 08 00 74 16 6a 4b 68 40 a4 20 c0 68 4c 

>>EIP; c0129942 <__free_pages_ok+22/300>   <=====
Trace; c012a22a <__free_pages+1a/20>
Trace; c012a76a <free_page_and_swap_cache+ba/c0>
Trace; c011f739 <zap_page_range+1b9/250>
Trace; c0121de5 <exit_mmap+b5/120>
Trace; c0112606 <mmput+26/50>
Trace; c01162dd <do_exit+9d/200>
Trace; c01075d0 <do_invalid_op+0/90>
Trace; c0107392 <die+42/50>
Trace; c010764f <do_invalid_op+7f/90>
Trace; c0129e5a <rmqueue+23a/270>
Trace; c011ac0f <do_timer+3f/70>
Trace; c011773a <bh_action+1a/50>
Trace; c0117659 <tasklet_hi_action+59/80>
Trace; c011743a <do_softirq+5a/b0>
Trace; c01082ed <do_IRQ+9d/b0>
Trace; c0106ee0 <ret_from_intr+0/7>
Trace; c0106f48 <error_code+34/3c>
Trace; c0129e5a <rmqueue+23a/270>
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01391da <permission+2a/30>
Trace; c013a3dc <open_namei+32c/5b0>
Trace; c012ee3d <dentry_open+bd/140>
Trace; c012ed72 <filp_open+52/60>
Trace; c012f16c <filp_close+5c/70>
Trace; c012f1c3 <sys_close+43/60>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129942 <__free_pages_ok+22/300>
00000000 <_EIP>:
Code;  c0129942 <__free_pages_ok+22/300>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129944 <__free_pages_ok+24/300>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129947 <__free_pages_ok+27/300>
   5:   83 7b 08 00               cmpl   $0x0,0x8(%ebx)
Code;  c012994b <__free_pages_ok+2b/300>
   9:   74 16                     je     21 <_EIP+0x21> c0129963 <__free_pages_ok+43/300>
Code;  c012994d <__free_pages_ok+2d/300>
   b:   6a 4b                     push   $0x4b
Code;  c012994f <__free_pages_ok+2f/300>
   d:   68 40 a4 20 c0            push   $0xc020a440
Code;  c0129954 <__free_pages_ok+34/300>
  12:   68 4c 00 00 00            push   $0x4c

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000002   ecx: 00000001   edx: c0246ea8
esi: c1104dd4   edi: c0248204   ebp: 00000001   esp: c3eeff20
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 41, stackpage=c3eef000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248340 00000001 c3eeffbc 00002d61 
       00000296 c0248204 00000001 c02481d4 c0129fb3 000000f0 bffff28c 00000000 
       c3eeffbc 00000000 c024833c 000000f0 c0129f36 c3eee000 c012a1ca c0112d40 
Call Trace: [<c0129fb3>] [<c0129f36>] [<c012a1ca>] [<c0112d40>] [<c0105ab4>] 
   [<c0106e57>] 
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f 

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c012a1ca <__get_free_pages+a/20>
Trace; c0112d40 <do_fork+40/6b0>
Trace; c0105ab4 <sys_fork+14/20>
Trace; c0106e57 <system_call+33/38>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop    
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c1104e5c   edi: c02481f8   ebp: 00000000   esp: c3fc7e74
ds: 0018   es: 0018   ss: 0018
Process init (pid: 1, stackpage=c3fc7000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 03ce6065 00002d63 
       00000286 c0248204 00000000 c02481d4 c0129fb3 000000d2 c1102d28 ffffffff 
       03ce6065 00000001 c024835c 000000d2 c0129f36 40144740 c01203b7 40144740 
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] 
   [<c0110ae3>] [<c0110980>] [<c01391da>] [<c013a3dc>] [<c012ee3d>] [<c012ed72>] 
   [<c012f4c0>] [<c0106f48>] 
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f 

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01391da <permission+2a/30>
Trace; c013a3dc <open_namei+32c/5b0>
Trace; c012ee3d <dentry_open+bd/140>
Trace; c012ed72 <filp_open+52/60>
Trace; c012f4c0 <sys_llseek+c0/d0>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop    
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

Kernel panic: Attempted to kill init!


* Re: Duron kernel crash (i686 works)
  2001-09-11  1:11 Duron kernel crash (i686 works) Roberto Jung Drebes
  2001-09-11  3:30 ` Roberto Jung Drebes
@ 2001-09-11 14:47 ` Alan Cox
  2001-09-12  6:59   ` VDA
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2001-09-11 14:47 UTC (permalink / raw)
  To: Roberto Jung Drebes; +Cc: linux-kernel

> Today I updated the BIOS of my motherboard, a ABIT KT7A (VIA Apollo KT133A
> chipset). The kernel I had (2.4.9) started crashing on boot with an
> invalid page fault, usually right after starting init. I tryed a i686
> kernel and noticed it works OK, so I recompiled my crashy kernel only
>
> Anyone else experiencing this?

Several reports. Back down the BIOS version


* Re: Duron kernel crash (i686 works)
  2001-09-11 14:47 ` Alan Cox
@ 2001-09-12  6:59   ` VDA
  2001-09-12 10:51     ` Steffen Persvold
  0 siblings, 1 reply; 13+ messages in thread
From: VDA @ 2001-09-12  6:59 UTC (permalink / raw)
  To: linux-kernel

>> Today I updated the BIOS of my motherboard, a ABIT KT7A (VIA Apollo KT133A
>> chipset). The kernel I had (2.4.9) started crashing on boot with an
>> invalid page fault, usually right after starting init. I tryed a i686
>> kernel and noticed it works OK, so I recompiled my crashy kernel only
>> Anyone else experiencing this?

AC> Several reports. Back down the BIOS version

Aha, we need to make the kernel reprogram the KT133A so that we won't be
blamed for BIOS flaws. Does anybody have a clue what exactly changed in
the chipset programming between the YH and 3R BIOSes? The BIOS images are at
ftp://ftp.leo.org/pub/comp/general/devices/abit/bios/kt7/

>>         ...
>>         kernel_fpu_end();
>> +       from-=4096;
>> +       to-=4096;
>> +       if(memcmp(from,to,4096)!=0) {
>> +               printk("Athlon bug!"); //add printout of from,to,...?
>> +               memcpy(to,from,4096);
>> +       }
>> }

RJD> I then get 'Athlon bug!' Still oopses.

Waah! That means the movntq's moved data to some other place in memory!
memcmp detected that and memcpy fixed it, but that 'other place' was
corrupted and that's the cause of the oops.
You may change the printk to see when this happens:
    printk(KERN_ERR "Athlon bug! from=%p to=%p\n", from, to);
If you do, please post the printed from/to pairs to lkml.
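
As a rough user-space analogue of the check in the patch quoted above (not
the kernel code itself), the verify-after-copy idea can be sketched in C.
The names checked_copy_page, plain_copy_page, lossy_copy_page and PAGE_BYTES
are made up for this illustration, and lossy_copy_page only simulates a copy
routine that loses data:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_BYTES 4096   /* same page size the patch hard-codes */

/* Hypothetical stand-in for a page-copy routine such as fast_copy_page(). */
typedef void (*copy_page_fn)(void *to, const void *from);

/* A copy routine that behaves correctly. */
void plain_copy_page(void *to, const void *from)
{
    memcpy(to, from, PAGE_BYTES);
}

/* Fault injection for the sketch: silently drops the last 64 bytes. */
void lossy_copy_page(void *to, const void *from)
{
    memcpy(to, from, PAGE_BYTES - 64);
}

/*
 * Copy one page with `copy`, then compare source and destination.
 * Returns 0 if the copy was intact, 1 if a mismatch was found and repaired.
 */
int checked_copy_page(copy_page_fn copy, void *to, const void *from)
{
    assert(copy != NULL && to != NULL && from != NULL);

    copy(to, from);
    if (memcmp(from, to, PAGE_BYTES) != 0) {
        /* Report both addresses so mismatching pairs can be collected. */
        fprintf(stderr, "copy mismatch! from=%p to=%p\n", from, (void *)to);
        memcpy(to, from, PAGE_BYTES);   /* repair the destination */
        return 1;
    }
    return 0;
}
```

Note that, as with the kernel patch, a check like this can only repair the
destination page; it cannot tell where misdirected data ended up.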

>> Comparing K7 and MMX fast_copy_page...
>> 
>> Does replacing movntq->movq makes oops go avay?

RJD> Yes, it does! I haven't tested exhaustively, but it seems to go away!

This check dismisses the "bad PSU/memory/..." theories.
It is indeed a CPU/chipset interaction bug, fixable by chipset
programming.

RJD> As said earlier, this happens after upgrading from BIOS YH to 3R in the
RJD> KT7A-RAID. The processor is a Duron 800 not overclocked.
-- 
Best regards, VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/




* Re: Duron kernel crash (i686 works)
  2001-09-12  6:59   ` VDA
@ 2001-09-12 10:51     ` Steffen Persvold
  2001-09-12 11:08       ` VDA
  0 siblings, 1 reply; 13+ messages in thread
From: Steffen Persvold @ 2001-09-12 10:51 UTC (permalink / raw)
  To: VDA; +Cc: linux-kernel

VDA wrote:
> 
> >>         ...
> >>         kernel_fpu_end();
> >> +       from-=4096;
> >> +       to-=4096;
> >> +       if(memcmp(from,to,4096)!=0) {
> >> +               printk("Athlon bug!"); //add printout of from,to,...?
> >> +               memcpy(to,from,4096);
> >> +       }
> >> }
> 
> RJD> I then get 'Athlon bug!' Still oopses.
> 
> Waah! That means movntq's moved data to some other place in memory!
> memcmp detected that and memcpy fixed, but that 'other place' was
> corrupted and that's the cause of oops.
Well, not necessarily. It might be that the data just hasn't "arrived" yet
because of the movntq instruction.

One thing that also puzzles me is why the fast_copy_page() routine is laid
out like this:

		"2: movq (%0), %%mm0\n"
		"   movntq %%mm0, (%1)\n"
		"   movq 8(%0), %%mm1\n"
		"   movntq %%mm1, 8(%1)\n"
		"   movq 16(%0), %%mm2\n"
		"   movntq %%mm2, 16(%1)\n"
		"   movq 24(%0), %%mm3\n"
		"   movntq %%mm3, 24(%1)\n"
		"   movq 32(%0), %%mm4\n"
		"   movntq %%mm4, 32(%1)\n"
		"   movq 40(%0), %%mm5\n"
		"   movntq %%mm5, 40(%1)\n"
		"   movq 48(%0), %%mm6\n"
		"   movntq %%mm6, 48(%1)\n"
		"   movq 56(%0), %%mm7\n"
		"   movntq %%mm7, 56(%1)\n"

when it seems intuitively more efficient to fill the registers with reads
first and then write them out with "movntq", like this:

		"2: movq (%0), %%mm0\n"
		"   movq 8(%0), %%mm1\n"
		"   movq 16(%0), %%mm2\n"
		"   movq 24(%0), %%mm3\n"
		"   movq 32(%0), %%mm4\n"
		"   movq 40(%0), %%mm5\n"
		"   movq 48(%0), %%mm6\n"
		"   movq 56(%0), %%mm7\n"
		"   movntq %%mm0, (%1)\n"
		"   movntq %%mm1, 8(%1)\n"
		"   movntq %%mm2, 16(%1)\n"
		"   movntq %%mm3, 24(%1)\n"
		"   movntq %%mm4, 32(%1)\n"
		"   movntq %%mm5, 40(%1)\n"
		"   movntq %%mm6, 48(%1)\n"
		"   movntq %%mm7, 56(%1)\n"

Regards,
-- 
  Steffen Persvold   |  Scali Computer AS   |   Try out the world's best   
 mailto:sp@scali.no  | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 |  Olaf Helsets vei 6  |      - ScaMPI 1.12.2 -         
Fax: (+47) 2262 8951 |  N0621 Oslo, NORWAY  | >300MBytes/s and <4uS latency


* Re: Duron kernel crash (i686 works)
  2001-09-12 10:51     ` Steffen Persvold
@ 2001-09-12 11:08       ` VDA
  2001-09-12 11:23         ` Arjan van de Ven
  2001-09-12 12:48         ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: VDA @ 2001-09-12 11:08 UTC (permalink / raw)
  To: linux-kernel

Hello Steffen,
Wednesday, September 12, 2001, 1:51:55 PM, you wrote:
>> >>         ...
>> >>         kernel_fpu_end();
>> >> +       from-=4096;
>> >> +       to-=4096;
>> >> +       if(memcmp(from,to,4096)!=0) {
>> >> +               printk("Athlon bug!"); //add printout of from,to,...?
>> >> +               memcpy(to,from,4096);
>> >> +       }
>> >> }
>> 
>> RJD> I then get 'Athlon bug!' Still oopses.
>> 
>> Waah! That means movntq's moved data to some other place in memory!
>> memcmp detected that and memcpy fixed, but that 'other place' was
>> corrupted and that's the cause of oops.

SP> Well, not necessarily. It might be that data just hasn't "arrived" yet because
SP> of the movntq instruction.

So why does it oops then?
Also, we don't want this data to arrive late or whatever.
fast_copy_page must copy the page (make it so that memcmp()==0).
If it does not, it is too much "optimized".

SP> One thing that also puzzels me is that my is the fast_copy_page() routine laid
SP> out like this :

SP>                 "2: movq (%0), %%mm0\n"
SP>                 "   movntq %%mm0, (%1)\n"
SP>                 "   movq 8(%0), %%mm1\n"
SP>                 "   movntq %%mm1, 8(%1)\n"
SP>                 "   movq 16(%0), %%mm2\n"
SP>                 "   movntq %%mm2, 16(%1)\n"
SP>                 "   movq 24(%0), %%mm3\n"
SP>                 "   movntq %%mm3, 24(%1)\n"
SP>                 "   movq 32(%0), %%mm4\n"
SP>                 "   movntq %%mm4, 32(%1)\n"
SP>                 "   movq 40(%0), %%mm5\n"
SP>                 "   movntq %%mm5, 40(%1)\n"
SP>                 "   movq 48(%0), %%mm6\n"
SP>                 "   movntq %%mm6, 48(%1)\n"
SP>                 "   movq 56(%0), %%mm7\n"
SP>                 "   movntq %%mm7, 56(%1)\n"

SP> When it's more intuitively more effective to fill the registers with reads first
SP> and then write it with "movntq" like this :

SP>                 "2: movq (%0), %%mm0\n"
SP>                 "   movq 8(%0), %%mm1\n"
SP>                 "   movq 16(%0), %%mm2\n"
SP>                 "   movq 24(%0), %%mm3\n"
SP>                 "   movq 32(%0), %%mm4\n"
SP>                 "   movq 40(%0), %%mm5\n"
SP>                 "   movq 48(%0), %%mm6\n"
SP>                 "   movq 56(%0), %%mm7\n"
SP>                 "   movntq %%mm0, (%1)\n"
SP>                 "   movntq %%mm1, 8(%1)\n"
SP>                 "   movntq %%mm2, 16(%1)\n"
SP>                 "   movntq %%mm3, 24(%1)\n"
SP>                 "   movntq %%mm4, 32(%1)\n"
SP>                 "   movntq %%mm5, 40(%1)\n"
SP>                 "   movntq %%mm6, 48(%1)\n"
SP>                 "   movntq %%mm7, 56(%1)\n"

A better way to do it is to benchmark several routines at
startup time and pick the best one. This is already done
for the RAID xor'ing routine.
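
To make the suggestion concrete, here is a minimal user-space sketch of
"benchmark at startup, keep the winner". It is not how the kernel's RAID xor
code is written; the candidate routines, the struct and the function names
are all invented for illustration, and clock() is only a coarse timer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096

typedef void (*copy_page_fn)(void *to, const void *from);

/* One entry per competing implementation (hypothetical names). */
struct copy_candidate {
    const char  *name;
    copy_page_fn fn;
};

void copy_page_memcpy(void *to, const void *from)
{
    memcpy(to, from, PAGE_BYTES);
}

void copy_page_bytewise(void *to, const void *from)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    for (size_t i = 0; i < PAGE_BYTES; i++)
        d[i] = s[i];
}

/* Time `rounds` page copies with fn; returns elapsed clock() ticks. */
clock_t time_candidate(copy_page_fn fn, int rounds)
{
    static unsigned char src[PAGE_BYTES], dst[PAGE_BYTES];
    volatile unsigned char sink = 0;
    clock_t start = clock();

    for (int i = 0; i < rounds; i++) {
        fn(dst, src);
        sink = dst[i % PAGE_BYTES];   /* keep each copy observable */
    }
    (void)sink;
    return clock() - start;
}

/* Run every candidate once and return the one with the lowest time. */
const struct copy_candidate *pick_fastest(const struct copy_candidate *c,
                                          size_t n, int rounds)
{
    assert(c != NULL && n > 0 && rounds > 0);

    const struct copy_candidate *best = &c[0];
    clock_t best_ticks = time_candidate(c[0].fn, rounds);

    for (size_t i = 1; i < n; i++) {
        clock_t ticks = time_candidate(c[i].fn, rounds);
        if (ticks < best_ticks) {
            best_ticks = ticks;
            best = &c[i];
        }
    }
    return best;
}
```

A caller would build an array of candidates once at startup, call
pick_fastest() on it, and store the returned function pointer for later use.
Which candidate wins depends on the machine and compiler, so no particular
result should be expected.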
-- 
Best regards, VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/




* Re: Duron kernel crash (i686 works)
  2001-09-12 11:08       ` VDA
@ 2001-09-12 11:23         ` Arjan van de Ven
  2001-09-12 12:48         ` Alan Cox
  1 sibling, 0 replies; 13+ messages in thread
From: Arjan van de Ven @ 2001-09-12 11:23 UTC (permalink / raw)
  To: VDA; +Cc: linux-kernel

VDA wrote:

> SP> Well, not necessarily. It might be that data just hasn't "arrived" yet because
> SP> of the movntq instruction.

this is wrong; the CPU _internal_ view of the data is always consistent,
regardless of movntq vs movq.
It's only the EXTERNAL view that is slightly different. "sfence" takes
care of syncing that.
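
For illustration only, the pattern "non-temporal stores, then sfence" can be
sketched in user-space C with compiler intrinsics. This is an assumption-laden
analogue, not the kernel routine: it uses _mm_stream_si64 (the SSE2 movnti
store from general registers) rather than the MMX movntq discussed here, the
function name copy_page_nt is made up, and it falls back to memcpy() where
those intrinsics are not available:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>    /* _mm_stream_si64; also provides _mm_sfence */
#define HAVE_NT_STORES 1
#else
#define HAVE_NT_STORES 0
#endif

/*
 * Copy `bytes` bytes (a multiple of 8, both pointers 8-byte aligned)
 * using non-temporal stores where available, then fence so the stores
 * are globally visible before the function returns.
 */
void copy_page_nt(void *to, const void *from, size_t bytes)
{
    assert(bytes % 8 == 0);
    assert((uintptr_t)to % 8 == 0 && (uintptr_t)from % 8 == 0);

#if HAVE_NT_STORES
    long long *d = to;
    const long long *s = from;

    for (size_t i = 0; i < bytes / 8; i++)
        _mm_stream_si64(&d[i], s[i]);   /* store that bypasses the cache */

    _mm_sfence();                       /* order and drain the NT stores */
#else
    memcpy(to, from, bytes);            /* portable fallback, no NT stores */
#endif
}
```

On working hardware the destination compares equal to the source after the
fence, which is the property the memcmp() instrumentation in this thread
relies on.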

> So why it is oopses then?
> Also, we don't want this data to arrive late or whatever.
> fast_copy_page must copy page (make it so that memcpy()==0).
> If it does not, it is too much "optimized".

It does; but if you read it back from memory and it is corrupted, your
chipset corrupted it.

> SP> One thing that also puzzels me is that my is the fast_copy_page() routine laid
> SP> out like this :

[snip]

> A better way to do it is to benchmark several routines at
> startup time and pick the best one. It is done now
> for RAID xor'ing routine.

I benchmarked several versions, see the testprogram at
http://www.fenrus.demon.nl/athlon.c

The interleaved one is faster on Athlons because it seems to help AMD's
register aliasing logic operate better.
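
The two orderings under discussion can be written out in portable C as a
sketch. Plain 64-bit loads and stores stand in for movq/movntq, the function
names are invented, and an optimising compiler is free to reorder the
accesses, so this shows the structure of the two layouts rather than their
real performance:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleaved layout: every 8-byte load is followed at once by its store. */
void copy_interleaved(uint64_t *to, const uint64_t *from, size_t bytes)
{
    assert(bytes % 64 == 0);            /* whole 64-byte blocks only */

    for (size_t i = 0; i < bytes / 8; i += 8) {
        for (size_t k = 0; k < 8; k++) {
            uint64_t r = from[i + k];   /* load  (movq in the asm)   */
            to[i + k] = r;              /* store (movntq in the asm) */
        }
    }
}

/* Grouped layout: eight loads fill eight temporaries, then eight stores. */
void copy_grouped(uint64_t *to, const uint64_t *from, size_t bytes)
{
    assert(bytes % 64 == 0);

    for (size_t i = 0; i < bytes / 8; i += 8) {
        uint64_t r[8];

        for (size_t k = 0; k < 8; k++)
            r[k] = from[i + k];         /* all loads first */
        for (size_t k = 0; k < 8; k++)
            to[i + k] = r[k];           /* then all stores */
    }
}
```

Both functions produce the same destination contents for non-overlapping
buffers; any speed difference comes from how a given CPU schedules the two
orderings.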

Anyway, since this code works on something like 99% of machines and only
1% seem to be affected, it really, really looks like a hardware bug.
This is more or less confirmed by the reports that certain
BIOS versions "break" working setups by programming the VIA chipset
in a way that makes it misbehave.

Greetings,
   Arjan van de Ven


* Re: Duron kernel crash (i686 works)
  2001-09-12 11:08       ` VDA
  2001-09-12 11:23         ` Arjan van de Ven
@ 2001-09-12 12:48         ` Alan Cox
  2001-09-12 13:48           ` VDA
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2001-09-12 12:48 UTC (permalink / raw)
  To: VDA; +Cc: linux-kernel

> So why it is oopses then?

On correct hardware it doesn't seem to oops.

> Also, we don't want this data to arrive late or whatever.
> fast_copy_page must copy page (make it so that memcpy()==0).
> If it does not, it is too much "optimized".

See the "sfence" instruction

> A better way to do it is to bencmark several routines at
> startup time and pick the best one. It is done now
> for RAID xor'ing routine.

Not in this case. It is Athlon specific code. It was fine tuned when it
was written


* Re: Duron kernel crash (i686 works)
  2001-09-12 12:48         ` Alan Cox
@ 2001-09-12 13:48           ` VDA
  2001-09-12 18:09             ` Mike Fedyk
  0 siblings, 1 reply; 13+ messages in thread
From: VDA @ 2001-09-12 13:48 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Hello Alan,
Wednesday, September 12, 2001, 3:48:20 PM, you wrote:
>> So why it is oopses then?
AC> On correct hardware it doesnt seem to oops.

>> Also, we don't want this data to arrive late or whatever.
>> fast_copy_page must copy page (make it so that memcpy()==0).
>> If it does not, it is too much "optimized".
AC> See the "sfence" instruction

I meant that the instrumented fast_copy_page() cannot fail due to a
late movntq commit to memory, since the memcmp() is behind the sfence
and kernel_fpu_end():
>>         ...
>>         kernel_fpu_end();
>> +       /* Check for "Athlon bug" - remove when resolved */
>> +       from-=4096;
>> +       to-=4096;
>> +       if(memcmp(from,to,4096)!=0) {
>> +               printk(KERN_ERR "Athlon bug! from=%p to=%p\n", from, to);
>> +               memcpy(to,from,4096);
>> +       }
>> }

If we still see an oops with this instrumentation,
then fast_copy_page() must be clobbering
RAM elsewhere, right?

>> A better way to do it is to bencmark several routines at
>> startup time and pick the best one. It is done now
>> for RAID xor'ing routine.
AC> Not in this case. It is Athlon specific code. It was fine
AC> tuned when it was written

Yes, but sometimes we have routines which perform
differently on different CPUs. See include/asm-i386/string.h
and string-486.h: on Pentium rep movsd is faster, on 386 an unrolled
loop is faster... so the optimal routine can be picked only at runtime.
CPU-specific routines can compete in such a runtime benchmark
too when the proper processor is detected - see how the KNI-specific
RAID xor routine does that.
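
A rough userspace sketch of that idea, assuming nothing about how the
RAID code actually does it: time each candidate over a fixed number of
rounds and keep a pointer to the fastest. All names here
(copy_candidate, pick_fastest, the two copy routines) are hypothetical,
and clock() is only a coarse timer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096

typedef void (*copy_page_fn)(void *to, const void *from);

struct copy_candidate {
    const char *name;  /* label for reporting only */
    copy_page_fn fn;
};

/* Candidate 1: whatever the C library provides. */
static void copy_page_memcpy(void *to, const void *from)
{
    memcpy(to, from, PAGE_BYTES);
}

/* Candidate 2: a naive byte loop, standing in for a hand-written routine. */
static void copy_page_bytes(void *to, const void *from)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    for (size_t i = 0; i < PAGE_BYTES; i++)
        d[i] = s[i];
}

static unsigned char bench_src[PAGE_BYTES];
static unsigned char bench_dst[PAGE_BYTES];
static volatile unsigned char sink;  /* keeps the copies from being optimized out */

/* Time every candidate for `rounds` page copies; return the fastest,
 * or NULL if there are no candidates. */
static copy_page_fn pick_fastest(const struct copy_candidate *c, size_t n, int rounds)
{
    copy_page_fn best = NULL;
    clock_t best_ticks = 0;

    memset(bench_src, 0xa5, sizeof bench_src);
    for (size_t i = 0; i < n; i++) {
        clock_t start = clock();
        for (int r = 0; r < rounds; r++) {
            c[i].fn(bench_dst, bench_src);
            sink = bench_dst[r % PAGE_BYTES];
        }
        clock_t ticks = clock() - start;
        if (best == NULL || ticks < best_ticks) {
            best = c[i].fn;
            best_ticks = ticks;
        }
    }
    return best;
}
```

A CPU-specific routine would simply be added to the candidate table
only when the matching processor has been detected.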
-- 
Best regards, VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/



* Re: Duron kernel crash (i686 works)
  2001-09-12 13:48           ` VDA
@ 2001-09-12 18:09             ` Mike Fedyk
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Fedyk @ 2001-09-12 18:09 UTC (permalink / raw)
  To: linux-kernel

On Wed, Sep 12, 2001 at 04:48:00PM +0300, VDA wrote:
> >> A better way to do it is to benchmark several routines at
> >> startup time and pick the best one. It is done now
> >> for RAID xor'ing routine.

> AC> Not in this case. It is Athlon specific code. It was fine
> AC> tuned when it was written

> Yes, but sometimes we have routines which perform
> differently on different CPUs. See include/asm-i386/string.h
> and string-486.h: on Pentium rep movsd is faster, on 386 unrolled
> loop is faster... so optimal routine can be picked only at runtime.
> CPU-specific routines can compete in such runtime benchmark
> too when proper processor is detected - see how KNI-specific
> RAID xor routine does that.

Hmm, just how far do you want to take that?  Compile in all of the
optimizations and test which is fastest on each processor at startup?

Hmm, that might not be a bad idea for dev kernels, as it might show
optimization problems on certain processors...

[parent not found: <154763769.20010911115644@port.imtp.ilyichevsk.odessa.ua>]

* Re: Duron kernel crash (i686 works)
       [not found] <154763769.20010911115644@port.imtp.ilyichevsk.odessa.ua>
@ 2001-09-11 19:38 ` Roberto Jung Drebes
  2001-09-12  6:56   ` Liakakis Kostas
  0 siblings, 1 reply; 13+ messages in thread
From: Roberto Jung Drebes @ 2001-09-11 19:38 UTC (permalink / raw)
  To: VDA; +Cc: linux-kernel

On Tue, 11 Sep 2001, VDA wrote:

> Please report to lkml as many details about your BIOSes (both older
> and newer) as you can.
> 
> You may be interested in this msg:
> --------------------------------------------------------------
>         ...
>         kernel_fpu_end();
> +       from-=4096;
> +       to-=4096;
> +       if(memcmp(from,to,4096)!=0) {
> +               printk("Athlon bug!"); //add printout of from,to,...?
> +               memcpy(to,from,4096);
> +       }
> }

I then get 'Athlon bug!', and it still oopses.

> 
> Comparing K7 and MMX fast_copy_page...
> 
> Does replacing movntq with movq make the oops go away?

Yes, it does! I haven't tested it exhaustively, but the oops seems to go away!

As said earlier, this happens after upgrading from BIOS YH to 3R in the
KT7A-RAID. The processor is a Duron 800 not overclocked.

--
Roberto Jung Drebes <drebes@inf.ufrgs.br>
Porto Alegre, RS - Brasil
http://www.inf.ufrgs.br/~drebes/


* Re: Duron kernel crash (i686 works)
  2001-09-11 19:38 ` Roberto Jung Drebes
@ 2001-09-12  6:56   ` Liakakis Kostas
  2001-09-12  8:21     ` Morten Helgesen
  0 siblings, 1 reply; 13+ messages in thread
From: Liakakis Kostas @ 2001-09-12  6:56 UTC (permalink / raw)
  To: Roberto Jung Drebes; +Cc: VDA, linux-kernel

On Tue, 11 Sep 2001, Roberto Jung Drebes wrote:

> As said earlier, this happens after upgrading from BIOS YH to 3R in the
> KT7A-RAID. The processor is a Duron 800 not overclocked.

What exactly did the new bios version change? 
Enabled STPGNT by any chance?

-K.



* Re:  Duron kernel crash (i686 works)
  2001-09-12  6:56   ` Liakakis Kostas
@ 2001-09-12  8:21     ` Morten Helgesen
  0 siblings, 0 replies; 13+ messages in thread
From: Morten Helgesen @ 2001-09-12  8:21 UTC (permalink / raw)
  To: Liakakis Kostas; +Cc: linux-kernel

The 3R BIOS includes the following changes:

1. Update BIOS code.
2. Enhance ISA PnP compatibility.
3. Enhance SCSI adapters compatibility.
4. Add 1400(133) Athlon support for KT7A/KT7A-RAID.
5. Add new option "1200 above" for high speed Athlons
   with 100FSB. KT7 / KT7-RAID / KT7A / KT7A-RAID / KT7E only support
   1.4G(100) Athlon with L1 bridges disconnected. Check L1 before
   you buy a CPU please!
6. For Creative SBLive 5.1 sound card users, you may try these
   options while experience sound quality issue.
   * PCI master read caching, default setting=Disabled
   * PCI master time-out, Default setting=1
   Setting above options to Disabled/3 will lead to the same result
   with VIA Latency patch V0.14 and may help SB Live 5.1 sound
   issue. If the system experiences low performance after these
   settings, enable the "PCI master read caching" please.
7. Enhance the high speed 133FSB Athlon stability for KT7A/KT7A-RAID.
8. HPT 370 RAID BIOS version 1.11.0402 for KT7-RAID/KT7A-RAID.
   This BIOS version is also for non RAID boards and HPT BIOS will
   be automatically disabled while RAID controller chip not detected.

I must admit that these specifications aren't exactly the most technical I have
ever seen, but anyway ... taken from: http://fae.abit.com.tw/eng/download/bios/kt7.htm

On Wed, Sep 12, 2001 at 09:56:00AM +0300, Liakakis Kostas wrote:
> On Tue, 11 Sep 2001, Roberto Jung Drebes wrote:
> 
> > As said earlier, this happens after upgrading from BIOS YH to 3R in the
> > KT7A-RAID. The processor is a Duron 800 not overclocked.
> 
> What exactly did the new bios version change? 
> Enabled STPGNT by any chance?
> 
> -K.

== Morten

-- 
mvh
Morten Helgesen 
UNIX System Administrator & C Developer 
Nextframe AS
admin@nextframe.net / 93445641
http://www.nextframe.net

end of thread, other threads:[~2001-09-12 18:09 UTC | newest]

Thread overview: 13+ messages
2001-09-11  1:11 Duron kernel crash (i686 works) Roberto Jung Drebes
2001-09-11  3:30 ` Roberto Jung Drebes
2001-09-11 14:47 ` Alan Cox
2001-09-12  6:59   ` VDA
2001-09-12 10:51     ` Steffen Persvold
2001-09-12 11:08       ` VDA
2001-09-12 11:23         ` Arjan van de Ven
2001-09-12 12:48         ` Alan Cox
2001-09-12 13:48           ` VDA
2001-09-12 18:09             ` Mike Fedyk
     [not found] <154763769.20010911115644@port.imtp.ilyichevsk.odessa.ua>
2001-09-11 19:38 ` Roberto Jung Drebes
2001-09-12  6:56   ` Liakakis Kostas
2001-09-12  8:21     ` Morten Helgesen
