* Duron kernel crash (i686 works)
@ 2001-09-11  1:11 Roberto Jung Drebes
  2001-09-11  3:30 ` Roberto Jung Drebes
  2001-09-11 14:47 ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: Roberto Jung Drebes @ 2001-09-11  1:11 UTC
To: linux-kernel

Hi,

Today I updated the BIOS of my motherboard, an ABIT KT7A (VIA Apollo KT133A
chipset). The kernel I had (2.4.9) started crashing on boot with an
invalid page fault, usually right after starting init. I tried an i686
kernel and noticed it works OK, so I recompiled my crashy kernel only
switching the processor type and it also worked. Changed it back to
Athlon/K7/Duron and it starts crashing.

Anyone else experiencing this?

TIA,

--
Roberto Jung Drebes <drebes@inf.ufrgs.br>
Porto Alegre, RS - Brasil
http://www.inf.ufrgs.br/~drebes/

* Re: Duron kernel crash (i686 works)
  2001-09-11  1:11 Duron kernel crash (i686 works) Roberto Jung Drebes
@ 2001-09-11  3:30 ` Roberto Jung Drebes
  2001-09-11 14:47 ` Alan Cox
  1 sibling, 0 replies; 13+ messages in thread
From: Roberto Jung Drebes @ 2001-09-11  3:30 UTC
To: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1468 bytes --]

On Mon, 10 Sep 2001, Roberto Jung Drebes wrote:

>
> Hi,
>
> Today I updated the BIOS of my motherboard, an ABIT KT7A (VIA Apollo KT133A
> chipset). The kernel I had (2.4.9) started crashing on boot with an
> invalid page fault, usually right after starting init. I tried an i686
> kernel and noticed it works OK, so I recompiled my crashy kernel only
> switching the processor type and it also worked. Changed it back to
> Athlon/K7/Duron and it starts crashing.
>
> Anyone else experiencing this?

I captured the log via serial console, and the problem does not always
happen at exactly the same time. When I captured the log, it happened
right after init would mount /proc.

So I passed init=/bin/sh to the kernel and ran the init steps "manually".
At first things seemed to be OK. I then mounted /proc, no problems;
swapon -a, OK; then I started just playing around with ls, etc. Soon the
loader would complain that there was no memory to be allocated to run the
programs. After a while, the kernel would complain about trying to free a
page that was not allocated, or something like that. It seems to be
related to the VM subsystem.

Anyway, here is the trace decoded from the captured log.

Later, I found in the archives that some people solved instability
problems on the KT133A by disabling 3DNow!. I changed it to n in
arch/i386/config.in, but it seems to have no effect.

TIA,

--
Roberto Jung Drebes <drebes@inf.ufrgs.br>
Porto Alegre, RS - Brasil
http://www.inf.ufrgs.br/~drebes/

[-- Attachment #2: Type: TEXT/PLAIN, Size: 16651 bytes --]

ksymoops 2.3.4 on i686 2.4.9.
Options used
     -v /usr/src/linux/vmlinux (specified)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.9/ (specified)
     -m /usr/src/linux/System.map (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c11012dc   edi: c02481f8   ebp: 00000000   esp: c3ccfe4c
ds: 0018   es: 0018   ss: 0018
Process logger (pid: 33, stackpage=c3ccf000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 c3c8a134 00002c83
       00000282 c0248204 00000000 c02481d4 c0129fb3 000000d2 c116f500 c116dc40
       c3c8a134 00000001 c024835c 000000d2 c0129f36 00000000 c01208f2 00000000
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01208f2>] [<c012098f>] [<c0120a9b>] [<c0110980>]
   [<c0110ae3>] [<c0110980>] [<c01234c3>] [<c0123400>] [<c012f593>] [<c0106f48>]
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01208f2 <do_anonymous_page+32/a0>
Trace; c012098f <do_no_page+2f/e0>
Trace; c0120a9b <handle_mm_fault+5b/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01234c3 <generic_file_read+63/80>
Trace; c0123400 <file_read_actor+0/60>
Trace; c012f593 <sys_read+c3/d0>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129942>]
EFLAGS: 00010286
eax: 0000001f   ebx: c11012dc   ecx: 00000001   edx: c0246ea8
esi: c11012dc   edi: 00000001   ebp: 00000000   esp: c3c77b4c
ds: 0018   es: 0018   ss: 0018
Process minilogd (pid: 31, stackpage=c3c77000)
Stack: c020a34c c020a440 00000049 c1105478 c11012dc 00000001 00000001 c11012dc
       c01faf19 c3d7a000 c3c83000 c01239b3 c012a22a c01239c6 c0247eb0 c116dd00
       c116f440 00000001 00000002 c3fd1ff4 c3c310e4 c3c31040 c3cbe3c0 c01209b2
Call Trace: [<c01faf19>] [<c01239b3>] [<c012a22a>] [<c01239c6>] [<c01209b2>] [<c0120a9b>]
   [<c0110980>] [<c0110ae3>] [<c0110980>] [<c01233f2>] [<c01218d6>] [<c0106f48>]
   [<c0106f48>] [<c0106f48>] [<c01faace>] [<c01462ec>] [<c01472b8>] [<c01469a0>]
   [<c0137f57>] [<c01381dc>] [<c01381f3>] [<c0105b3f>] [<c0106e57>]
Code: 0f 0b 83 c4 0c 83 7b 08 00 74 16 6a 4b 68 40 a4 20 c0 68 4c

>>EIP; c0129942 <__free_pages_ok+22/300>   <=====
Trace; c01faf19 <mmx_copy_page+29/30>
Trace; c01239b3 <filemap_nopage+103/3e0>
Trace; c012a22a <__free_pages+1a/20>
Trace; c01239c6 <filemap_nopage+116/3e0>
Trace; c01209b2 <do_no_page+52/e0>
Trace; c0120a9b <handle_mm_fault+5b/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01233f2 <do_generic_file_read+502/510>
Trace; c01218d6 <do_munmap+56/250>
Trace; c0106f48 <error_code+34/3c>
Trace; c0106f48 <error_code+34/3c>
Trace; c0106f48 <error_code+34/3c>
Trace; c01faace <clear_user+2e/40>
Trace; c01462ec <padzero+1c/20>
Trace; c01472b8 <load_elf_binary+918/a80>
Trace; c01469a0 <load_elf_binary+0/a80>
Trace; c0137f57 <search_binary_handler+67/170>
Trace; c01381dc <do_execve+17c/1f0>
Trace; c01381f3 <do_execve+193/1f0>
Trace; c0105b3f <sys_execve+2f/60>
Trace; c0106e57 <system_call+33/38>
Code;  c0129942 <__free_pages_ok+22/300>
00000000 <_EIP>:
Code;  c0129942 <__free_pages_ok+22/300>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129944 <__free_pages_ok+24/300>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129947 <__free_pages_ok+27/300>
   5:   83 7b 08 00               cmpl   $0x0,0x8(%ebx)
Code;  c012994b <__free_pages_ok+2b/300>
   9:   74 16                     je     21 <_EIP+0x21> c0129963 <__free_pages_ok+43/300>
Code;  c012994d <__free_pages_ok+2d/300>
   b:   6a 4b                     push   $0x4b
Code;  c012994f <__free_pages_ok+2f/300>
   d:   68 40 a4 20 c0            push   $0xc020a440
Code;  c0129954 <__free_pages_ok+34/300>
  12:   68 4c 00 00 00            push   $0x4c

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c11012dc   edi: c02481f8   ebp: 00000000   esp: c3ed9e74
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 44, stackpage=c3ed9000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 03e63065 00002c83
       00000286 c0248204 00000000 c02481d4 c0129fb3 000000d2 c110925c ffffffff
       03e63065 00000001 c024835c 000000d2 c0129f36 080c4f04 c01203b7 080c4f04
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] [<c0110ae3>]
   [<c0110980>] [<c01faca3>] [<c01058a8>] [<c01132f7>] [<c0105ab4>] [<c0106f48>]
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01faca3 <_mmx_memcpy+53/100>
Trace; c01058a8 <copy_thread+88/a0>
Trace; c01132f7 <do_fork+5f7/6b0>
Trace; c0105ab4 <sys_fork+14/20>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c1101298   edi: c02481f8   ebp: 00000000   esp: c3d35e74
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 45, stackpage=c3d35000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 03f5e065 00002c82
       00000286 c02481f8 00000000 c02481d4 c0129fb3 000000d2 c110d508 ffffffff
       03f5e065 00000001 c024835c 000000d2 c0129f36 40016b68 c01203b7 40016b68
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] [<c0110ae3>]
   [<c0110980>] [<c01391da>] [<c013a3dc>] [<c012ee3d>] [<c012ed72>] [<c012f16c>]
   [<c012f1c3>] [<c0106f48>]
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01391da <permission+2a/30>
Trace; c013a3dc <open_namei+32c/5b0>
Trace; c012ee3d <dentry_open+bd/140>
Trace; c012ed72 <filp_open+52/60>
Trace; c012f16c <filp_close+5c/70>
Trace; c012f1c3 <sys_close+43/60>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129942>]
EFLAGS: 00010286
eax: 0000001f   ebx: c1101298   ecx: 00000001   edx: c0246ea8
esi: c1101298   edi: c3d4b084   ebp: 00000000   esp: c3d35cd8
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 45, stackpage=c3d35000)
Stack: c020a34c c020a440 00000049 c1101298 c1101298 c3d4b084 c3d4a16c c02a1640
       00000000 c0299240 00000000 00000046 c012a22a c012a76a c1101298 00000069
       c011f739 c1101298 c116ddc0 c116f140 08048000 0007c000 00448000 08448000
Call Trace: [<c012a22a>] [<c012a76a>] [<c011f739>] [<c0121de5>] [<c0112606>] [<c01162dd>]
   [<c01075d0>] [<c0107392>] [<c010764f>] [<c0129e5a>] [<c011ac0f>] [<c011773a>]
   [<c0117659>] [<c011743a>] [<c01082ed>] [<c0106ee0>] [<c0106f48>] [<c0129e5a>]
   [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] [<c0110ae3>]
   [<c0110980>] [<c01391da>] [<c013a3dc>] [<c012ee3d>] [<c012ed72>] [<c012f16c>]
   [<c012f1c3>] [<c0106f48>]
Code: 0f 0b 83 c4 0c 83 7b 08 00 74 16 6a 4b 68 40 a4 20 c0 68 4c

>>EIP; c0129942 <__free_pages_ok+22/300>   <=====
Trace; c012a22a <__free_pages+1a/20>
Trace; c012a76a <free_page_and_swap_cache+ba/c0>
Trace; c011f739 <zap_page_range+1b9/250>
Trace; c0121de5 <exit_mmap+b5/120>
Trace; c0112606 <mmput+26/50>
Trace; c01162dd <do_exit+9d/200>
Trace; c01075d0 <do_invalid_op+0/90>
Trace; c0107392 <die+42/50>
Trace; c010764f <do_invalid_op+7f/90>
Trace; c0129e5a <rmqueue+23a/270>
Trace; c011ac0f <do_timer+3f/70>
Trace; c011773a <bh_action+1a/50>
Trace; c0117659 <tasklet_hi_action+59/80>
Trace; c011743a <do_softirq+5a/b0>
Trace; c01082ed <do_IRQ+9d/b0>
Trace; c0106ee0 <ret_from_intr+0/7>
Trace; c0106f48 <error_code+34/3c>
Trace; c0129e5a <rmqueue+23a/270>
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01391da <permission+2a/30>
Trace; c013a3dc <open_namei+32c/5b0>
Trace; c012ee3d <dentry_open+bd/140>
Trace; c012ed72 <filp_open+52/60>
Trace; c012f16c <filp_close+5c/70>
Trace; c012f1c3 <sys_close+43/60>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129942 <__free_pages_ok+22/300>
00000000 <_EIP>:
Code;  c0129942 <__free_pages_ok+22/300>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129944 <__free_pages_ok+24/300>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129947 <__free_pages_ok+27/300>
   5:   83 7b 08 00               cmpl   $0x0,0x8(%ebx)
Code;  c012994b <__free_pages_ok+2b/300>
   9:   74 16                     je     21 <_EIP+0x21> c0129963 <__free_pages_ok+43/300>
Code;  c012994d <__free_pages_ok+2d/300>
   b:   6a 4b                     push   $0x4b
Code;  c012994f <__free_pages_ok+2f/300>
   d:   68 40 a4 20 c0            push   $0xc020a440
Code;  c0129954 <__free_pages_ok+34/300>
  12:   68 4c 00 00 00            push   $0x4c

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000002   ecx: 00000001   edx: c0246ea8
esi: c1104dd4   edi: c0248204   ebp: 00000001   esp: c3eeff20
ds: 0018   es: 0018   ss: 0018
Process rc (pid: 41, stackpage=c3eef000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248340 00000001 c3eeffbc 00002d61
       00000296 c0248204 00000001 c02481d4 c0129fb3 000000f0 bffff28c 00000000
       c3eeffbc 00000000 c024833c 000000f0 c0129f36 c3eee000 c012a1ca c0112d40
Call Trace: [<c0129fb3>] [<c0129f36>] [<c012a1ca>] [<c0112d40>] [<c0105ab4>] [<c0106e57>]
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c012a1ca <__get_free_pages+a/20>
Trace; c0112d40 <do_fork+40/6b0>
Trace; c0105ab4 <sys_fork+14/20>
Trace; c0106e57 <system_call+33/38>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

invalid operand: 0000
CPU:    0
EIP:    0010:[<c0129e5a>]
EFLAGS: 00010282
eax: 00000020   ebx: 00000001   ecx: 00000001   edx: c0246ea8
esi: c1104e5c   edi: c02481f8   ebp: 00000000   esp: c3fc7e74
ds: 0018   es: 0018   ss: 0018
Process init (pid: 1, stackpage=c3fc7000)
Stack: c020a34c c020a440 000000cc c02481d4 c0248360 00000000 03ce6065 00002d63
       00000286 c0248204 00000000 c02481d4 c0129fb3 000000d2 c1102d28 ffffffff
       03ce6065 00000001 c024835c 000000d2 c0129f36 40144740 c01203b7 40144740
Call Trace: [<c0129fb3>] [<c0129f36>] [<c01203b7>] [<c0120acd>] [<c0110980>] [<c0110ae3>]
   [<c0110980>] [<c01391da>] [<c013a3dc>] [<c012ee3d>] [<c012ed72>] [<c012f4c0>]
   [<c0106f48>]
Code: 0f 0b 83 c4 0c 90 89 f0 eb 1c 47 83 44 24 18 0c 83 ff 09 0f

>>EIP; c0129e5a <rmqueue+23a/270>   <=====
Trace; c0129fb3 <__alloc_pages+73/280>
Trace; c0129f36 <_alloc_pages+16/20>
Trace; c01203b7 <do_wp_page+157/250>
Trace; c0120acd <handle_mm_fault+8d/c0>
Trace; c0110980 <do_page_fault+0/460>
Trace; c0110ae3 <do_page_fault+163/460>
Trace; c0110980 <do_page_fault+0/460>
Trace; c01391da <permission+2a/30>
Trace; c013a3dc <open_namei+32c/5b0>
Trace; c012ee3d <dentry_open+bd/140>
Trace; c012ed72 <filp_open+52/60>
Trace; c012f4c0 <sys_llseek+c0/d0>
Trace; c0106f48 <error_code+34/3c>
Code;  c0129e5a <rmqueue+23a/270>
00000000 <_EIP>:
Code;  c0129e5a <rmqueue+23a/270>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0129e5c <rmqueue+23c/270>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0129e5f <rmqueue+23f/270>
   5:   90                        nop
Code;  c0129e60 <rmqueue+240/270>
   6:   89 f0                     mov    %esi,%eax
Code;  c0129e62 <rmqueue+242/270>
   8:   eb 1c                     jmp    26 <_EIP+0x26> c0129e80 <rmqueue+260/270>
Code;  c0129e64 <rmqueue+244/270>
   a:   47                        inc    %edi
Code;  c0129e65 <rmqueue+245/270>
   b:   83 44 24 18 0c            addl   $0xc,0x18(%esp,1)
Code;  c0129e6a <rmqueue+24a/270>
  10:   83 ff 09                  cmp    $0x9,%edi
Code;  c0129e6d <rmqueue+24d/270>
  13:   0f 00 00                  sldt   (%eax)

Kernel panic: Attempted to kill init!

* Re: Duron kernel crash (i686 works)
  2001-09-11  1:11 Duron kernel crash (i686 works) Roberto Jung Drebes
  2001-09-11  3:30 ` Roberto Jung Drebes
@ 2001-09-11 14:47 ` Alan Cox
  2001-09-12  6:59   ` VDA
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2001-09-11 14:47 UTC
To: Roberto Jung Drebes; +Cc: linux-kernel

> Today I updated the BIOS of my motherboard, an ABIT KT7A (VIA Apollo KT133A
> chipset). The kernel I had (2.4.9) started crashing on boot with an
> invalid page fault, usually right after starting init. I tried an i686
> kernel and noticed it works OK, so I recompiled my crashy kernel only
>
> Anyone else experiencing this?

Several reports. Back down the BIOS version.

* Re: Duron kernel crash (i686 works)
  2001-09-11 14:47 ` Alan Cox
@ 2001-09-12  6:59   ` VDA
  2001-09-12 10:51     ` Steffen Persvold
  0 siblings, 1 reply; 13+ messages in thread
From: VDA @ 2001-09-12  6:59 UTC
To: linux-kernel

>> Today I updated the BIOS of my motherboard, an ABIT KT7A (VIA Apollo KT133A
>> chipset). The kernel I had (2.4.9) started crashing on boot with an
>> invalid page fault, usually right after starting init. I tried an i686
>> kernel and noticed it works OK, so I recompiled my crashy kernel only
>> Anyone else experiencing this?

AC> Several reports. Back down the BIOS version.

Aha, we need to make the kernel reprogram the KT133A so that we won't be
blamed for BIOS flaws. Does anybody have a clue what exactly changed in
chipset programming from the YH to the 3R BIOS?
BIOSes are at ftp://ftp.leo.org/pub/comp/general/devices/abit/bios/kt7/

>> ...
>>         kernel_fpu_end();
>> +       from-=4096;
>> +       to-=4096;
>> +       if(memcmp(from,to,4096)!=0) {
>> +               printk("Athlon bug!"); //add printout of from,to,...?
>> +               memcpy(to,from,4096);
>> +       }
>> }

RJD> I then get 'Athlon bug!' Still oopses.

Waah! That means the movntq's moved data to some other place in memory!
memcmp detected that and memcpy fixed it, but that 'other place' was
corrupted and that's the cause of the oops.

You may change the printk to see when this happens:
        printk(KERN_ERR "Athlon bug! from=%08X to=%08X\n", from, to);
If you do, please post the from/to pairs printed to lkml.

>> Comparing K7 and MMX fast_copy_page...
>>
>> Does replacing movntq->movq make the oops go away?

RJD> Yes, it does! Didn't test exhaustively, but it seems to go away!

This is a check to dismiss "bad PSU/memory/..." theories.
It is indeed a CPU/chipset interaction bug fixable by chipset programming.

RJD> As said earlier, this happens after upgrading from BIOS YH to 3R in the
RJD> KT7A-RAID. The processor is a Duron 800, not overclocked.

--
Best regards,
VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/

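The check-and-repair instrumentation discussed in this message can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the kernel patch itself: `copy_fn_t`, `checked_copy_page`, `good_copy`, and `broken_copy` are hypothetical names introduced here, and the deliberately faulty copy merely stands in for a routine whose stores do not all reach the destination page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical type for any page-copy routine under test. */
typedef void (*copy_fn_t)(void *to, const void *from);

/* Reference copy: plain memcpy, assumed to be correct. */
static void good_copy(void *to, const void *from)
{
    memcpy(to, from, PAGE_BYTES);
}

/* Deliberately faulty copy that skips the last 8 bytes. It is an
 * illustrative stand-in for a copy whose writes partly go missing. */
static void broken_copy(void *to, const void *from)
{
    memcpy(to, from, PAGE_BYTES - 8);
}

/* Run the routine, then verify the result the way the quoted patch does:
 * compare source and destination, report a mismatch, and repair the
 * destination with memcpy. Returns 1 if a mismatch was found, else 0. */
static int checked_copy_page(copy_fn_t copy, void *to, const void *from)
{
    assert(copy != NULL && to != NULL && from != NULL);

    copy(to, from);
    if (memcmp(from, to, PAGE_BYTES) != 0) {
        fprintf(stderr, "copy mismatch! from=%p to=%p\n",
                (void *)from, to);
        memcpy(to, from, PAGE_BYTES);
        return 1;
    }
    return 0;
}
```

Note the limit of this kind of check, which matches the reasoning in the message: it can only confirm and repair the destination page. If the faulty stores landed somewhere else in memory, that other location stays corrupted and nothing here detects it.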
* Re: Duron kernel crash (i686 works)
  2001-09-12  6:59   ` VDA
@ 2001-09-12 10:51     ` Steffen Persvold
  2001-09-12 11:08       ` VDA
  0 siblings, 1 reply; 13+ messages in thread
From: Steffen Persvold @ 2001-09-12 10:51 UTC
To: VDA; +Cc: linux-kernel

VDA wrote:
>
> >> ...
> >>         kernel_fpu_end();
> >> +       from-=4096;
> >> +       to-=4096;
> >> +       if(memcmp(from,to,4096)!=0) {
> >> +               printk("Athlon bug!"); //add printout of from,to,...?
> >> +               memcpy(to,from,4096);
> >> +       }
> >> }
>
> RJD> I then get 'Athlon bug!' Still oopses.
>
> Waah! That means the movntq's moved data to some other place in memory!
> memcmp detected that and memcpy fixed it, but that 'other place' was
> corrupted and that's the cause of the oops.

Well, not necessarily. It might be that the data just hasn't "arrived" yet
because of the movntq instruction.

One thing that also puzzles me is why the fast_copy_page() routine is laid
out like this:

        "2:  movq (%0), %%mm0\n"
        "    movntq %%mm0, (%1)\n"
        "    movq 8(%0), %%mm1\n"
        "    movntq %%mm1, 8(%1)\n"
        "    movq 16(%0), %%mm2\n"
        "    movntq %%mm2, 16(%1)\n"
        "    movq 24(%0), %%mm3\n"
        "    movntq %%mm3, 24(%1)\n"
        "    movq 32(%0), %%mm4\n"
        "    movntq %%mm4, 32(%1)\n"
        "    movq 40(%0), %%mm5\n"
        "    movntq %%mm5, 40(%1)\n"
        "    movq 48(%0), %%mm6\n"
        "    movntq %%mm6, 48(%1)\n"
        "    movq 56(%0), %%mm7\n"
        "    movntq %%mm7, 56(%1)\n"

when it seems intuitively more effective to fill the registers with reads
first and then write them out with "movntq", like this:

        "2:  movq (%0), %%mm0\n"
        "    movq 8(%0), %%mm1\n"
        "    movq 16(%0), %%mm2\n"
        "    movq 24(%0), %%mm3\n"
        "    movq 32(%0), %%mm4\n"
        "    movq 40(%0), %%mm5\n"
        "    movq 48(%0), %%mm6\n"
        "    movq 56(%0), %%mm7\n"
        "    movntq %%mm0, (%1)\n"
        "    movntq %%mm1, 8(%1)\n"
        "    movntq %%mm2, 16(%1)\n"
        "    movntq %%mm3, 24(%1)\n"
        "    movntq %%mm4, 32(%1)\n"
        "    movntq %%mm5, 40(%1)\n"
        "    movntq %%mm6, 48(%1)\n"
        "    movntq %%mm7, 56(%1)\n"

Regards,
--
  Steffen Persvold   |  Scali Computer AS   |   Try out the world's best
 mailto:sp@scali.no  | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 |  Olaf Helsets vei 6  |      - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 |  N0621 Oslo, NORWAY  | >300MBytes/s and <4uS latency

* Re: Duron kernel crash (i686 works)
  2001-09-12 10:51     ` Steffen Persvold
@ 2001-09-12 11:08       ` VDA
  2001-09-12 11:23         ` Arjan van de Ven
  2001-09-12 12:48         ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: VDA @ 2001-09-12 11:08 UTC
To: linux-kernel

Hello Steffen,

Wednesday, September 12, 2001, 1:51:55 PM, you wrote:

>> >> ...
>> >>         kernel_fpu_end();
>> >> +       from-=4096;
>> >> +       to-=4096;
>> >> +       if(memcmp(from,to,4096)!=0) {
>> >> +               printk("Athlon bug!"); //add printout of from,to,...?
>> >> +               memcpy(to,from,4096);
>> >> +       }
>> >> }
>>
>> RJD> I then get 'Athlon bug!' Still oopses.
>>
>> Waah! That means the movntq's moved data to some other place in memory!
>> memcmp detected that and memcpy fixed it, but that 'other place' was
>> corrupted and that's the cause of the oops.

SP> Well, not necessarily. It might be that the data just hasn't "arrived" yet
SP> because of the movntq instruction.

So why does it oops then?
Also, we don't want this data to arrive late or whatever.
fast_copy_page must copy the page (make it so that memcmp()==0).
If it does not, it is too much "optimized".

SP> One thing that also puzzles me is why the fast_copy_page() routine is laid
SP> out like this:

SP>         "2:  movq (%0), %%mm0\n"
SP>         "    movntq %%mm0, (%1)\n"
SP>         "    movq 8(%0), %%mm1\n"
SP>         "    movntq %%mm1, 8(%1)\n"
SP>         "    movq 16(%0), %%mm2\n"
SP>         "    movntq %%mm2, 16(%1)\n"
SP>         "    movq 24(%0), %%mm3\n"
SP>         "    movntq %%mm3, 24(%1)\n"
SP>         "    movq 32(%0), %%mm4\n"
SP>         "    movntq %%mm4, 32(%1)\n"
SP>         "    movq 40(%0), %%mm5\n"
SP>         "    movntq %%mm5, 40(%1)\n"
SP>         "    movq 48(%0), %%mm6\n"
SP>         "    movntq %%mm6, 48(%1)\n"
SP>         "    movq 56(%0), %%mm7\n"
SP>         "    movntq %%mm7, 56(%1)\n"

SP> when it seems intuitively more effective to fill the registers with reads
SP> first and then write them out with "movntq", like this:

SP>         "2:  movq (%0), %%mm0\n"
SP>         "    movq 8(%0), %%mm1\n"
SP>         "    movq 16(%0), %%mm2\n"
SP>         "    movq 24(%0), %%mm3\n"
SP>         "    movq 32(%0), %%mm4\n"
SP>         "    movq 40(%0), %%mm5\n"
SP>         "    movq 48(%0), %%mm6\n"
SP>         "    movq 56(%0), %%mm7\n"
SP>         "    movntq %%mm0, (%1)\n"
SP>         "    movntq %%mm1, 8(%1)\n"
SP>         "    movntq %%mm2, 16(%1)\n"
SP>         "    movntq %%mm3, 24(%1)\n"
SP>         "    movntq %%mm4, 32(%1)\n"
SP>         "    movntq %%mm5, 40(%1)\n"
SP>         "    movntq %%mm6, 48(%1)\n"
SP>         "    movntq %%mm7, 56(%1)\n"

A better way to do it is to benchmark several routines at
startup time and pick the best one. It is done now
for the RAID xor'ing routine.

--
Best regards,
VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/

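The suggestion of benchmarking several routines at startup and picking the fastest can be sketched in user-space C. This is an illustrative sketch only and not the kernel's RAID xor calibration code: `copy_candidate`, `pick_fastest`, `copy_memcpy`, and `copy_bytes` are hypothetical names, and `clock()` is a coarse stand-in for whatever precise timing a kernel would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096

typedef void (*copy_fn_t)(void *to, const void *from);

/* One entry per competing routine (hypothetical structure). */
struct copy_candidate {
    const char *name;
    copy_fn_t fn;
};

/* Candidate 1: library memcpy. */
static void copy_memcpy(void *to, const void *from)
{
    memcpy(to, from, PAGE_BYTES);
}

/* Candidate 2: simple byte-by-byte loop. */
static void copy_bytes(void *to, const void *from)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    for (size_t i = 0; i < PAGE_BYTES; i++)
        d[i] = s[i];
}

/* Time each candidate over `rounds` page copies and return the fastest
 * one that also produced a correct copy, or NULL if none did. */
static const struct copy_candidate *
pick_fastest(const struct copy_candidate *cands, size_t n, int rounds)
{
    static unsigned char src[PAGE_BYTES], dst[PAGE_BYTES];
    const struct copy_candidate *best = NULL;
    clock_t best_time = 0;

    assert(cands != NULL && n > 0 && rounds > 0);
    memset(src, 0x5A, sizeof src);

    for (size_t i = 0; i < n; i++) {
        /* Clear the destination so a stale result cannot pass the check. */
        memset(dst, 0, sizeof dst);

        clock_t start = clock();
        for (int r = 0; r < rounds; r++)
            cands[i].fn(dst, src);
        clock_t elapsed = clock() - start;

        /* A fast routine that copies wrongly is never selected. */
        if (memcmp(src, dst, PAGE_BYTES) != 0)
            continue;
        if (best == NULL || elapsed < best_time) {
            best = &cands[i];
            best_time = elapsed;
        }
    }
    return best;
}
```

A caller would build an array of candidates once at startup, call `pick_fastest()`, and store the returned function pointer for later use. Whether the extra startup cost is worthwhile for a routine that targets a single CPU family is exactly the point debated in the replies.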
* Re: Duron kernel crash (i686 works)
  2001-09-12 11:08       ` VDA
@ 2001-09-12 11:23         ` Arjan van de Ven
  2001-09-12 12:48         ` Alan Cox
  1 sibling, 0 replies; 13+ messages in thread
From: Arjan van de Ven @ 2001-09-12 11:23 UTC
To: VDA; +Cc: linux-kernel

VDA wrote:

> SP> Well, not necessarily. It might be that the data just hasn't "arrived" yet
> SP> because of the movntq instruction.

This is wrong; the CPU _internal_ view of the data is always consistent,
regardless of movntq vs movq. It's only the EXTERNAL view that is
slightly different. "sfence" takes care of syncing that.

> So why does it oops then?
> Also, we don't want this data to arrive late or whatever.
> fast_copy_page must copy the page (make it so that memcmp()==0).
> If it does not, it is too much "optimized".

It does; but if you read it back from memory and it is corrupted, your
chipset corrupted it.

> SP> One thing that also puzzles me is why the fast_copy_page() routine is laid
> SP> out like this:

[snip]

> A better way to do it is to benchmark several routines at
> startup time and pick the best one. It is done now
> for the RAID xor'ing routine.

I benchmarked several versions, see the test program at
http://www.fenrus.demon.nl/athlon.c

The interleaved one is faster on Athlons because it seems to help AMD's
register aliasing logic to operate better....

Anyway, since this code works for like 99% of the machines, and only 1%
seems to be affected, it really really really looks like a hardware bug.
This is also more or less proven by the reports that certain BIOS versions
"break" working setups by doing things to the VIA chipset that make it
break....

Greetings,
   Arjan van de Ven

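The pattern described here, a run of non-temporal stores followed by a store fence before anyone else relies on the data, can be sketched in portable user-space C. This is a hedged illustration under stated assumptions, not the kernel's `fast_copy_page()`: it swaps in the SSE2 intrinsics `_mm_stream_si128` and `_mm_sfence` in place of the MMX `movntq` assembly, `stream_copy_page` is a hypothetical name, and the code falls back to `memcpy` when SSE2 is unavailable or the destination is not 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

#define PAGE_BYTES 4096

/* Copy one page using non-temporal (cache-bypassing) stores where
 * possible. The fence at the end orders the streamed stores before any
 * later stores, so other observers see the finished page. */
static void stream_copy_page(void *to, const void *from)
{
    assert(to != NULL && from != NULL);

#if defined(__SSE2__)
    /* _mm_stream_si128 requires a 16-byte aligned destination. */
    if (((uintptr_t)to & 15) == 0) {
        const __m128i *src = (const __m128i *)from;
        __m128i *dst = (__m128i *)to;

        for (size_t i = 0; i < PAGE_BYTES / sizeof(__m128i); i++)
            _mm_stream_si128(dst + i, _mm_loadu_si128(src + i));

        _mm_sfence();
        return;
    }
#endif
    /* Fallback: ordinary cached copy. */
    memcpy(to, from, PAGE_BYTES);
}
```

On working hardware a read-back of the destination by the same CPU matches the source whether or not the fence has executed yet, which is the point made in the message about the internal view always being consistent. The fence matters for ordering as seen from outside the core.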
* Re: Duron kernel crash (i686 works)
  2001-09-12 11:08       ` VDA
  2001-09-12 11:23         ` Arjan van de Ven
@ 2001-09-12 12:48         ` Alan Cox
  2001-09-12 13:48           ` VDA
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2001-09-12 12:48 UTC
To: VDA; +Cc: linux-kernel

> So why does it oops then?

On correct hardware it doesn't seem to oops.

> Also, we don't want this data to arrive late or whatever.
> fast_copy_page must copy the page (make it so that memcmp()==0).
> If it does not, it is too much "optimized".

See the "sfence" instruction.

> A better way to do it is to benchmark several routines at
> startup time and pick the best one. It is done now
> for the RAID xor'ing routine.

Not in this case. It is Athlon-specific code. It was fine-tuned when it
was written.

* Re: Duron kernel crash (i686 works)
  2001-09-12 12:48         ` Alan Cox
@ 2001-09-12 13:48           ` VDA
  2001-09-12 18:09             ` Mike Fedyk
  0 siblings, 1 reply; 13+ messages in thread
From: VDA @ 2001-09-12 13:48 UTC
To: Alan Cox; +Cc: linux-kernel

Hello Alan,

Wednesday, September 12, 2001, 3:48:20 PM, you wrote:

>> So why does it oops then?
AC> On correct hardware it doesn't seem to oops.

>> Also, we don't want this data to arrive late or whatever.
>> fast_copy_page must copy the page (make it so that memcmp()==0).
>> If it does not, it is too much "optimized".
AC> See the "sfence" instruction.

I meant that the instrumented fast_copy_page() check cannot fail because
of a late movntq commit to memory, since the memcmp() comes after the
sfence and kernel_fpu_end():

>> ...
>>         kernel_fpu_end();
>> +       /* Check for "Athlon bug" - remove when resolved */
>> +       from-=4096;
>> +       to-=4096;
>> +       if(memcmp(from,to,4096)!=0) {
>> +               printk(KERN_ERR "Athlon bug! from=%08X to=%08X\n", from, to);
>> +               memcpy(to,from,4096);
>> +       }
>> }

If we still see an oops with this instrumentation, then fast_copy_page()
must be clobbering RAM elsewhere, right?

>> A better way to do it is to benchmark several routines at
>> startup time and pick the best one. It is done now
>> for the RAID xor'ing routine.
AC> Not in this case. It is Athlon-specific code. It was fine-tuned
AC> when it was written.

Yes, but sometimes we have routines which perform
differently on different CPUs. See include/asm-i386/string.h
and string-486.h: on Pentium rep movsd is faster, on 386 an unrolled
loop is faster... so the optimal routine can be picked only at runtime.
CPU-specific routines can compete in such a runtime benchmark
too when the proper processor is detected - see how the KNI-specific
RAID xor routine does that.

--
Best regards,
VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/

* Re: Duron kernel crash (i686 works)
  2001-09-12 13:48           ` VDA
@ 2001-09-12 18:09             ` Mike Fedyk
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Fedyk @ 2001-09-12 18:09 UTC
To: linux-kernel

On Wed, Sep 12, 2001 at 04:48:00PM +0300, VDA wrote:
> >> A better way to do it is to benchmark several routines at
> >> startup time and pick the best one. It is done now
> >> for the RAID xor'ing routine.
> AC> Not in this case. It is Athlon-specific code. It was fine-tuned
> AC> when it was written.
> Yes, but sometimes we have routines which perform
> differently on different CPUs. See include/asm-i386/string.h
> and string-486.h: on Pentium rep movsd is faster, on 386 an unrolled
> loop is faster... so the optimal routine can be picked only at runtime.
> CPU-specific routines can compete in such a runtime benchmark
> too when the proper processor is detected - see how the KNI-specific
> RAID xor routine does that.

Hmm, just how far do you want to take that?

Compile in all of the optimizations and test which is fastest on each
processor at startup?

Hmm, that might not be a bad idea for dev kernels, as it might show
optimization problems on certain processors...

[parent not found: <154763769.20010911115644@port.imtp.ilyichevsk.odessa.ua>]
* Re: Duron kernel crash (i686 works)
       [not found] <154763769.20010911115644@port.imtp.ilyichevsk.odessa.ua>
@ 2001-09-11 19:38 ` Roberto Jung Drebes
  2001-09-12  6:56   ` Liakakis Kostas
  0 siblings, 1 reply; 13+ messages in thread
From: Roberto Jung Drebes @ 2001-09-11 19:38 UTC
To: VDA; +Cc: linux-kernel

On Tue, 11 Sep 2001, VDA wrote:

> Please report to lkml as many details about your BIOSes (both older
> and newer) as you can.
>
> You may be interested in this msg:
> --------------------------------------------------------------
> ...
>         kernel_fpu_end();
> +       from-=4096;
> +       to-=4096;
> +       if(memcmp(from,to,4096)!=0) {
> +               printk("Athlon bug!"); //add printout of from,to,...?
> +               memcpy(to,from,4096);
> +       }
> }

I then get 'Athlon bug!' Still oopses.

> > Comparing K7 and MMX fast_copy_page...
> >
> > Does replacing movntq->movq make the oops go away?

Yes, it does! Didn't test exhaustively, but it seems to go away!

As said earlier, this happens after upgrading from BIOS YH to 3R in the
KT7A-RAID. The processor is a Duron 800, not overclocked.

--
Roberto Jung Drebes <drebes@inf.ufrgs.br>
Porto Alegre, RS - Brasil
http://www.inf.ufrgs.br/~drebes/

* Re: Duron kernel crash (i686 works)
  2001-09-11 19:38 ` Roberto Jung Drebes
@ 2001-09-12  6:56   ` Liakakis Kostas
  2001-09-12  8:21     ` Morten Helgesen
  0 siblings, 1 reply; 13+ messages in thread
From: Liakakis Kostas @ 2001-09-12  6:56 UTC
To: Roberto Jung Drebes; +Cc: VDA, linux-kernel

On Tue, 11 Sep 2001, Roberto Jung Drebes wrote:

> As said earlier, this happens after upgrading from BIOS YH to 3R in the
> KT7A-RAID. The processor is a Duron 800, not overclocked.

What exactly did the new BIOS version change?
Enabled STPGNT by any chance?

-K.

* Re: Duron kernel crash (i686 works)
  2001-09-12  6:56   ` Liakakis Kostas
@ 2001-09-12  8:21     ` Morten Helgesen
  0 siblings, 0 replies; 13+ messages in thread
From: Morten Helgesen @ 2001-09-12  8:21 UTC
To: Liakakis Kostas; +Cc: linux-kernel

The 3R BIOS includes the following changes:

1. Update BIOS code.
2. Enhance ISA PnP compatibility.
3. Enhance SCSI adapter compatibility.
4. Add 1400(133) Athlon support for KT7A/KT7A-RAID.
5. Add new option "1200 above" for high-speed Athlons with 100FSB.
   KT7 / KT7-RAID / KT7A / KT7A-RAID / KT7E only support the 1.4G(100)
   Athlon with L1 bridges disconnected. Check L1 before you buy a CPU
   please!
6. For Creative SBLive 5.1 sound card users, you may try these options
   if you experience sound quality issues.
   * PCI master read caching, default setting=Disabled
   * PCI master time-out, default setting=1
   Setting the above options to Disabled/3 will lead to the same result
   as VIA Latency patch V0.14 and may help with the SB Live 5.1 sound
   issue. If the system experiences low performance after these settings,
   enable "PCI master read caching" please.
7. Enhance the high-speed 133FSB Athlon stability for KT7A/KT7A-RAID.
8. HPT 370 RAID BIOS version 1.11.0402 for KT7-RAID/KT7A-RAID. This BIOS
   version is also for non-RAID boards, and the HPT BIOS will be
   automatically disabled when no RAID controller chip is detected.

I must admit that these specifications weren't exactly the most technical
I have ever seen, but anyway ...

taken from: http://fae.abit.com.tw/eng/download/bios/kt7.htm

On Wed, Sep 12, 2001 at 09:56:00AM +0300, Liakakis Kostas wrote:
> On Tue, 11 Sep 2001, Roberto Jung Drebes wrote:
>
> > As said earlier, this happens after upgrading from BIOS YH to 3R in the
> > KT7A-RAID. The processor is a Duron 800, not overclocked.
>
> What exactly did the new BIOS version change?
> Enabled STPGNT by any chance?
>
> -K.

== Morten

--
mvh
Morten Helgesen
UNIX System Administrator & C Developer
Nextframe AS
admin@nextframe.net / 93445641
http://www.nextframe.net
