public inbox for linux-kernel@vger.kernel.org
* Re: test1[12] + sparc + bind 9.1.0b1 == bad things
@ 2000-12-13 23:34 Pete Zaitcev
  2000-12-14  0:00 ` Pete Toscano
  0 siblings, 1 reply; 4+ messages in thread
From: Pete Zaitcev @ 2000-12-13 23:34 UTC (permalink / raw)
  To: linux-kernel

> Is this the first OOPS it prints out? I don't think so. I am 
> very sure it printed out messages from die_if_kernel first and 
> we need that initial OOPS to diagnose this bug and fix it. 
> 
> All the rest of the OOPS messages are useless and won't tell 
> us what the real problem is. 

> Later, 
> David S. Miller 

The bad news about recursive Oopses is that, too often, the system
cannot continue and the oopsen never reach /var/log/messages.

This problem was so common on sparc(32) that I run all my
kernels with the attached patch. I think an application
of a similar change should be mandatory if you are interested
in any sort of debugging.

The alternative is to use a serial console, captured at all times.

--Pete

diff -u -r1.63 traps.c
--- arch/sparc/kernel/traps.c   2000/06/04 06:23:52     1.63
+++ arch/sparc/kernel/traps.c   2000/06/26 18:19:10
@@ -114,18 +116,23 @@
 		 * bound in case our stack is trashed and we loop.
 		 */
 		while(rw                                        &&
-		      count++ < 30                              &&
+		      count++ < 10                              && /* P3 30 */
 		       (((unsigned long) rw) >= PAGE_OFFSET)    &&
 		      !(((unsigned long) rw) & 0x7)) {
 			printk("Caller[%08lx]\n", rw->ins[7]);
 			rw = (struct reg_window *)rw->ins[6];
 		}
 	}
+#if 0
 	printk("Instruction DUMP:");
 	instruction_dump ((unsigned long *) regs->pc);
 	if(regs->psr & PSR_PS)
 		do_exit(SIGKILL);
 	do_exit(SIGSEGV);
+#else
+	printk("Looping...");
+	for (;;) { }
+#endif
 }
 
 void do_hw_interrupt(unsigned long type, unsigned long psr, unsigned long pc)
Index: arch/sparc/mm/fault.c
===================================================================
RCS file: /vger-cvs/linux/arch/sparc/mm/fault.c,v
retrieving revision 1.116
diff -u -r1.116 fault.c
--- arch/sparc/mm/fault.c       2000/05/03 06:37:03     1.116
+++ arch/sparc/mm/fault.c       2000/06/26 18:19:11
@@ -146,11 +146,15 @@
 		printk(KERN_ALERT "Unable to handle kernel paging request "
 			"at virtual address %08lx\n", address);
 	}
+	if (tsk->active_mm == NULL) {
+		printk(KERN_ALERT "tsk->active_mm = NULL\n");
+	} else {
 	printk(KERN_ALERT "tsk->{mm,active_mm}->context = %08lx\n",
 		(tsk->mm ? tsk->mm->context : tsk->active_mm->context));
 	printk(KERN_ALERT "tsk->{mm,active_mm}->pgd = %08lx\n",
 		(tsk->mm ? (unsigned long) tsk->mm->pgd :
 			(unsigned long) tsk->active_mm->pgd));
+	}
 	die_if_kernel("Oops", regs);
 }

* test1[12] + sparc + bind 9.1.0b1 == bad things
@ 2000-12-13 19:17 Pete Toscano
  2000-12-13 19:11 ` David S. Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Pete Toscano @ 2000-12-13 19:17 UTC (permalink / raw)
  To: Linux Kernel List

[-- Attachment #1: Type: text/plain, Size: 4148 bytes --]

hello,

i tried using the first beta release of bind 9.1.0 on an ultra 5
running 2.4.0-test11 or test12 (modified redhat 6.2 distro -- mostly
ipv6-related mods).  as soon as i start up named, the machine goes nuts
and continuously prints the following oops (from test12):


              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
(10): Oops
TSTATE: 0000000080f09606 TPC: 0000000000448264 TNPC: 0000000000448268 Y: 01800000
g0: 000000000041a198 g1: ffffffffffffffff g2: 0000000000000000 g3: 3030386666666666
g4: fffff80000000000 g5: 000000000000000f g6: fffff800167dc000 g7: 0000000000000000
o0: 0000000000000001 o1: 000000000000000f o2: fffff800167dc178 o3: 0000000000000000
o4: 0000000000624f3b o5: 0000000000624f3f sp: fffff8001295bdd1 ret_pc: 0000000000443848
l0: 0000000000000006 l1: fffff800167dc000 l2: 0000000000629000 l3: 000000000068dc00
l4: 0000000000629000 l5: 0000000000003fff l6: 000000000000000f l7: 0000000000625318
i0: 0000000000000009 i1: 0000000000000400 i2: fffff800167dc000 i3: 0000000000000001
i4: 0000000000624f1b i5: 0000000000624f26 i6: fffff8001295be91 i7: 000000000041a198
Instruction DUMP: 10680005  90102000  c45aa008 <c470e008> c6708000 913a2000  c0728000  c072a008  91924000 
Aiee, killing interrupt handler
<1>Unable to handle kernel paging request in mna handler<1> at virtual address 303038666666666e
current->{mm,active_mm}->context = 00000000625318ff
current->{mm,active_mm}->pgd = 0000000003402a00

here's the ksymoops output:

ksymoops 2.3.5 on sparc64 2.4.0-test12-1.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0-test12-1/ (default)
     -m /usr/src/linux/System.map (default)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
(10): Oops
TSTATE: 0000000080f09606 TPC: 0000000000448264 TNPC: 0000000000448268 Y: 01800000
Using defaults from ksymoops -t elf32-sparc -a sparc
g0: 000000000041a198 g1: ffffffffffffffff g2: 0000000000000000 g3: 3030386666666666
g4: fffff80000000000 g5: 000000000000000f g6: fffff800167dc000 g7: 0000000000000000
o0: 0000000000000001 o1: 000000000000000f o2: fffff800167dc178 o3: 0000000000000000
o4: 0000000000624f3b o5: 0000000000624f3f sp: fffff8001295bdd1 ret_pc: 0000000000443848
l0: 0000000000000006 l1: fffff800167dc000 l2: 0000000000629000 l3: 000000000068dc00
l4: 0000000000629000 l5: 0000000000003fff l6: 000000000000000f l7: 0000000000625318
i0: 0000000000000009 i1: 0000000000000400 i2: fffff800167dc000 i3: 0000000000000001
i4: 0000000000624f1b i5: 0000000000624f26 i6: fffff8001295be91 i7: 000000000041a198
Instruction DUMP: 10680005  90102000  c45aa008 <c470e008> c6708000 913a2000  c0728000  c072a008  91924000 

>>PC;  00448264 <del_timer+24/60>   <=====
>>O7;  00443848 <do_exit+68/220>
>>I7;  0041a198 <die_if_kernel+f8/120>
Code;  00448258 <del_timer+18/60>
0000000000000000 <_PC>:
Code;  00448258 <del_timer+18/60>
   0:   10 68 00 05       unknown
Code;  0044825c <del_timer+1c/60>
   4:   90 10 20 00       clr  %o0
Code;  00448260 <del_timer+20/60>
   8:   c4 5a a0 08       unknown
Code;  00448264 <del_timer+24/60>   <=====
   c:   c4 70 e0 08       unknown   <=====
Code;  00448268 <del_timer+28/60>
  10:   c6 70 80 00       unknown
Code;  0044826c <del_timer+2c/60>
  14:   91 3a 20 00       sra  %o0, 0, %o0
Code;  00448270 <del_timer+30/60>
  18:   c0 72 80 00       unknown
Code;  00448274 <del_timer+34/60>
  1c:   c0 72 a0 08       unknown
Code;  00448278 <del_timer+38/60>
  20:   91 92 40 00       unknown

Aiee, killing interrupt handler
<1>Unable to handle kernel paging request in mna handler<1> at virtual address 303038666666666e

is there any further info i can provide?  would the test11 oops help
too?

is it not bad enough that i spent the whole day frustrated, working with
this system? but then the computer had to keep making faces at me,
mocking me.  *sigh* =;]

pete

-- 
Pete Toscano    p:sigsegv@psinet.com     w:pete@research.netsol.com
GPG fingerprint: D8F5 A087 9A4C 56BB 8F78  B29C 1FF0 1BA7 9008 2736

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]
