In-Kernel NaT consumption trap when debugging highly-parallel

public inbox for linux-ia64@vger.kernel.org

* In-Kernel NaT consumption trap when debugging highly-parallel
@ 2004-03-25  1:15 Peter Chubb
  2004-03-25  2:15 ` David Mosberger
  2004-03-26  0:52 ` David Mosberger
  0 siblings, 2 replies; 3+ messages in thread
From: Peter Chubb @ 2004-03-25  1:15 UTC
  To: linux-ia64

Hi,
	I'm trying to debug a highly-threaded user-space program, but
gdb is triggering NaT consumption faults in the middle of ptrace.

Does anyone have any good ideas on what could be going wrong, and how
to debug this?

stracing gdb gives this:
....
wait4(-1, 0x60000fffffffa570, WNOHANG|__WCLONE, NULL) = -1 ECHILD (No child processes)
wait4(-1, [WIFSTOPPED(s) && WSTOPSIG(s) = SIGTRAP], WNOHANG, NULL) = 5015
ptrace(PTRACE_PEEKUSER, 5015, f32, NULL) = 2305843009213940640
open("/proc/5015/status", O_RDONLY)     = 15
fstat(15, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x20000000005b0000
read(15, "Name:\tgigaecho.bin\nState:\tT (sto"..., 1024) = 495
read(15, "", 1024)                      = 0
close(15)                               = 0
munmap(0x20000000005b0000, 65536)       = 0
ptrace(0x4201 /* PTRACE_??? */, 5015, 0, 0x60000fffffffa4a0) = 0
wait4(5018, [WIFSTOPPED(s) && WSTOPSIG(s) = SIGSTOP], 0, NULL) = 5018
ptrace(PTRACE_PEEKUSER, 5015, r1, NULL) = 2305843009216381440
ptrace(PTRACE_PEEKUSER, 5015, r2, NULL) = 2305843009216661592
ptrace(PTRACE_PEEKUSER, 5015, r3, NULL) = 2305843009216656032
ptrace(PTRACE_PEEKUSER, 5015, r4,  <unfinished ...>
+++ killed by SIGSEGV +++



And in the log:
 gdb[5014]: NaT consumption 17179869216 [5]

Pid: 5014, CPU 0, comm:                  gdb
psr : 0000101008026018 ifs : 8000000000000005 ip  : [<a00000010003d320>]    Not tainted
ip is at unw_access_gr+0x1a0/0x5a0
unat: 0000000000000000 pfs : 000000000000048e rsc : 0000000000000003
rnat: 0000000000000000 bsps: a000000100a49840 pr  : 10000004aa996555
ldrs: 0000000000000000 ccv : 0000000000000008 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010002dbb0 b6  : a00000010002f1a0 b7  : a00000010000d2c0
f6  : 1003e9e3779b97f4a7c16 f7  : 000000000000000000000
f8  : 1003e0000000000000077 f9  : 1003e0000000000000341
f10 : 1003e6db6db6db6db6db7 f11 : 1003efffffffffffffff7
r1  : a000000100a49840 r2  : 0000000000000003 r3  : 0000000000000000
r8  : 0000000000000000 r9  : e000000034ab7cd0 r10 : 1ffffffff0c48240
r11 : 1ffffffff0c48242 r12 : e000000034ab7c00 r13 : e000000034ab0000
r14 : 0000000000000000 r15 : 0000000000000038 r16 : 0100000000000000
r17 : 0000000000000000 r18 : 00000000000000c0 r19 : a00000010077628c
r20 : 0000000000000000 r21 : a000000100776290 r22 : 0000000000000017
r23 : 0000000000000002 r24 : 0000000000000000 r25 : a000000100852ff8
r26 : 0000000000000000 r27 : a00000010077628c r28 : a000000100776270
r29 : a000000100042030 r30 : ffffffffff9c4218 r31 : a00000010067de08

Call Trace:
 [<a000000100014880>] show_stack+0x80/0xa0
                                sp=e000000034ab7760 bsp=e000000034ab1278
 [<a000000100037520>] die+0x140/0x1e0
                                sp=e000000034ab7930 bsp=e000000034ab1250
 [<a0000001000381f0>] ia64_fault+0x150/0xb60
                                sp=e000000034ab7930 bsp=e000000034ab1210
 [<a00000010000db00>] ia64_leave_kernel+0x0/0x260
                                sp=e000000034ab7a30 bsp=e000000034ab1210
 [<a00000010003d320>] unw_access_gr+0x1a0/0x5a0
                                sp=e000000034ab7c00 bsp=e000000034ab11e0
 [<a00000010002dbb0>] access_uarea+0xc10/0xfe0
                                sp=e000000034ab7c00 bsp=e000000034ab1198
 [<a00000010002f4b0>] sys_ptrace+0x310/0x800
                                sp=e000000034ab7e20 bsp=e000000034ab1110
 [<a00000010000d7f0>] ia64_trace_syscall+0xd0/0x110
                                sp=e000000034ab7e30 bsp=e000000034ab1110


* Re: In-Kernel NaT consumption trap when debugging highly-parallel
From: David Mosberger @ 2004-03-25  2:15 UTC
  To: linux-ia64

>>>>> On Thu, 25 Mar 2004 12:15:51 +1100, Peter Chubb <peter@chubb.wattle.id.au> said:

  Peter> Hi, I'm trying to debug a highly-threaded user-space program,
  Peter> but gdb is triggering NaT consumption faults in the middle of
  Peter> ptrace.

  Peter> Does anyone have any good ideas on what could be going wrong,
  Peter> and how to debug this?

Almost certainly a NULL-pointer dereference in the kernel.  I thought
I had fixed a bug where unw_access_gr() was dereferencing a NULL
nat_addr some weeks ago, when I did the ptrace fixes.  It looks like
that fix perhaps fell through the cracks or I decided no fixing was
needed after all.  What kernel are you running and how easy is it to
reproduce the bug?

	--david
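
For context on why unw_access_gr() reads through nat_addr at all: when a preserved register such as r4 is saved with st8.spill, its NaT bit is recorded in ar.unat at bit number spill_addr{8:3}, so recovering that NaT bit means loading the UNaT word through a pointer. The user-space sketch below models only that lookup. spilled_nat_bit() is a hypothetical helper, not the kernel's code, and its assert merely stands in for the fault the kernel took when the pointer was NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the NaT bit of a register that was spilled
   to spill_addr, given a pointer to the UNaT word that received it.  It
   models the idea of the memory-stack NaT case, not the kernel's code. */
static int spilled_nat_bit(const uint64_t *nat_addr, uintptr_t spill_addr)
{
    /* st8.spill records the NaT bit in UNaT bit number spill_addr{8:3}. */
    unsigned int bit = (unsigned int) ((spill_addr >> 3) & 0x3f);

    /* The UNaT word has to be read through nat_addr; a NULL nat_addr
       here corresponds to the dereference that faulted in the trace. */
    assert(nat_addr != NULL);
    return (int) ((*nat_addr >> bit) & 1);
}
```

For example, with a UNaT value of 1 << 5, a register spilled at an address ending in 0x28 (bit 5) reads back as NaT, while one spilled at 0x30 (bit 6) does not.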


* Re: In-Kernel NaT consumption trap when debugging highly-parallel
From: David Mosberger @ 2004-03-26  0:52 UTC
  To: linux-ia64

>>>>> On Thu, 25 Mar 2004 12:15:51 +1100, Peter Chubb <peter@chubb.wattle.id.au> said:

  Peter> Hi, I'm trying to debug a highly-threaded user-space program,
  Peter> but gdb is triggering NaT consumption faults in the middle of
  Peter> ptrace.

The attached patch should fix it.  Bjorn, I think the same will be needed
for 2.4.

It appears that the combination of gdb 6.x and NPTL is triggering this
bug (which has been there forever).  What happened is that gdb was
trying to read r4 in a task that was in the middle of a clone2().
clone2() spills that register (via SAVE_SWITCH_STACK) along with the
(primary) UNaT.  Unfortunately, the typo in run_script() that the patch
below fixes made the unwinder derive the primary-UNaT address from the
UNaT's contents instead of its location, so the address came out as 0
and the later read through it was a NULL-pointer dereference.

	--david

=== arch/ia64/kernel/unwind.c 1.37 vs edited ===
--- 1.37/arch/ia64/kernel/unwind.c	Thu Feb 19 11:27:59 2004
+++ edited/arch/ia64/kernel/unwind.c	Thu Mar 25 16:34:00 2004
@@ -1746,7 +1746,7 @@
 			if (!state->pri_unat_loc)
 				state->pri_unat_loc = &state->sw->ar_unat;
 			/* register off. is a multiple of 8, so the least 3 bits (type) are 0 */
-			s[dst+1] = (*state->pri_unat_loc - s[dst]) | UNW_NAT_MEMSTK;
+			s[dst+1] = ((unsigned long) state->pri_unat_loc - s[dst]) | UNW_NAT_MEMSTK;
 			break;

 		      case UNW_INSN_SETNAT_TYPE:
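
The one-line fix is easiest to see side by side. run_script() records the primary-UNaT location as an offset from the register's save slot, with the NaT type in the low 3 bits (the patch comment notes that the offset is a multiple of 8). The sketch below contrasts the pre-patch expression, which dereferences pri_unat_loc and so encodes the UNaT contents, with the patched one, which encodes its address. struct fake_switch_stack, the encode/decode helpers and the NAT_MEMSTK tag value are hypothetical stand-ins, and the decode step is a simplification of how the kernel turns the stored offset back into nat_addr.

```c
#include <assert.h>
#include <stdint.h>

/* Low 3 bits of an encoded location hold the NaT type; the tag value is a
   hypothetical stand-in for the kernel's UNW_NAT_MEMSTK. */
#define NAT_TYPE_MASK UINT64_C(0x7)
#define NAT_MEMSTK    UINT64_C(0x2)

/* Hypothetical stand-in for the save area filled by SAVE_SWITCH_STACK. */
struct fake_switch_stack {
    _Alignas(8) uint64_t r4;   /* spilled preserved register */
    uint64_t ar_unat;          /* primary UNaT spilled alongside it */
};

static uint64_t addr_of(const uint64_t *p)
{
    return (uint64_t) (uintptr_t) p;
}

/* Pre-patch: *unat_loc reads the UNaT contents, not its location. */
static uint64_t encode_buggy(const uint64_t *reg_slot, const uint64_t *unat_loc)
{
    return (*unat_loc - addr_of(reg_slot)) | NAT_MEMSTK;
}

/* Patched: byte offset from the register slot to the UNaT slot. */
static uint64_t encode_fixed(const uint64_t *reg_slot, const uint64_t *unat_loc)
{
    uint64_t off = addr_of(unat_loc) - addr_of(reg_slot);

    assert((off & NAT_TYPE_MASK) == 0);   /* both slots are 8-byte aligned */
    return off | NAT_MEMSTK;
}

/* Simplified consumer: rebuild nat_addr as register slot + offset. */
static uint64_t decode_nat_addr(const uint64_t *reg_slot, uint64_t enc)
{
    return addr_of(reg_slot) + (enc & ~NAT_TYPE_MASK);
}
```

With ar_unat holding 0, the pre-patch encoding decodes to address 0, which is the NULL nat_addr described earlier in the thread; the patched encoding decodes to the address of ar_unat itself, whatever value it holds.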

