From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4FA875BF.7040103@domain.hid> Date: Tue, 08 May 2012 03:24:15 +0200 From: Gilles Chanteperdrix MIME-Version: 1.0 References: <4FA87007.1040105@domain.hid> In-Reply-To: <4FA87007.1040105@domain.hid> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-help] Debugging oops in xnheap_init List-Id: Help regarding installation and common use of Xenomai List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Doug Brunner Cc: xenomai@xenomai.org On 05/08/2012 02:59 AM, Doug Brunner wrote: > I just got an oops from running one of my POSIX skin RT applications: > > [183168.735823] BUG: unable to handle kernel paging request at 00700bf5 > [183168.737436] IP: [] xnheap_init+0x1cf/0x210 > [183168.738604] *pde = 00000000 > [183168.739406] Oops: 0002 [#1] PREEMPT > [183168.740173] last sysfs file: /sys/devices/virtual/bdi/0:19/uevent > [183168.740173] Modules linked in: e1000 xeno_rtipc lxfb cfbcopyarea > cfbimgblt cfbfillrect binfmt_misc psmouse usbhid serio_raw hid ata_piix > [last unloaded: e1000] > [183168.740173] > [183168.740173] Pid: 2557, comm: eve_dal Not tainted 2.6.37 #1 /Bochs > [183168.740173] EIP: 0060:[] EFLAGS: 00010246 CPU: 0 > [183168.740173] EIP is at xnheap_init+0x1cf/0x210 > [183168.740173] EAX: 00700bf1 EBX: eed0e210 ECX: eed0e730 EDX: eed0e2fc > [183168.740173] ESI: 00000000 EDI: 00000000 EBP: eed27da4 ESP: eed27d7c > [183168.740173] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 > [183168.740173] Process eve_dal (pid: 2557, ti=eed26000 > task=f5773280 task.ti=eed26000) > [183168.740173] I-pipe domain Linux > [183168.740173] Stack: > [183168.740173] eed0e304 00000030 c157d4a9 eed0e210 eed0e2fc > eed0e2fc 0000003e 00000000 > [183168.740173] f85ea000 eed0e210 eed27dc8 c10c0c44 00001000 > 00000000 f85aa000 00040000 > [183168.740173] eed0e200 fffffff4 eed0e210 eed27df0 c10cf198 > eed27de4 c1058b86 eed27f20 > [183168.740173] Call Trace: > [183168.740173] [] ? xnheap_init_mapped+0xd4/0x210 > [183168.740173] [] ? xnshadow_sys_event+0x68/0x210 > [183168.740173] [] ? commit_creds+0xe6/0x190 > [183168.740173] [] ? xnshadow_sys_bind+0x293/0x420 > [183168.740173] [] ? __d_lookup+0x12e/0x160 > [183168.740173] [] ? dput+0x66/0x1b0 > [183168.740173] [] ? path_to_nameidata+0x1e/0x50 > [183168.740173] [] ? link_path_walk+0x422/0x7c0 > [183168.740173] [] ? path_put+0x25/0x30 > [183168.740173] [] ? __ipipe_restore_root+0x1d/0x30 > [183168.740173] [] ? kmem_cache_free+0xa7/0x100 > [183168.740173] [] ? putname+0x2a/0x40 > [183168.740173] [] ? user_path_at+0x4a/0x80 > [183168.740173] [] ? losyscall_event+0xad/0x200 > [183168.740173] [] ? __ipipe_dispatch_event+0xb5/0x170 > [183168.740173] [] ? losyscall_event+0x0/0x200 > [183168.740173] [] ? __ipipe_syscall_root+0x45/0xd0 > [183168.740173] [] ? system_call+0x2d/0x53 > [183168.740173] Code: 24 e8 a6 cc 19 00 fa 8b 0d 28 36 61 c1 0f ba > 2d c0 1b 61 c1 00 19 f6 8b 55 e8 83 e6 01 89 8b f0 00 00 00 8b 01 89 83 > ec 00 00 00<89> 50 04 31 c0 89 11 8b 15 c0 1b 61 c1 83 05 2c 36 61 c1 01 83 > [183168.740173] EIP: [] xnheap_init+0x1cf/0x210 SS:ESP > 0068:eed27d7c > [183168.740173] CR2: 0000000000700bf5 > > As you can see, this happened with kernel 2.6.37, and I built it with > Xenomai 2.6.0. The offending instruction was at xnheap_init + 463: > > 0xc10c090b: mov -0x18(%ebp),%edx > 0xc10c090e: and $0x1,%esi > 0xc10c0911: mov %ecx,0xf0(%ebx) > 0xc10c0917: mov (%ecx),%eax > 0xc10c0919: mov %eax,0xec(%ebx) > 0xc10c091f: mov %edx,0x4(%eax) > 0xc10c0922: xor %eax,%eax > 0xc10c0924: mov %edx,(%ecx) > 0xc10c0926: mov 0xc1611bc0,%edx > 0xc10c092c: addl $0x1,0xc161362c > 0xc10c0933: addl $0x1,0xc17c83e4 > > This corresponds to ath(xnholder_t *, xnholder_t *) in > include/xenomai/nucleus/queue.h, line 48: > > 43 static inline void ath(xnholder_t *head, xnholder_t *holder) > 44 { > 45 /* Inserts the new element right after the heading one */ > 46 holder->last = head; > 47 holder->next = head->next; > 48 holder->next->last = holder; > 49 head->next = holder; > 50 } > > It's apparently the call to appendq() at > kernel/xenomai/nucleus/heap.c:332 that does this, with a junk pointer > dereference. So, heap->stat_link.next is not valid at the time of this > call, yet it's initialized by the call to inith() on line 319. I don't > know what would have changed that, unless it's a bad pointer elsewhere > that caused overwriting of this data. Any ideas where to go from here? > If the bug is reproducible, two things you can try: - enable CONFIG_XENO_OPT_DEBUG_QUEUES - enable the I-pipe tracer and panic freezes, you should get a trace when the bug happens. -- Gilles.