From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4FC716C1.2030701@xenomai.org> Date: Thu, 31 May 2012 08:59:13 +0200 From: Gilles Chanteperdrix MIME-Version: 1.0 References: <4FA87007.1040105@ebus.com> <4FA875BF.7040103@xenomai.org> <4FC6ADD6.8000509@ebus.com> In-Reply-To: <4FC6ADD6.8000509@ebus.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai] [Xenomai-help] Debugging oops in xnheap_init List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Doug Brunner Cc: xenomai@xenomai.org On 05/31/2012 01:31 AM, Doug Brunner wrote: > On 05/07/2012 06:24 PM, Gilles Chanteperdrix wrote: >> On 05/08/2012 02:59 AM, Doug Brunner wrote: >>> I just got an oops from running one of my POSIX skin RT applications: >>> >>> [183168.735823] BUG: unable to handle kernel paging request at 00700bf5 >>> [183168.737436] IP: [] xnheap_init+0x1cf/0x210 >>> [183168.738604] *pde = 00000000 >>> [183168.739406] Oops: 0002 [#1] PREEMPT >>> [183168.740173] last sysfs file: /sys/devices/virtual/bdi/0:19/uevent >>> [183168.740173] Modules linked in: e1000 xeno_rtipc lxfb cfbcopyarea >>> cfbimgblt cfbfillrect binfmt_misc psmouse usbhid serio_raw hid ata_piix >>> [last unloaded: e1000] >>> [183168.740173] >>> [183168.740173] Pid: 2557, comm: eve_dal Not tainted 2.6.37 #1 /Bochs >>> [183168.740173] EIP: 0060:[] EFLAGS: 00010246 CPU: 0 >>> [183168.740173] EIP is at xnheap_init+0x1cf/0x210 >>> [183168.740173] EAX: 00700bf1 EBX: eed0e210 ECX: eed0e730 EDX: eed0e2fc >>> [183168.740173] ESI: 00000000 EDI: 00000000 EBP: eed27da4 ESP: eed27d7c >>> [183168.740173] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 >>> [183168.740173] Process eve_dal (pid: 2557, ti=eed26000 >>> task=f5773280 task.ti=eed26000) >>> [183168.740173] I-pipe domain Linux >>> [183168.740173] Stack: >>> [183168.740173] eed0e304 00000030 c157d4a9 eed0e210 eed0e2fc >>> eed0e2fc 0000003e 00000000 >>> [183168.740173] f85ea000 eed0e210 eed27dc8 c10c0c44 00001000 >>> 00000000 f85aa000 00040000 >>> [183168.740173] eed0e200 fffffff4 eed0e210 eed27df0 c10cf198 >>> eed27de4 c1058b86 eed27f20 >>> [183168.740173] Call Trace: >>> [183168.740173] [] ? xnheap_init_mapped+0xd4/0x210 >>> [183168.740173] [] ? xnshadow_sys_event+0x68/0x210 >>> [183168.740173] [] ? commit_creds+0xe6/0x190 >>> [183168.740173] [] ? xnshadow_sys_bind+0x293/0x420 >>> [183168.740173] [] ? __d_lookup+0x12e/0x160 >>> [183168.740173] [] ? dput+0x66/0x1b0 >>> [183168.740173] [] ? path_to_nameidata+0x1e/0x50 >>> [183168.740173] [] ? link_path_walk+0x422/0x7c0 >>> [183168.740173] [] ? path_put+0x25/0x30 >>> [183168.740173] [] ? __ipipe_restore_root+0x1d/0x30 >>> [183168.740173] [] ? kmem_cache_free+0xa7/0x100 >>> [183168.740173] [] ? putname+0x2a/0x40 >>> [183168.740173] [] ? user_path_at+0x4a/0x80 >>> [183168.740173] [] ? losyscall_event+0xad/0x200 >>> [183168.740173] [] ? __ipipe_dispatch_event+0xb5/0x170 >>> [183168.740173] [] ? losyscall_event+0x0/0x200 >>> [183168.740173] [] ? __ipipe_syscall_root+0x45/0xd0 >>> [183168.740173] [] ? system_call+0x2d/0x53 >>> [183168.740173] Code: 24 e8 a6 cc 19 00 fa 8b 0d 28 36 61 c1 0f ba >>> 2d c0 1b 61 c1 00 19 f6 8b 55 e8 83 e6 01 89 8b f0 00 00 00 8b 01 89 83 >>> ec 00 00 00<89> 50 04 31 c0 89 11 8b 15 c0 1b 61 c1 83 05 2c 36 61 c1 01 83 >>> [183168.740173] EIP: [] xnheap_init+0x1cf/0x210 SS:ESP >>> 0068:eed27d7c >>> [183168.740173] CR2: 0000000000700bf5 >>> >>> As you can see, this happened with kernel 2.6.37, and I built it with >>> Xenomai 2.6.0. The offending instruction was at xnheap_init + 463: >>> >>> 0xc10c090b: mov -0x18(%ebp),%edx >>> 0xc10c090e: and $0x1,%esi >>> 0xc10c0911: mov %ecx,0xf0(%ebx) >>> 0xc10c0917: mov (%ecx),%eax >>> 0xc10c0919: mov %eax,0xec(%ebx) >>> 0xc10c091f: mov %edx,0x4(%eax) >>> 0xc10c0922: xor %eax,%eax >>> 0xc10c0924: mov %edx,(%ecx) >>> 0xc10c0926: mov 0xc1611bc0,%edx >>> 0xc10c092c: addl $0x1,0xc161362c >>> 0xc10c0933: addl $0x1,0xc17c83e4 >>> >>> This corresponds to ath(xnholder_t *, xnholder_t *) in >>> include/xenomai/nucleus/queue.h, line 48: >>> >>> 43 static inline void ath(xnholder_t *head, xnholder_t *holder) >>> 44 { >>> 45 /* Inserts the new element right after the heading one */ >>> 46 holder->last = head; >>> 47 holder->next = head->next; >>> 48 holder->next->last = holder; >>> 49 head->next = holder; >>> 50 } >>> >>> It's apparently the call to appendq() at >>> kernel/xenomai/nucleus/heap.c:332 that does this, with a junk pointer >>> dereference. So, heap->stat_link.next is not valid at the time of this >>> call, yet it's initialized by the call to inith() on line 319. I don't >>> know what would have changed that, unless it's a bad pointer elsewhere >>> that caused overwriting of this data. Any ideas where to go from here? >>> >> If the bug is reproducible, two things you can try: >> - enable CONFIG_XENO_OPT_DEBUG_QUEUES >> - enable the I-pipe tracer and panic freezes, you should get a trace >> when the bug happens. >> > Hi Gilles, > > I finally got a bit more information. The crash occurred again today on > my testing hardware, so I installed a kernel with I-pipe trace and queue > debugging and tried to reproduce. I didn't get the same error, and the > kernel didn't oops, but I did get some interesting-looking information > in the log. It looks like something bad was happening with XDDP, but I > can't figure out what. Hopefully the attached log file will get through. Nothing seems obvious from the trace. Also, you may want to increase the trace to 10000 points for instance, in order to have more history. But if the issue is a memory corruption, chances are that it may not be enough. What version of xenomai and the I-pipe patch are you using? -- Gilles.