* [Xenomai-help] Buffer Overrun -> Kernel Explosion
From: Brian L. @ 2006-04-28  5:02 UTC (permalink / raw)
  To: xenomai

I'm finally set up with netconsole to catch panics/crashes when they
happen so now I can report more information on the one I alluded to a
week or two ago in my "General Question.." thread.

What I did to cause it was write past the end of a buffer returned by
rt_queue_alloc. I'm not entirely sure if this message came at the
moment of the write (unlikely, IMHO) or later when more xnheap
activity took place. The crash popped up in several different ways
depending on what code paths I enabled/disabled.

What concerns me is that polluting an xnheap can bring the system to
its knees so harshly. I can see why it could be *very* hard to police
this sort of problem without destroying the performance of xnheap, so
it wouldn't surprise me if this is "normal". Still, though, it's sad
that user-space code can bring the system down after something as
innocent as a fencepost error in a string copy routine...
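
For reference, the kind of fencepost error described here is usually a copy that forgets the terminating '\0'. A minimal sketch of a bounded copy follows; copy_bounded is a hypothetical helper written for illustration, not part of Xenomai, and it assumes the caller knows the capacity it requested from rt_queue_alloc:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the string src into dst, which holds cap bytes in total
 * (terminator included). Never writes past dst[cap - 1].
 * Returns 0 on success, -1 if src does not fit.
 * Hypothetical helper for illustration only.
 */
static int copy_bounded(char *dst, size_t cap, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || cap == 0)
        return -1;

    len = strlen(src);
    if (len >= cap) {           /* no room left for the '\0' */
        dst[0] = '\0';          /* leave dst as a valid empty string */
        return -1;
    }

    memcpy(dst, src, len + 1);  /* len + 1 copies the terminator too */
    return 0;
}
```

The point of the `len >= cap` test (rather than `len > cap`) is exactly the off-by-one: a string of cap characters needs cap + 1 bytes.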

Thoughts? I've pasted the console dump below.


===========================================================
Unable to handle kernel paging request at virtual address 29343231
 printing eip:
c013b8dc
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: intel_agp agpgart
CPU:    0
EIP:    0060:[<c013b8dc>]    Not tainted VLI
EFLAGS: 00010002   (2.6.15-ipipe)
EIP is at xnheap_alloc+0x9c/0x170
eax: d0800d54   ebx: 00000020   ecx: d0800d54   edx: c03d5740
ds: 007b   es: 007b   ss: 0068
I-pipe domain Xenomai
Stack: cf913a50 00000001 00000092 00000001 00000005 d0800d20 00000001 c03d5740
       ce52bf0c c015fda6 d0800d54 00000015 d0800d20 ce52bfbc 00000000 ce52bf40
       c015ab10 d0800d20 00000005 00000010 00000000 00000003 d0800d54 b7f0c000
Call Trace:
 [<c010375f>] show_stack+0x7f/0xa0
 [<c01038fe>] show_registers+0x14e/0x1f0
 [<c0103b45>] die+0xf5/0x1c0
 [<c010ea8e>] do_page_fault+0x2ee/0x5bc
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c015fda6>] rt_queue_alloc+0x56/0xd0

 [<c0103c0d>] die+0x1bd/0x1c0
 [<c010ea8e>] do_page_fault+0x2ee/0x5bc
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c015fda6>] rt_queue_alloc+0x56/0xd0
 [<c015ab10>] __rt_queue_alloc+0xb0/0x130
 [<c014d0fa>] hisyscall_event+0x29a/0x3a0
 [<c0139cd3>] __ipipe_dispatch_event+0xe3/0x100
 [<c010d6e0>] __ipipe_syscall_root+0x30/0xe0
 [<c0103138>] system_call+0x20/0x41
Code: 74 33 8b 46 0c a8 08 75 22 83 c8 08 89 46 0c ff ff 21 e0 ff 40 14 eb
------------[ cut here ]------------
kernel BUG at kernel/exit.c:877!
invalid operand: 0000 [#5]
PREEMPT
Modules linked in: intel_agp agpgart
CPU:    0
EIP:    0060:[<c01183f2>]    Not tainted VLI
EFLAGS: 00010202   (2.6.15-ipipe)
EIP is at do_exit+0x302/0x5f0
eax: 0080200c   ebx: 00000000   ecx: ce52b8c0   edx: ce52b8c0
ds: 007b   es: 007b   ss: 0068
I-pipe domain Xenomai
Stack: cf64d570 c03d5740 ffffffff ce52ba48 00000001 00000000 ffffffff 00000000
       ce52b93c c0103c0d 00000000 00000001 ce52b914 00000004 ce52ba48 c031bcd0
       00000000 000000ff 0000000b ce52ba48 ce52b93c
CPU:    0
EIP:    0060:[<c01183f2>]    Not tainted VLI
EFLAGS: 00010202   (2.6.15-ipipe)
EIP is at do_exit+0x302/0x5f0
eax: 0080200c   ebx: 00000000   ecx: ce52b720   edx: ce52b720
esi: cf64d570   edi: 0000000b   ebp: ce52b75c   esp: ce52b73c
ds: 007b   es: 007b   ss: 0068
I-pipe domain Xenomai
Stack: cf64d570 c03d5740 ffffffff ce52b8a8 00000000 00000000 ffffffff 00000000
       ce52b79c c0103c0d 00000000 00000001 ce52b774 00000005 ce52b8a8 c031bcd0
       00000000 000000ff 0000000b ce52b8a8 ce52b79c c010f3a7 cf64d570 ce52b8a8
Call Trace:
 [<c010375f>] show_stack+0x7f/0xa0
 [<c01038fe>] show_registers+0x14e/0x1f0
 [<c0103b45>] die+0xf5/0x1c0
 [<c0103cc5>] do_trap+0xb5/0xc0
 [<c010400c>] do_invalid_op+0xbc/0xd0
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
<0f> 0b 6d 03 27 cf 31 c0 eb d4 b8 00 e0 ff ff 21 e0 ff 40 14 eb
------------[ cut here ]------------
kernel BUG at kernel/exit.c:877!
invalid operand: 0000 [#7]
PREEMPT
Modules linked in: intel_agp agpgart
CPU:    0
EIP:    0060:[<c01183f2>]    Not tainted VLI
EFLAGS: 00010202   (2.6.15-ipipe)
EIP is at do_exit+0x302/0x5f0
eax: 0080200c   ebx: 00000000   ecx: ce52b580   edx: ce52b580
esi: cf64d570   edi: 0000000b   ebp: ce52b5bc   esp: ce52b59c
ds: 007b   es: 007b   ss: 0068
I-pipe domain Xenomai
Stack:  [<c010400c>] do_invalid_op+0xbc/0xd0
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c0103c0d>] die+0x1bd/0x1c0
 [<c010ea8e>] do_page_fault+0x2ee/0x5bc
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c015fda6>] rt_queue_alloc+0x56/0xd0
 [<c015ab10>] __rt_queue_alloc+0xb0/0x130
 [<c01033b0>] error_code+0x54/0x64
 [<c0103c0d>] die+0x1bd/0x1c0
 [<c0103cc5>] do_trap+0xb5/0xc0
 [<c010400c>] do_invalid_op+0xbc/0xd0
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c0103c0d>] die+0x1bd/0x1c0
 [<c0103cc5>] do_trap+0xb5/0xc0
 [<c010400c>] do_invalid_op+0xbc/0xd0c0 eb b8 00 e0 ff ff 21 e0 ff 40 14 eb
------------[ cut here ]------------
kernel BUG at kernel/exit.c:877!
invalid operand: 0000 [#13]
PREEMPT
Modules linked in: intel_agp agpgart
CPU:    0
EIP:    0060:[<c01183f2>]    Not tainted VLI
die+0x1bd/0x1c0
 [<c0103cc5>] do_trap+0xb5/0xc0
 [<c010400c>] do_invalid_op+0xbc/0xd0
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c0103c0d>] die+0x1bd/0x1c0
 [<c0103cc5>] do_trap+0xb5/0xc0
 [<c010400c>] do_invalid_op+0xbc/0xd0
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c0103c0d>] die+0x1bd/0x1c000 eb fe <0f> 0b 6d 03 27 cf 31 c0 eb d4
b8 00 e0 ff ff 21 e0 ff 40 14 eb
------------[ cut here ]-------- [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c0103c0d>] die+0x1bd/0x1c0
 [<c0103cc5>] do_trap+0xb5/0xc0
 [<c010400c>] do_invalid_op+0xbc/0xd0
 [<c010d7df>] __ipipe_handle_exception+0x3f/0xc0
 [<c01033b0>] error_code+0x54/0x64
 [<c0103c0d>] die+0x1bd/0x1c0
 [<c0103cc5>] do_trap+0xb5/0xc0
 [<c010400c>]  =======================
03 27 cf 31 c0 8d b6 00 00 00 00 8d bc 27 00 00 00 00 eb fe <0f> 0b 6d
03 27 cf 31 c0 eb
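
One detail in the first oops may be worth a second look: the faulting address 29343231, read as four bytes in x86 little-endian memory order, is the printable ASCII sequence "124)". That is consistent with string data having overwritten a pointer that xnheap_alloc later dereferenced, although the dump alone does not prove it. A small sketch of the decoding (addr_as_ascii is a made-up helper name):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Render a 32-bit value as the four characters it would occupy in
 * memory on a little-endian machine (lowest byte first).
 * Non-printable bytes are shown as '.'.
 * Hypothetical helper for illustration only.
 */
static void addr_as_ascii(uint32_t addr, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(addr >> (8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

Calling addr_as_ascii(0x29343231u, buf) leaves "124)" in buf.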



* Re: [Xenomai-help] Buffer Overrun -> Kernel Explosion
From: Jan Kiszka @ 2006-05-03 11:42 UTC (permalink / raw)
  To: Brian L.; +Cc: xenomai

Brian L. wrote:
> I'm finally set up with netconsole to catch panics/crashes when they
> happen so now I can report more information on the one I alluded to a
> week or two ago in my "General Question.." thread.
> 
> What I did to cause it was write past the end of a buffer returned by
> rt_queue_alloc. I'm not entirely sure if this message came at the
> moment of the write (unlikely, IMHO) or later when more xnheap
> activity took place. The crash popped up in several different ways
> depending on what code paths I enabled/disabled.
> 
> What concerns me is that polluting an xnheap can bring the system to
> its knees so harshly. I can see why it could be *very* hard to police
> this sort of problem without destroying the performance of xnheap, so
> it wouldn't surprise me if this is "normal". Still, though, it's sad
> that user-space code can bring the system down after something as
> innocent as a fencepost error in a string copy routine...
> 
> Thoughts? I've pasted the console dump below.
> 

I remember that control structures and data are tightly knotted in
xnheaps, but I agree with you that this should not lead so easily to
such crashes for user space apps. Maybe some magic number check could
help to reduce the chance for now.
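
To make the magic number idea concrete, here is a minimal user-space sketch, not xnheap code: it wraps malloc, puts a guard word in front of and behind the user data, and refuses to free a block whose guards were damaged. All names and the guard value are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_MAGIC 0x58454e4fu   /* arbitrary value, no zero bytes */

struct guard_hdr {
    size_t   size;    /* user-visible size of the block */
    uint32_t magic;   /* front guard */
};

/* Allocate size bytes with a guard word on each side of the data. */
static void *guarded_alloc(size_t size)
{
    uint32_t magic = GUARD_MAGIC;
    struct guard_hdr *h = malloc(sizeof(*h) + size + sizeof(magic));

    if (h == NULL)
        return NULL;
    h->size = size;
    h->magic = GUARD_MAGIC;
    /* The tail guard may be unaligned, so store it with memcpy. */
    memcpy((unsigned char *)(h + 1) + size, &magic, sizeof(magic));
    return h + 1;
}

/* Return 1 if both guards are intact, 0 if either was overwritten. */
static int guarded_check(const void *p)
{
    const struct guard_hdr *h = (const struct guard_hdr *)p - 1;
    uint32_t tail;

    if (h->magic != GUARD_MAGIC)
        return 0;
    memcpy(&tail, (const unsigned char *)p + h->size, sizeof(tail));
    return tail == GUARD_MAGIC;
}

/* Free the block, or return -1 and leave it alone if it is corrupted. */
static int guarded_free(void *p)
{
    if (!guarded_check(p))
        return -1;
    free((struct guard_hdr *)p - 1);
    return 0;
}
```

Note that such a check only detects the damage after the fact, at the next check or free, and only if the overrun is short enough to land in the guard word; it reduces the chance of a crash but does not remove it.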

A cleaner long-term solution would be to decouple both regions.
Philippe, is this feasible (I'm not that deep in the internals of xnheap)?
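
What I mean by decoupling, as a toy sketch only (I do not know how xnheap lays things out internally, so none of this reflects the real implementation): the free list lives in a bookkeeping structure of its own, and the payload area contains nothing but user data. An overrun then damages a neighbouring buffer, but not the allocator state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BLOCKS 8
#define POOL_BLKSZ  32

/* Bookkeeping only; in a real system this would not be user-writable. */
struct pool {
    int head;                          /* first free block, -1 if none */
    int next_free[POOL_BLOCKS];        /* free-list links by index */
    unsigned char in_use[POOL_BLOCKS]; /* 1 while a block is handed out */
};

/* Payload only; no allocator state is stored in here. */
static unsigned char pool_data[POOL_BLOCKS * POOL_BLKSZ];

static void pool_init(struct pool *p)
{
    memset(p->in_use, 0, sizeof(p->in_use));
    for (int i = 0; i < POOL_BLOCKS; i++)
        p->next_free[i] = (i + 1 < POOL_BLOCKS) ? i + 1 : -1;
    p->head = 0;
}

static void *pool_alloc(struct pool *p)
{
    int i = p->head;

    if (i < 0)
        return NULL;                   /* pool exhausted */
    assert(i < POOL_BLOCKS);
    p->head = p->next_free[i];
    p->in_use[i] = 1;
    return &pool_data[(size_t)i * POOL_BLKSZ];
}

/* Return 0 on success, -1 for a foreign, misaligned or free pointer. */
static int pool_free(struct pool *p, void *ptr)
{
    uintptr_t base = (uintptr_t)pool_data;
    uintptr_t addr = (uintptr_t)ptr;
    size_t off;
    int i;

    if (addr < base || addr >= base + sizeof(pool_data))
        return -1;
    off = (size_t)(addr - base);
    if (off % POOL_BLKSZ != 0)
        return -1;
    i = (int)(off / POOL_BLKSZ);
    if (!p->in_use[i])
        return -1;                     /* double free */
    p->in_use[i] = 0;
    p->next_free[i] = p->head;
    p->head = i;
    return 0;
}
```

The price is the extra indirection and the separate memory for the bookkeeping, which is presumably where the performance question comes in.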

Jan




* Re: [Xenomai-help] Buffer Overrun -> Kernel Explosion
From: Philippe Gerum @ 2006-05-03 12:29 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai, Brian L.

Jan Kiszka wrote:
> Brian L. wrote:
> 
>>I'm finally set up with netconsole to catch panics/crashes when they
>>happen so now I can report more information on the one I alluded to a
>>week or two ago in my "General Question.." thread.
>>
>>What I did to cause it was write past the end of a buffer returned by
>>rt_queue_alloc. I'm not entirely sure if this message came at the
>>moment of the write (unlikely, IMHO) or later when more xnheap
>>activity took place. The crash popped up in several different ways
>>depending on what code paths I enabled/disabled.
>>
>>What concerns me is that polluting an xnheap can bring the system to
>>its knees so harshly. I can see why it could be *very* hard to police
>>this sort of problem without destroying the performance of xnheap, so
>>it wouldn't surprise me if this is "normal". Still, though, it's sad
>>that user-space code can bring the system down after something as
>>innocent as a fencepost error in a string copy routine...
>>
>>Thoughts? I've pasted the console dump below.
>>
> 
> 
> I remember that control structures and data are tightly knotted in
> xnheaps, but I agree with you that this should not lead so easily to
> such crashes for user space apps. Maybe some magic number check could
> help to reduce the chance for now.
> 
> A cleaner long-term solution would be to decouple both regions.
> Philippe, is this feasible (I'm not that deep in the internals of xnheap)?
> 

Almost anything could be done, at the cost of the corresponding overhead,
I mean. As you know, there are quite a number of ways to kill your box
with RT activity, even without trashing Xenomai's internal data structures.
A fencepost error in a copy that trashes other crucial application data
comes to mind, which in turn might cause all sorts of weird behaviours,
including hard lockups due to unexpected infinite loops; this can happen
whether or not the system data are isolated in a write-protected segment.
The same goes for plain MMIO areas: just write garbage over memory that
has I/O side effects and watch the box go south, no need for Xenomai
here.

Since providing isolated individual data buffers is out of the question
for obvious performance reasons, rt_queue_read/write, introduced in 2.2,
provide a way to send/receive data blocks without having to share the
memory with the heap, at the expense of the data being transferred
to/from kernel space during the calls.

In contrast, rt_queue_send/receive have been designed to expose the
internal buffer space directly, so that applications can prepare a
significant amount of data in place before transmission. This saves the
final copy needed to transfer the buffer, at the expense of the
application being able to access the entire heap memory with no
particular protection. As our friend Spider-Man says, "With great power
comes great responsibility".
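
To illustrate the trade-off between the two styles, here is a toy single-slot queue in plain C. It is only a sketch: the toy_* names are invented for this example and are not the Xenomai calls, which take different arguments and operate on real heap memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64

struct toy_queue {
    unsigned char slot[SLOT_SIZE]; /* stands in for heap-backed memory */
    size_t len;                    /* valid bytes pending, 0 = empty */
};

/* Copy style: the sender keeps its own buffer; the queue validates
 * the size and copies the data in, so an oversized write is refused. */
static int toy_write(struct toy_queue *q, const void *buf, size_t size)
{
    if (size == 0 || size > SLOT_SIZE || q->len != 0)
        return -1;
    memcpy(q->slot, buf, size);
    q->len = size;
    return 0;
}

/* In-place style, step 1: hand out a pointer into the queue's own
 * memory. Nothing stops the caller from writing more than size bytes. */
static void *toy_alloc(struct toy_queue *q, size_t size)
{
    if (size == 0 || size > SLOT_SIZE || q->len != 0)
        return NULL;
    return q->slot;
}

/* In-place style, step 2: publish the buffer filled by the caller.
 * No data is copied here. */
static int toy_send(struct toy_queue *q, void *mbuf, size_t size)
{
    if (mbuf != q->slot || size == 0 || size > SLOT_SIZE || q->len != 0)
        return -1;
    q->len = size;
    return 0;
}

/* Receiver side, copy style: returns the number of bytes copied out. */
static size_t toy_read(struct toy_queue *q, void *buf, size_t cap)
{
    size_t n = q->len < cap ? q->len : cap;

    memcpy(buf, q->slot, n);
    q->len = 0;
    return n;
}
```

With toy_write the size check sits between the application and the shared memory; with toy_alloc/toy_send the application writes into that memory itself, which is faster for large messages and exactly as safe as the code doing the writing.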

-- 

Philippe.

