[Xenomai-help] Non-APIC setup broken for 2.4-SVN?

* [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
@ 2007-09-25 14:09 Jan Kiszka
  2007-09-25 14:25 ` Leopold Palomo-Avellaneda
  2007-09-25 21:37 ` Philippe Gerum
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2007-09-25 14:09 UTC
  To: xenomai

Hi,

To make it short: has anyone tried, or could anyone try, the latest SVN over
kernel 2.6.20 (I-pipe for 2.6.22 is unstable) _without_ LAPIC? It works for
me, but shows latencies in the millisecond range. The same setup with LAPIC
enabled is fine and obviously stable. Before digging into this (time is short
anyway), I would like to rule out problems with my notebook hardware.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
  2007-09-25 14:09 [Xenomai-help] Non-APIC setup broken for 2.4-SVN? Jan Kiszka
@ 2007-09-25 14:25 ` Leopold Palomo-Avellaneda
  2007-09-25 21:37 ` Philippe Gerum
  1 sibling, 0 replies; 31+ messages in thread
From: Leopold Palomo-Avellaneda @ 2007-09-25 14:25 UTC
  To: xenomai

On Tuesday 25 September 2007 16:09, Jan Kiszka wrote:
> Hi,
>
> To make it short: has anyone tried, or could anyone try, the latest SVN over
> kernel 2.6.20 (I-pipe for 2.6.22 is unstable) _without_ LAPIC?

I'm running a 2.6.22 kernel patched with Xenomai trunk (from last week) and
booting with noapic set in GRUB.

The latency test gives:

running: ./run -- -sh -T 120 -t0 # latency
*
*
* Type ^C to stop this application.
*
*
== Sampling period: 100 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 100 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|     -16.738|     -16.613|     -14.489|       0|     -16.738|     -14.489
RTD|     -16.726|     -16.602|      -9.150|       0|     -16.738|      -9.150
RTD|     -16.738|     -16.587|     -10.463|       0|     -16.738|      -9.150
RTD|     -16.731|     -16.602|     -11.780|       0|     -16.738|      -9.150
RTD|     -16.731|     -16.605|     -13.943|       0|     -16.738|      -9.150
RTD|     -16.733|     -16.586|     -10.669|       0|     -16.738|      -9.150
RTD|     -16.732|     -16.604|     -13.736|       0|     -16.738|      -9.150
RTD|     -16.740|     -16.603|     -10.243|       0|     -16.740|      -9.150
RTD|     -16.732|     -16.585|     -10.725|       0|     -16.740|      -9.150
RTD|     -16.727|     -16.601|     -13.796|       0|     -16.740|      -9.150
RTD|     -16.726|     -16.603|     -13.741|       0|     -16.740|      -9.150       
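
To compare runs, the RTD rows of such a report can be reduced to a single worst-case figure. The following is a minimal Python sketch and not part of the Xenomai test suite: the names `parse_rtd` and `summarize` are made up for illustration, and the code assumes the six-column layout announced by the RTH header line.

```python
def parse_rtd(line):
    """Parse one 'RTD|' row of latency output; return None for any other line."""
    line = line.strip()
    if not line.startswith("RTD|"):
        return None
    # Drop the leading 'RTD' tag and strip the padding around each cell.
    cells = [cell.strip() for cell in line.split("|")[1:]]
    if len(cells) != 6:
        return None
    return {
        "lat_min": float(cells[0]),
        "lat_avg": float(cells[1]),
        "lat_max": float(cells[2]),
        "overrun": int(cells[3]),
        "lat_best": float(cells[4]),
        "lat_worst": float(cells[5]),
    }


def summarize(text):
    """Return the number of RTD rows and the largest 'lat max' among them."""
    rows = [row for row in map(parse_rtd, text.splitlines()) if row is not None]
    if not rows:
        return {"samples": 0, "worst_max": None}
    return {"samples": len(rows), "worst_max": max(row["lat_max"] for row in rows)}


if __name__ == "__main__":
    sample = """\
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|     -16.738|     -16.613|     -14.489|       0|     -16.738|     -14.489
RTD|     -16.726|     -16.602|      -9.150|       0|     -16.738|      -9.150
"""
    print(summarize(sample))
```

All values are in microseconds, as stated in the report header. A millisecond-range problem would show up as a `worst_max` in the thousands.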

> It works for me, but shows latencies in the millisecond range. The same
> setup with LAPIC enabled is fine and obviously stable. Before digging into
> this (time is short anyway), I would like to rule out problems with my
> notebook hardware.

I cannot run it with LAPIC enabled because the system hangs.

Regards,

Leo



* Re: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
  2007-09-25 14:09 [Xenomai-help] Non-APIC setup broken for 2.4-SVN? Jan Kiszka
  2007-09-25 14:25 ` Leopold Palomo-Avellaneda
@ 2007-09-25 21:37 ` Philippe Gerum
  2007-09-26  9:31   ` Jan Kiszka
  1 sibling, 1 reply; 31+ messages in thread
From: Philippe Gerum @ 2007-09-25 21:37 UTC
  To: Jan Kiszka; +Cc: xenomai

On Tue, 2007-09-25 at 16:09 +0200, Jan Kiszka wrote:
> Hi,
> 
> To make it short: has anyone tried, or could anyone try, the latest SVN over
> kernel 2.6.20 (I-pipe for 2.6.22 is unstable) _without_ LAPIC? It works for
> me, but shows latencies in the millisecond range. The same setup with LAPIC
> enabled is fine and obviously stable. Before digging into this (time is short
> anyway), I would like to rule out problems with my notebook hardware.
> 

Not confirmed here. Everything looks ok.

> Thanks,
> Jan
> 
-- 
Philippe.





* Re: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
  2007-09-25 21:37 ` Philippe Gerum
@ 2007-09-26  9:31   ` Jan Kiszka
  2007-09-30 10:22     ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2007-09-26  9:31 UTC
  To: rpm; +Cc: xenomai

Philippe Gerum wrote:
> On Tue, 2007-09-25 at 16:09 +0200, Jan Kiszka wrote:
>> Hi,
>>
>> To make it short: has anyone tried, or could anyone try, the latest SVN over
>> kernel 2.6.20 (I-pipe for 2.6.22 is unstable) _without_ LAPIC? It works for
>> me, but shows latencies in the millisecond range. The same setup with LAPIC
>> enabled is fine and obviously stable. Before digging into this (time is short
>> anyway), I would like to rule out problems with my notebook hardware.
>>
> 
> Not confirmed here. Everything looks ok.

Yes, I found a second box working fine with a comparable (though not 
identical) config. I suspect some hw issue of my notebook now, but I 
will try to look into this a bit deeper once time permits.

More important is to understand and fix the consistent 2.6.22 issues I 
see. Only works in qemu for me, all real i386 boxes lock up sooner or 
later. :-/

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
  2007-09-26  9:31   ` Jan Kiszka
@ 2007-09-30 10:22     ` Jan Kiszka
  2007-09-30 10:52       ` Philippe Gerum
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2007-09-30 10:22 UTC
  To: rpm; +Cc: xenomai


Jan Kiszka wrote:
> Philippe Gerum wrote:
>> On Tue, 2007-09-25 at 16:09 +0200, Jan Kiszka wrote:
>>> Hi,
>>>
>>> To make it short: has anyone tried, or could anyone try, the latest SVN over
>>> kernel 2.6.20 (I-pipe for 2.6.22 is unstable) _without_ LAPIC? It works for
>>> me, but shows latencies in the millisecond range. The same setup with LAPIC
>>> enabled is fine and obviously stable. Before digging into this (time is short
>>> anyway), I would like to rule out problems with my notebook hardware.
>>>
>> Not confirmed here. Everything looks ok.
> 
> Yes, I found a second box working fine with a comparable (though not 
> identical) config. I suspect some hw issue of my notebook now, but I 
> will try to look into this a bit deeper once time permits.

It came down to a conflict between using the PIT and Linux fiddling with
the HPET at the same time (when the HPET is in use, the PIT cannot be
programmed). See my attached warning patch, which may save others from
tearing their hair out.
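
The dependency introduced by the attached patch can be checked against an existing kernel configuration before building. This is a minimal Python sketch under simplifying assumptions: it treats every symbol as a plain boolean that is on only when set to `=y` (tristate `m` values are ignored), and the helper names `enabled_options` and `xenomai_selectable` are made up for illustration. Only the boolean expression itself is taken from the patch.

```python
def enabled_options(config_text):
    """Collect the names of CONFIG_* symbols set to 'y' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[len("CONFIG_"):-len("=y")])
    return enabled


def xenomai_selectable(enabled):
    """Evaluate the 'depends on' expression of config XENOMAI from the patch."""
    x86 = "X86" in enabled
    tsc = "X86_TSC" in enabled
    pcspkr = "INPUT_PCSPKR" in enabled
    hpet = "HPET_TIMER" in enabled
    lapic = "X86_LOCAL_APIC" in enabled
    # (X86_TSC || !X86 || !INPUT_PCSPKR) && (!HPET_TIMER || !X86 || X86_LOCAL_APIC)
    return (tsc or not x86 or not pcspkr) and (not hpet or not x86 or lapic)


if __name__ == "__main__":
    # Hypothetical fragment of a non-LAPIC x86 configuration with HPET enabled.
    fragment = """\
CONFIG_X86=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
# CONFIG_X86_LOCAL_APIC is not set
"""
    print(xenomai_selectable(enabled_options(fragment)))
```

With HPET_TIMER enabled and X86_LOCAL_APIC disabled on x86, the expression evaluates to false, which is the combination the patch is meant to flag.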

> More important is to understand and fix the consistent 2.6.22 issues I 
> see. Only works in qemu for me, all real i386 boxes lock up sooner or 
> later. :-/

I'm on this. Multiple bugs are involved, at least one of them is visible
already with vanilla on my work notebook. Another one is related to
Xenomai not dealing with the lapic device being in
CLOCK_EVT_MODE_SHUTDOWN whenever the NMI watchdog is in use. And a third
one only gives me "Detected illicit call from domain Xenomai" before the
box reboots. :(

Jan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: hpet-conflict-detection.patch --]
[-- Type: text/x-patch; name="hpet-conflict-detection.patch", Size: 838 bytes --]

Index: xenomai/scripts/Kconfig.frag
===================================================================
--- xenomai/scripts/Kconfig.frag	(revision 3019)
+++ xenomai/scripts/Kconfig.frag	(working copy)
@@ -11,8 +11,13 @@ comment "NOTE: Xenomai conflicts with PC
 comment "(menu Device Drivers/Input device support/Miscellaneous devices)"
 	depends on !X86_TSC && X86 && INPUT_PCSPKR
 
+comment "NOTE: Xenomai conflicts with HPET Timer support."
+	depends on !X86_LOCAL_APIC && X86 && HPET_TIMER
+comment "(menu Processor type and features/HPET Timer Support)"
+	depends on !X86_LOCAL_APIC && X86 && HPET_TIMER
+
 config XENOMAI
-	depends on (X86_TSC || !X86 || !INPUT_PCSPKR)
+	depends on ((X86_TSC || !X86 || !INPUT_PCSPKR) && (!HPET_TIMER || !X86 || X86_LOCAL_APIC))
 	bool "Xenomai"
 	default y
         select IPIPE


* Re: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
  2007-09-30 10:22     ` Jan Kiszka
@ 2007-09-30 10:52       ` Philippe Gerum
  2007-09-30 11:00         ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Philippe Gerum @ 2007-09-30 10:52 UTC
  To: Jan Kiszka; +Cc: xenomai

On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Philippe Gerum wrote:
> >> On Tue, 2007-09-25 at 16:09 +0200, Jan Kiszka wrote:
> >>> Hi,
> >>>
> >>> To make it short: has anyone tried, or could anyone try, the latest SVN over
> >>> kernel 2.6.20 (I-pipe for 2.6.22 is unstable) _without_ LAPIC? It works for
> >>> me, but shows latencies in the millisecond range. The same setup with LAPIC
> >>> enabled is fine and obviously stable. Before digging into this (time is short
> >>> anyway), I would like to rule out problems with my notebook hardware.
> >>>
> >> Not confirmed here. Everything looks ok.
> > 
> > Yes, I found a second box working fine with a comparable (though not 
> > identical) config. I suspect some hw issue of my notebook now, but I 
> > will try to look into this a bit deeper once time permits.
> 
> It came down to a conflict between using the PIT and Linux fiddling with
> the HPET at the same time (when the HPET is in use, the PIT cannot be
> programmed). See my attached warning patch, which may save others from
> tearing their hair out.
> 

Yes, this one looks better than the previous attempt to prevent
HPET+Xenomai globally, which would create unwanted situations with x86_64
IIRC. I'll test this, and will probably merge.

> > More important is to understand and fix the consistent 2.6.22 issues I 
> > see. Only works in qemu for me, all real i386 boxes lock up sooner or 
> > later. :-/
> 
> I'm on this. Multiple bugs are involved, at least one of them is visible
> already with vanilla on my work notebook.

Do you work with 2.6.22 baseline, or the latest stable update?

>  Another one is related to
> Xenomai not dealing with the lapic device being in
> CLOCK_EVT_MODE_SHUTDOWN whenever the NMI watchdog is in use.

Btw, I have a patch for the latest issue we talked about regarding
kernel-disabled LAPIC. It changes the generic inner interface a bit for
other archs too, so I will commit this asap.

>  And a third
> one only gives me "Detected illicit call from domain Xenomai" before the
> box reboots. :(

Grmff... Do you run with your smp_processor_id() instrumentation in?

> 
> Jan
-- 
Philippe.





* Re: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
  2007-09-30 10:52       ` Philippe Gerum
@ 2007-09-30 11:00         ` Jan Kiszka
  2007-09-30 11:42           ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Jan Kiszka
  2007-09-30 19:52           ` [Xenomai-help] Non-APIC setup broken for 2.4-SVN? Philippe Gerum
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2007-09-30 11:00 UTC
  To: rpm; +Cc: xenomai

Philippe Gerum wrote:
> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> On Tue, 2007-09-25 at 16:09 +0200, Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> To make it short: has anyone tried, or could anyone try, the latest SVN over
>>>>> kernel 2.6.20 (I-pipe for 2.6.22 is unstable) _without_ LAPIC? It works for
>>>>> me, but shows latencies in the millisecond range. The same setup with LAPIC
>>>>> enabled is fine and obviously stable. Before digging into this (time is short
>>>>> anyway), I would like to rule out problems with my notebook hardware.
>>>>>
>>>> Not confirmed here. Everything looks ok.
>>> Yes, I found a second box working fine with a comparable (though not 
>>> identical) config. I suspect some hw issue of my notebook now, but I 
>>> will try to look into this a bit deeper once time permits.
>> It came down to a conflict between using the PIT and Linux fiddling with
>> the HPET at the same time (when the HPET is in use, the PIT cannot be
>> programmed). See my attached warning patch, which may save others from
>> tearing their hair out.
>>
> 
> Yes, this one looks better than the previous attempt to prevent
> HPET+Xenomai globally, which would create unwanted situations with x86_64
> IIRC. I'll test this, and will probably merge.

Great.

> 
>>> More important is to understand and fix the consistent 2.6.22 issues I 
>>> see. Only works in qemu for me, all real i386 boxes lock up sooner or 
>>> later. :-/
>> I'm on this. Multiple bugs are involved, at least one of them is visible
>> already with vanilla on my work notebook.
> 
> Do you work with 2.6.22 baseline, or the latest stable update?

2.6.22.7 and .23-rc8. Both hang hard when I play with the backlight keys
of my Fujitsu-Siemens Lifebook, while this works fine (with and without
Xenomai) under 2.6.20.20. Weird.

> 
>>  Another one is related to
>> Xenomai not dealing with the lapic device being in
>> CLOCK_EVT_MODE_SHUTDOWN whenever the NMI watchdog is in use.
> 
> Btw, I have a patch for the latest issue we talked about regarding
> kernel-disabled LAPIC. It changes the generic inner interface a bit for
> other archs too, so I will commit this asap.

Looking forward!

> 
>>  And a third
>> one only gives me "Detected illicit call from domain Xenomai" before the
>> box reboots. :(
> 
> Grmff... Do you run with your smp_processor_id() instrumentation in?

Yes, but I suspect this is just a symptom of some severe memory
corruption that (also?) hits I-pipe data structures. I just put in some
different instrumentation, and that warning is gone, the box just hangs
hard at a different point. Very unfriendly.

Jan



* [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?)
  2007-09-30 11:00         ` Jan Kiszka
@ 2007-09-30 11:42           ` Jan Kiszka
  2007-09-30 12:42             ` Philippe Gerum
  2007-09-30 19:45             ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Philippe Gerum
  2007-09-30 19:52           ` [Xenomai-help] Non-APIC setup broken for 2.4-SVN? Philippe Gerum
  1 sibling, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2007-09-30 11:42 UTC
  To: rpm; +Cc: xenomai-core

Jan Kiszka wrote:
> Philippe Gerum wrote:
>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
...
>>>  And a third
>>> one only gives me "Detected illicit call from domain Xenomai" before the
>>> box reboots. :(
>> Grmff... Do you run with your smp_processor_id() instrumentation in?
> 
> Yes, but I suspect this is just a symptom of some severe memory
> corruption that (also?) hits I-pipe data structures. I just put in some
> different instrumentation, and that warning is gone, the box just hangs
> hard at a different point. Very unfriendly.

Hah! Got some crash log by hacking a raw printk-to-uart:

[...]
<6>Xenomai: starting RTDM services.
<6>NET: Registered protocol family 10
<6>lo: Disabled Privacy Extensions
<6>ADDRCONF(NETDEV_UP): eth0: link is not ready
<3>I-pipe: Detected illicit call from domain 'Xenomai'
<3>        into a service reserved for domain 'Linux' and below.
       f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
       00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
       c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
Call Trace:
 [<c010520f>] show_trace_log_lvl+0x1f/0x40
 [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
 [<c0105fc3>] show_stack+0x33/0x40
 [<c01479cb>] ipipe_check_context+0x7b/0x90
 [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
 [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
 [<c0131e02>] notify_die+0x32/0x40
 [<c0105d29>] do_invalid_op+0x59/0xa0
 [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
 [<c02dfaeb>] error_code+0x6f/0x7c
 [<c0111d13>] __ipipe_handle_exception+0x83/0x144
 [<c02dfaeb>] error_code+0x6f/0x7c
 [<c01117df>] __ipipe_handle_irq+0x4f/0x140
 [<c0104c5e>] ipipe_ipi3+0x26/0x40
 [<c0111df8>] mcount+0x24/0x29
 [<c0115c49>] kunmap_atomic+0x9/0x60
 [<c015a040>] __handle_mm_fault+0x210/0x910
 [<c0114dac>] do_page_fault+0x1dc/0x5f0
 [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
 [<c02dfaeb>] error_code+0x6f/0x7c
 =======================
I-pipe tracer log (30 points):
    #*func                    0 ipipe_trace_panic_freeze+0x8 (ipipe_check_context+0x40)
    #*func                    0 ipipe_check_context+0xc (__atomic_notifier_call_chain+0x24)
    #*func                    0 __atomic_notifier_call_chain+0x14 (atomic_notifier_call_chain+0x1f)
    #*func                    0 atomic_notifier_call_chain+0xb (notify_die+0x32)
    #*func                    0 notify_die+0xb (do_invalid_op+0x59)
    #*func                    0 do_invalid_op+0x10 (__ipipe_handle_exception+0x7b)
    #*func                   -1 __ipipe_handle_exception+0xe (error_code+0x6f)
    #*func                   -1 __ipipe_restore_root+0x8 (__ipipe_handle_exception+0x83)
 |  #*func                   -2 do_page_fault+0xe (__ipipe_handle_exception+0x7b)
 |  # func                   -2 __ipipe_handle_exception+0xe (error_code+0x6f)
 |   +func                   -3 __ipipe_dispatch_wired+0x16 (__ipipe_handle_irq+0x4f)
 |   +func                   -3 __ipipe_ack_apic+0x8 (__ipipe_handle_irq+0x8f)
 |   +func                   -3 __ipipe_handle_irq+0x14 (ipipe_ipi3+0x26)
     +func                   -3 kunmap_atomic+0x9 (__handle_mm_fault+0x210)
     +func                   -3 ipipe_check_context+0xc (__handle_mm_fault+0x204)
     +func                   -4 page_add_file_rmap+0x8 (__handle_mm_fault+0x586)
     +func                   -4 ipipe_check_context+0xc (__handle_mm_fault+0x196)
     +func                   -4 kmap_atomic_prot+0xb (kmap_atomic+0x13)
     +func                   -4 kmap_atomic+0x8 (__handle_mm_fault+0x186)
     +func                   -4 mark_page_accessed+0x9 (filemap_nopage+0x13c)
     +func                   -4 ipipe_check_context+0xc (find_get_page+0x65)
     #func                   -4 __ipipe_unstall_root+0x8 (find_get_page+0x5b)
     #func                   -4 radix_tree_lookup+0x16 (find_get_page+0x36)
     #func                   -4 ipipe_check_context+0xc (find_get_page+0x2d)
     +func                   -5 ipipe_check_context+0xc (find_get_page+0x18)
     +func                   -5 find_get_page+0xa (filemap_nopage+0x1de)
     +func                   -5 filemap_nopage+0xe (__handle_mm_fault+0x11f)
     +func                   -5 ipipe_check_context+0xc (kunmap_atomic+0x50)
     +func                   -5 kunmap_atomic+0x9 (__handle_mm_fault+0xcc)
     +func                   -5 kmap_atomic_prot+0xb (kmap_atomic+0x13)
<0>PANIC: double fault, gdt at c0392000 [255 bytes]
<0>double fault, tss at c038d7e0
<0>eip = c0127266, esp = dfec1ff8
<0>eax = c05dad6c, ebx = dfec20f4, ecx = dfec2008, edx = 00000009
<0>esi = 00000000, edi = dfec20f4

Double fault, explains why it is so slippery... And the crash looks a
bit like that backtrace I once posted for an earlier ipipe version.
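
To compare a log like this one against other backtraces, the "Call Trace" lines can be turned into structured records, for example to count how often `__ipipe_handle_exception` appears. This is a minimal Python sketch that assumes the `[<address>] symbol+0xoffset/0xsize` layout seen in the log; `Frame`, `parse_frames` and `count_symbol` are illustrative names, not part of any kernel or Xenomai tool.

```python
import re
from collections import namedtuple

# One stack frame: text address, symbol name, offset into the symbol, symbol size.
Frame = namedtuple("Frame", "address symbol offset size")

FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(log):
    """Extract every '[<addr>] sym+0xoff/0xsize' entry from a kernel log, in order."""
    frames = []
    for match in FRAME_RE.finditer(log):
        address, symbol, offset, size = match.groups()
        frames.append(Frame(int(address, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


def count_symbol(frames, symbol):
    """Count how many frames belong to the given symbol."""
    return sum(1 for frame in frames if frame.symbol == symbol)


if __name__ == "__main__":
    excerpt = """\
 [<c0105d29>] do_invalid_op+0x59/0xa0
 [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
 [<c02dfaeb>] error_code+0x6f/0x7c
 [<c0111d13>] __ipipe_handle_exception+0x83/0x144
 [<c02dfaeb>] error_code+0x6f/0x7c
 [<c01117df>] __ipipe_handle_irq+0x4f/0x140
 [<c0104c5e>] ipipe_ipi3+0x26/0x40
"""
    frames = parse_frames(excerpt)
    print(count_symbol(frames, "__ipipe_handle_exception"))
```

Because the pattern is searched anywhere in a line, quoted copies of a trace (lines prefixed with `>`) are parsed as well.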

Time for a break, will dig deeper later - now that I have the tools :)

Jan



* Re: [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?)
  2007-09-30 11:42           ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Jan Kiszka
@ 2007-09-30 12:42             ` Philippe Gerum
  2007-09-30 15:31               ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
  2007-09-30 19:45             ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Philippe Gerum
  1 sibling, 1 reply; 31+ messages in thread
From: Philippe Gerum @ 2007-09-30 12:42 UTC
  To: Jan Kiszka; +Cc: xenomai-core

On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Philippe Gerum wrote:
> >> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> ...
> >>>  And a third
> >>> one only gives me "Detected illicit call from domain Xenomai" before the
> >>> box reboots. :(
> >> Grmff... Do you run with your smp_processor_id() instrumentation in?
> > 
> > Yes, but I suspect this is just a symptom of some severe memory
> > corruption that (also?) hits I-pipe data structures. I just put in some
> > different instrumentation, and that warning is gone, the box just hangs
> > hard at a different point. Very unfriendly.
> 
> Hah! Got some crash log by hacking a raw printk-to-uart:
> 
> [...]
> <6>Xenomai: starting RTDM services.
> <6>NET: Registered protocol family 10
> <6>lo: Disabled Privacy Extensions
> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
> <3>I-pipe: Detected illicit call from domain 'Xenomai'
> <3>        into a service reserved for domain 'Linux' and below.
>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
> Call Trace:
>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
>  [<c0105fc3>] show_stack+0x33/0x40
>  [<c01479cb>] ipipe_check_context+0x7b/0x90
>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
>  [<c0131e02>] notify_die+0x32/0x40
>  [<c0105d29>] do_invalid_op+0x59/0xa0
>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>  [<c02dfaeb>] error_code+0x6f/0x7c

Wow. Why that?

>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
>  [<c02dfaeb>] error_code+0x6f/0x7c

And this? We should not get any exception over an IPI3 handler. I guess
the double fault may be explained by this root cause.

>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
>  [<c0104c5e>] ipipe_ipi3+0x26/0x40

Our LAPIC timer vector. Are you running fully modular or statically built, btw?

>  [<c0111df8>] mcount+0x24/0x29
>  [<c0115c49>] kunmap_atomic+0x9/0x60
>  [<c015a040>] __handle_mm_fault+0x210/0x910
>  [<c0114dac>] do_page_fault+0x1dc/0x5f0
>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>  [<c02dfaeb>] error_code+0x6f/0x7c
>  =======================
> I-pipe tracer log (30 points):
>     #*func                    0 ipipe_trace_panic_freeze+0x8 (ipipe_check_context+0x40)
>     #*func                    0 ipipe_check_context+0xc (__atomic_notifier_call_chain+0x24)
>     #*func                    0 __atomic_notifier_call_chain+0x14 (atomic_notifier_call_chain+0x1f)
>     #*func                    0 atomic_notifier_call_chain+0xb (notify_die+0x32)
>     #*func                    0 notify_die+0xb (do_invalid_op+0x59)
>     #*func                    0 do_invalid_op+0x10 (__ipipe_handle_exception+0x7b)
>     #*func                   -1 __ipipe_handle_exception+0xe (error_code+0x6f)
>     #*func                   -1 __ipipe_restore_root+0x8 (__ipipe_handle_exception+0x83)
>  |  #*func                   -2 do_page_fault+0xe (__ipipe_handle_exception+0x7b)
>  |  # func                   -2 __ipipe_handle_exception+0xe (error_code+0x6f)
>  |   +func                   -3 __ipipe_dispatch_wired+0x16 (__ipipe_handle_irq+0x4f)
>  |   +func                   -3 __ipipe_ack_apic+0x8 (__ipipe_handle_irq+0x8f)
>  |   +func                   -3 __ipipe_handle_irq+0x14 (ipipe_ipi3+0x26)
>      +func                   -3 kunmap_atomic+0x9 (__handle_mm_fault+0x210)
>      +func                   -3 ipipe_check_context+0xc (__handle_mm_fault+0x204)
>      +func                   -4 page_add_file_rmap+0x8 (__handle_mm_fault+0x586)
>      +func                   -4 ipipe_check_context+0xc (__handle_mm_fault+0x196)
>      +func                   -4 kmap_atomic_prot+0xb (kmap_atomic+0x13)
>      +func                   -4 kmap_atomic+0x8 (__handle_mm_fault+0x186)
>      +func                   -4 mark_page_accessed+0x9 (filemap_nopage+0x13c)
>      +func                   -4 ipipe_check_context+0xc (find_get_page+0x65)
>      #func                   -4 __ipipe_unstall_root+0x8 (find_get_page+0x5b)
>      #func                   -4 radix_tree_lookup+0x16 (find_get_page+0x36)
>      #func                   -4 ipipe_check_context+0xc (find_get_page+0x2d)
>      +func                   -5 ipipe_check_context+0xc (find_get_page+0x18)
>      +func                   -5 find_get_page+0xa (filemap_nopage+0x1de)
>      +func                   -5 filemap_nopage+0xe (__handle_mm_fault+0x11f)
>      +func                   -5 ipipe_check_context+0xc (kunmap_atomic+0x50)
>      +func                   -5 kunmap_atomic+0x9 (__handle_mm_fault+0xcc)
>      +func                   -5 kmap_atomic_prot+0xb (kmap_atomic+0x13)
> <0>PANIC: double fault, gdt at c0392000 [255 bytes]
> <0>double fault, tss at c038d7e0
> <0>eip = c0127266, esp = dfec1ff8
> <0>eax = c05dad6c, ebx = dfec20f4, ecx = dfec2008, edx = 00000009
> <0>esi = 00000000, edi = dfec20f4
> 
> Double fault, explains why it is so slippery... And the crash looks a
> bit like that backtrace I once posted for an earlier ipipe version.
> 
> Time for a break, will dig deeper later - now that I have the tools :)
> 
> Jan
> 
-- 
Philippe.





* Re: [Xenomai-core] crashing 2.6.22
  2007-09-30 12:42             ` Philippe Gerum
@ 2007-09-30 15:31               ` Jan Kiszka
  2007-09-30 20:04                 ` Philippe Gerum
  2007-10-01  9:04                 ` Gilles Chanteperdrix
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2007-09-30 15:31 UTC
  To: rpm; +Cc: xenomai-core

Philippe Gerum wrote:
> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
>> ...
>>>>>  And a third
>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
>>>>> box reboots. :(
>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
>>> Yes, but I suspect this is just a symptom of some severe memory
>>> corruption that (also?) hits I-pipe data structures. I just put in some
>>> different instrumentation, and that warning is gone, the box just hangs
>>> hard at a different point. Very unfriendly.
>> Hah! Got some crash log by hacking a raw printk-to-uart:
>>
>> [...]
>> <6>Xenomai: starting RTDM services.
>> <6>NET: Registered protocol family 10
>> <6>lo: Disabled Privacy Extensions
>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
>> <3>        into a service reserved for domain 'Linux' and below.
>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
>> Call Trace:
>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
>>  [<c0105fc3>] show_stack+0x33/0x40
>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
>>  [<c0131e02>] notify_die+0x32/0x40
>>  [<c0105d29>] do_invalid_op+0x59/0xa0
>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>>  [<c02dfaeb>] error_code+0x6f/0x7c
> 
> Wow. Why that?
> 
>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
>>  [<c02dfaeb>] error_code+0x6f/0x7c
> 
> And this? We should not get any exception over an IPI3 handler. I guess
> the double fault may be explained by this root cause.
> 
>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
> 
> Our LAPIC timer vector. Are you running fully modular or statically built, btw?

Fully modular. Compiling the nucleus in makes the lock-up move to
another, once again invisible spot.

I nailed down the fault address in the scenario above. It's in the
nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
losing module text pages over time? This function must have been
executed before, as the timer was armed while I collected
/proc/modules and then triggered the crash.
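
Mapping a fault address to a module from a saved copy of /proc/modules can be scripted. This is a minimal Python sketch assuming the usual six-field line layout (name, size, reference count, dependencies, state, load address); the helper names `parse_modules` and `locate`, and the module names and addresses in the example, are made up for illustration.

```python
def parse_modules(text):
    """Turn /proc/modules-style text into a list of (name, base, size) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # Skip blank or truncated lines.
        name = fields[0]
        size = int(fields[1])
        base = int(fields[5], 16)  # Load address, printed as 0x...
        regions.append((name, base, size))
    return regions


def locate(regions, address):
    """Return (module name, offset into module) for an address, or None."""
    for name, base, size in regions:
        if base <= address < base + size:
            return name, address - base
    return None


if __name__ == "__main__":
    # Hypothetical snapshot; real names, sizes and addresses will differ.
    snapshot = """\
xeno_native 50000 0 - Live 0xf8e00000
xeno_nucleus 98304 1 xeno_native, Live 0xf8c5a000
"""
    hit = locate(parse_modules(snapshot), 0xF8C5A010)
    if hit is not None:
        print("%s+0x%x" % hit)
```

The offset only identifies a position inside the module image. Resolving it to a function such as xntimer_tick_aperiodic still requires the module's symbol table.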

Jan



* Re: [Xenomai-core] crashing 2.6.22
  2007-09-30 15:31               ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
@ 2007-09-30 20:04                 ` Philippe Gerum
  2007-10-01  9:04                 ` Gilles Chanteperdrix
  1 sibling, 0 replies; 31+ messages in thread
From: Philippe Gerum @ 2007-09-30 20:04 UTC
  To: Jan Kiszka; +Cc: xenomai-core

On Sun, 2007-09-30 at 17:31 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> Philippe Gerum wrote:
> >>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> >> ...
> >>>>>  And a third
> >>>>> one only gives me "Detected illicit call from domain Xenomai" before the
> >>>>> box reboots. :(
> >>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
> >>> Yes, but I suspect this is just a symptom of some severe memory
> >>> corruption that (also?) hits I-pipe data structures. I just put in some
> >>> different instrumentation, and that warning is gone, the box just hangs
> >>> hard at a different point. Very unfriendly.
> >> Hah! Got some crash log by hacking a raw printk-to-uart:
> >>
> >> [...]
> >> <6>Xenomai: starting RTDM services.
> >> <6>NET: Registered protocol family 10
> >> <6>lo: Disabled Privacy Extensions
> >> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
> >> <3>I-pipe: Detected illicit call from domain 'Xenomai'
> >> <3>        into a service reserved for domain 'Linux' and below.
> >>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
> >>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
> >>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
> >> Call Trace:
> >>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
> >>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
> >>  [<c0105fc3>] show_stack+0x33/0x40
> >>  [<c01479cb>] ipipe_check_context+0x7b/0x90
> >>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
> >>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
> >>  [<c0131e02>] notify_die+0x32/0x40
> >>  [<c0105d29>] do_invalid_op+0x59/0xa0
> >>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
> >>  [<c02dfaeb>] error_code+0x6f/0x7c
> > 
> > Wow. Why that?
> > 
> >>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
> >>  [<c02dfaeb>] error_code+0x6f/0x7c
> > 
> > And this? We should not get any exception over an IPI3 handler. I guess
> > the double fault may be explained by this root cause.
> > 
> >>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
> >>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
> > 
> > Our LAPIC timer vector. Are you running fully modular or statically built, btw?
> 
> Fully modular. Compiling the nucleus in makes the lock-up move to
> another, once again invisible spot.
> 
> I nailed down the fault address in the scenario above. It's in the
> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
> losing module text pages over time? This function must have been
> executed before, as the timer was armed while I collected
> /proc/modules and then triggered the crash.

The timer is routed when the first skin binds to the nucleus. Modules
are unmapped while the box goes down for reboot, so maybe the timer is
not released in the LAPIC case upon such event. IIRC, I fixed a similar
issue in the PIT case recently, where rthal_timer_release() would not
call ipipe_release_tickdev(). It would be interesting to know whether
rthal_timer_release() is ever called at all upon shutdown. If not, the
kernel event notifier is likely going to be our friend soon...
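
To make the notifier idea concrete, here is a minimal user-space sketch in C. It is only a model under stated assumptions: every model_* name is hypothetical and is neither a Xenomai nor a kernel API. In the kernel, the equivalent hook would presumably be a notifier block registered through register_reboot_notifier(), with a callback that ends up calling rthal_timer_release().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model: none of these names are Xenomai or
 * Linux kernel APIs. */
enum model_event { MODEL_EVT_RESTART = 1, MODEL_EVT_HALT = 2 };

struct model_notifier {
    int (*call)(struct model_notifier *self, int event);
    struct model_notifier *next;
};

static struct model_notifier *shutdown_chain; /* head of the callback list */
static int timer_grabbed;                     /* 1 while the model core owns the timer */

/* Push a notifier onto the chain. */
void model_chain_register(struct model_notifier *n)
{
    assert(n != NULL && n->call != NULL);
    n->next = shutdown_chain;
    shutdown_chain = n;
}

/* Run every registered callback; return how many were invoked. */
int model_shutdown(int event)
{
    int invoked = 0;
    struct model_notifier *n;

    for (n = shutdown_chain; n != NULL; n = n->next) {
        n->call(n, event);
        invoked++;
    }
    return invoked;
}

/* Callback: give the timer back when the system restarts or halts. */
static int release_timer_on_shutdown(struct model_notifier *self, int event)
{
    (void)self;
    if (event == MODEL_EVT_RESTART || event == MODEL_EVT_HALT)
        timer_grabbed = 0;
    return 0;
}

static struct model_notifier timer_notifier = { release_timer_on_shutdown, NULL };

/* Grab the timer and make sure the release hook is registered once. */
void model_timer_request(void)
{
    static int registered;

    timer_grabbed = 1;
    if (!registered) {
        model_chain_register(&timer_notifier);
        registered = 1;
    }
}

int model_timer_is_grabbed(void)
{
    return timer_grabbed;
}
```

Calling model_timer_request() and then model_shutdown(MODEL_EVT_RESTART) leaves the timer released, which is the property we would want before module text goes away at reboot.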

> 
> Jan
> 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-09-30 15:31               ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
  2007-09-30 20:04                 ` Philippe Gerum
@ 2007-10-01  9:04                 ` Gilles Chanteperdrix
  2007-10-01  9:17                   ` Jan Kiszka
  2007-10-08  7:33                   ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
  1 sibling, 2 replies; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-01  9:04 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> Philippe Gerum wrote:
> > On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> Philippe Gerum wrote:
> >>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> >> ...
> >>>>>  And a third
> >>>>> one only gives me "Detected illicit call from domain Xenomai" before the
> >>>>> box reboots. :(
> >>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
> >>> Yes, but I suspect this is just a symptom of some severe memory
> >>> corruption that (also?) hits I-pipe data structures. I just put in some
> >>> different instrumentation, and that warning is gone, the box just hangs
> >>> hard at a different point. Very unfriendly.
> >> Hah! Got some crash log by hacking a raw printk-to-uart:
> >>
> >> [...]
> >> <6>Xenomai: starting RTDM services.
> >> <6>NET: Registered protocol family 10
> >> <6>lo: Disabled Privacy Extensions
> >> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
> >> <3>I-pipe: Detected illicit call from domain 'Xenomai'
> >> <3>        into a service reserved for domain 'Linux' and below.
> >>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
> >>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
> >>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
> >> Call Trace:
> >>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
> >>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
> >>  [<c0105fc3>] show_stack+0x33/0x40
> >>  [<c01479cb>] ipipe_check_context+0x7b/0x90
> >>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
> >>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
> >>  [<c0131e02>] notify_die+0x32/0x40
> >>  [<c0105d29>] do_invalid_op+0x59/0xa0
> >>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
> >>  [<c02dfaeb>] error_code+0x6f/0x7c
> >
> > Wow. Why that?
> >
> >>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
> >>  [<c02dfaeb>] error_code+0x6f/0x7c
> >
> > And this? We should not get any exception over an IPI3 handler. I guess
> > the double fault may be explained by this root cause.
> >
> >>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
> >>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
> >
> > Our LAPIC timer vector. Are you running full modular or statically btw?
>
> Fully modular. Compiling the nucleus in makes the lock-up move to
> another, once again invisible spot.
>
> I nailed down the fault address in the scenario above. It's in the
> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
> > losing module text pages over time? This function must have been
> executed before as the timer was armed while I collected the
> /proc/modules and then triggered the crash.

There is a pending issue about vmalloced areas, which I completely forgot:
https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-01  9:04                 ` Gilles Chanteperdrix
@ 2007-10-01  9:17                   ` Jan Kiszka
  2007-10-01  9:23                     ` Gilles Chanteperdrix
  2007-10-08  7:33                   ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2007-10-01  9:17 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Philippe Gerum wrote:
>>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>>> Philippe Gerum wrote:
>>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
>>>> ...
>>>>>>>  And a third
>>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
>>>>>>> box reboots. :(
>>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
>>>>> Yes, but I suspect this is just a symptom of some severe memory
>>>>> corruption that (also?) hits I-pipe data structures. I just put in some
>>>>> different instrumentation, and that warning is gone, the box just hangs
>>>>> hard at a different point. Very unfriendly.
>>>> Hah! Got some crash log by hacking a raw printk-to-uart:
>>>>
>>>> [...]
>>>> <6>Xenomai: starting RTDM services.
>>>> <6>NET: Registered protocol family 10
>>>> <6>lo: Disabled Privacy Extensions
>>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
>>>> <3>        into a service reserved for domain 'Linux' and below.
>>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
>>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
>>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
>>>> Call Trace:
>>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
>>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
>>>>  [<c0105fc3>] show_stack+0x33/0x40
>>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
>>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
>>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
>>>>  [<c0131e02>] notify_die+0x32/0x40
>>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
>>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>> Wow. Why that?
>>>
>>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>> And this? We should not get any exception over an IPI3 handler. I guess
>>> the double fault may be explained by this root cause.
>>>
>>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
>>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
>>> Our LAPIC timer vector. Are you running full modular or statically btw?
>> Fully modular. Compiling the nucleus in makes the lock-up move to
>> another, once again invisible spot.
>>
>> I nailed down the fault address in the scenario above. It's in the
>> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
>> losing module text pages over time? This function must have been
>> executed before as the timer was armed while I collected the
>> /proc/modules and then triggered the crash.
> 
> There is a pending issue about vmalloced areas, which I completely forgot:
> https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
> 

Would this explain my problems which are already visible without any 
Xenomai application running (and also without unloading the modules 
again, to answer Philippe's question)? Hell, I would love to find the 
reason here, debugging this stuff stopped being fun a long time ago...

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-01  9:17                   ` Jan Kiszka
@ 2007-10-01  9:23                     ` Gilles Chanteperdrix
  2007-10-01  9:32                       ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-01  9:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On 10/1/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> Gilles Chanteperdrix wrote:
> > On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >> Philippe Gerum wrote:
> >>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
> >>>> Jan Kiszka wrote:
> >>>>> Philippe Gerum wrote:
> >>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> >>>> ...
> >>>>>>>  And a third
> >>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
> >>>>>>> box reboots. :(
> >>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
> >>>>> Yes, but I suspect this is just a symptom of some severe memory
> >>>>> corruption that (also?) hits I-pipe data structures. I just put in some
> >>>>> different instrumentation, and that warning is gone, the box just hangs
> >>>>> hard at a different point. Very unfriendly.
> >>>> Hah! Got some crash log by hacking a raw printk-to-uart:
> >>>>
> >>>> [...]
> >>>> <6>Xenomai: starting RTDM services.
> >>>> <6>NET: Registered protocol family 10
> >>>> <6>lo: Disabled Privacy Extensions
> >>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
> >>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
> >>>> <3>        into a service reserved for domain 'Linux' and below.
> >>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
> >>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
> >>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
> >>>> Call Trace:
> >>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
> >>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
> >>>>  [<c0105fc3>] show_stack+0x33/0x40
> >>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
> >>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
> >>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
> >>>>  [<c0131e02>] notify_die+0x32/0x40
> >>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
> >>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
> >>>>  [<c02dfaeb>] error_code+0x6f/0x7c
> >>> Wow. Why that?
> >>>
> >>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
> >>>>  [<c02dfaeb>] error_code+0x6f/0x7c
> >>> And this? We should not get any exception over an IPI3 handler. I guess
> >>> the double fault may be explained by this root cause.
> >>>
> >>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
> >>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
> >>> Our LAPIC timer vector. Are you running full modular or statically btw?
> >> Fully modular. Compiling the nucleus in makes the lock-up move to
> >> another, once again invisible spot.
> >>
> >> I nailed down the fault address in the scenario above. It's in the
> >> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
> >> losing module text pages over time? This function must have been
> >> executed before as the timer was armed while I collected the
> >> /proc/modules and then triggered the crash.
> >
> > There is a pending issue about vmalloced areas, which I completely forgot:
> > https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
> >
>
> Would this explain my problems which are already visible without any
> Xenomai application running (and also without unloading the modules
> again, to answer Philippe's question)? Hell, I would love to find the
> reason here, debugging this stuff stopped being fun a long time ago...

It would explain bugs involving a race between task creation and
vmalloc/ioremap. But the bug would only happen with Xenomai tasks
running; otherwise, the vmalloced/ioremapped area would be mapped
lazily as usual.

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-01  9:23                     ` Gilles Chanteperdrix
@ 2007-10-01  9:32                       ` Jan Kiszka
  2007-10-01  9:38                         ` Gilles Chanteperdrix
  2007-10-01 12:42                         ` [Xenomai-core] crashing 2.6.22 Labozzetta, Saverio
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2007-10-01  9:32 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> On 10/1/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Gilles Chanteperdrix wrote:
>>> On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>> Philippe Gerum wrote:
>>>>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Philippe Gerum wrote:
>>>>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
>>>>>> ...
>>>>>>>>>  And a third
>>>>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
>>>>>>>>> box reboots. :(
>>>>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
>>>>>>> Yes, but I suspect this is just a symptom of some severe memory
>>>>>>> corruption that (also?) hits I-pipe data structures. I just put in some
>>>>>>> different instrumentation, and that warning is gone, the box just hangs
>>>>>>> hard at a different point. Very unfriendly.
>>>>>> Hah! Got some crash log by hacking a raw printk-to-uart:
>>>>>>
>>>>>> [...]
>>>>>> <6>Xenomai: starting RTDM services.
>>>>>> <6>NET: Registered protocol family 10
>>>>>> <6>lo: Disabled Privacy Extensions
>>>>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
>>>>>> <3>        into a service reserved for domain 'Linux' and below.
>>>>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
>>>>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
>>>>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
>>>>>> Call Trace:
>>>>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
>>>>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
>>>>>>  [<c0105fc3>] show_stack+0x33/0x40
>>>>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
>>>>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
>>>>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
>>>>>>  [<c0131e02>] notify_die+0x32/0x40
>>>>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
>>>>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>>>> Wow. Why that?
>>>>>
>>>>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
>>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>>>> And this? We should not get any exception over an IPI3 handler. I guess
>>>>> the double fault may be explained by this root cause.
>>>>>
>>>>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
>>>>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
>>>>> Our LAPIC timer vector. Are you running full modular or statically btw?
>>>> Fully modular. Compiling the nucleus in makes the lock-up move to
>>>> another, once again invisible spot.
>>>>
>>>> I nailed down the fault address in the scenario above. It's in the
>>>> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
>>>> losing module text pages over time? This function must have been
>>>> executed before as the timer was armed while I collected the
>>>> /proc/modules and then triggered the crash.
>>> There is a pending issue about vmalloced areas, which I completely forgot:
>>> https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
>>>
>> Would this explain my problems which are already visible without any
>> Xenomai application running (and also without unloading the modules
>> again, to answer Philippe's question)? Hell, I would love to find the
>> reason here, debugging this stuff stopped being fun a long time ago...
> 
> It would explain bugs involving a race between task creation and
> vmalloc/ioremap. But the bug would only happen with Xenomai tasks
> running,

I don't need to start any Xenomai task to trigger the problem.

> otherwise, the vmalloced/ioremaped area would be mapped lazily as usual.

I guess module text pages are not mapped lazily, otherwise quite a lot 
of things would have fallen apart much earlier, right?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-01  9:32                       ` Jan Kiszka
@ 2007-10-01  9:38                         ` Gilles Chanteperdrix
  2007-10-01 12:12                           ` [Xenomai-core] ARM compiling error Patrick
  2007-10-01 12:42                         ` [Xenomai-core] crashing 2.6.22 Labozzetta, Saverio
  1 sibling, 1 reply; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-01  9:38 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On 10/1/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> Gilles Chanteperdrix wrote:
> > On 10/1/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >>>> Philippe Gerum wrote:
> >>>>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
> >>>>>> Jan Kiszka wrote:
> >>>>>>> Philippe Gerum wrote:
> >>>>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> >>>>>> ...
> >>>>>>>>>  And a third
> >>>>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
> >>>>>>>>> box reboots. :(
> >>>>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
> >>>>>>> Yes, but I suspect this is just a symptom of some severe memory
> >>>>>>> corruption that (also?) hits I-pipe data structures. I just put in some
> >>>>>>> different instrumentation, and that warning is gone, the box just hangs
> >>>>>>> hard at a different point. Very unfriendly.
> >>>>>> Hah! Got some crash log by hacking a raw printk-to-uart:
> >>>>>>
> >>>>>> [...]
> >>>>>> <6>Xenomai: starting RTDM services.
> >>>>>> <6>NET: Registered protocol family 10
> >>>>>> <6>lo: Disabled Privacy Extensions
> >>>>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
> >>>>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
> >>>>>> <3>        into a service reserved for domain 'Linux' and below.
> >>>>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
> >>>>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
> >>>>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
> >>>>>> Call Trace:
> >>>>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
> >>>>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
> >>>>>>  [<c0105fc3>] show_stack+0x33/0x40
> >>>>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
> >>>>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
> >>>>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
> >>>>>>  [<c0131e02>] notify_die+0x32/0x40
> >>>>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
> >>>>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
> >>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
> >>>>> Wow. Why that?
> >>>>>
> >>>>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
> >>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
> >>>>> And this? We should not get any exception over an IPI3 handler. I guess
> >>>>> the double fault may be explained by this root cause.
> >>>>>
> >>>>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
> >>>>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
> >>>>> Our LAPIC timer vector. Are you running full modular or statically btw?
> >>>> Fully modular. Compiling the nucleus in makes the lock-up move to
> >>>> another, once again invisible spot.
> >>>>
> >>>> I nailed down the fault address in the scenario above. It's in the
> >>>> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
> >>>> losing module text pages over time? This function must have been
> >>>> executed before as the timer was armed while I collected the
> >>>> /proc/modules and then triggered the crash.
> >>> There is a pending issue about vmalloced areas, which I completely forgot:
> >>> https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
> >>>
> >> Would this explain my problems which are already visible without any
> >> Xenomai application running (and also without unloading the modules
> >> again, to answer Philippe's question)? Hell, I would love to find the
> >> reason here, debugging this stuff stopped being fun a long time ago...
> >
> > It would explain bugs involving a race between task creation and
> > vmalloc/ioremap. But the bug would only happen with Xenomai tasks
> > running,
>
> I don't need to start any Xenomai task to trigger the problem.
>
> > otherwise, the vmalloced/ioremaped area would be mapped lazily as usual.
>
> I guess module text pages are not mapped lazily, otherwise quite a lot
> of things would have fallen apart much earlier, right?

This would happen when a task and a module are created at the same
time, and the module would be mapped lazily only for the newly created
task.
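
A minimal user-space sketch of that scenario, in C. It assumes the usual x86-32 scheme in which each task carries its own copy of the kernel page-directory entries, taken at creation time, and missing entries are copied from a reference directory on the first fault. All names (struct pgdir, model_*) are hypothetical and only illustrate the ordering problem.

```c
#include <assert.h>
#include <string.h>

#define KERNEL_SLOTS 8  /* number of modelled kernel page-directory slots */

/* One flag per slot: 1 if the slot is mapped in this directory. */
struct pgdir {
    unsigned char present[KERNEL_SLOTS];
};

static struct pgdir master; /* stand-in for the reference kernel directory */

/* Start from an empty reference directory. */
void model_reset(void)
{
    memset(&master, 0, sizeof(master));
}

/* vmalloc/ioremap/module load: only the reference directory is updated. */
void model_vmalloc_map(int slot)
{
    assert(slot >= 0 && slot < KERNEL_SLOTS);
    master.present[slot] = 1;
}

/* Task creation: the new task gets a snapshot of the reference directory. */
void model_task_create(struct pgdir *task)
{
    *task = master;
}

/*
 * Access a slot from a task.
 * Returns  0 if the slot was already mapped for this task,
 *          1 if a fault-time fixup had to copy it from the reference,
 *         -1 if the slot is not mapped anywhere (a genuine fault).
 */
int model_access(struct pgdir *task, int slot)
{
    assert(slot >= 0 && slot < KERNEL_SLOTS);
    if (task->present[slot])
        return 0;
    if (!master.present[slot])
        return -1;
    task->present[slot] = 1; /* lazy fixup, as the fault handler would do */
    return 1;
}
```

A task created before model_vmalloc_map() takes the fixup path on its first access, while a task created afterwards does not. The fixup is harmless for a plain Linux task, but it is exactly the fault we do not want to take from the real-time domain.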

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Xenomai-core] ARM compiling error
  2007-10-01  9:38                         ` Gilles Chanteperdrix
@ 2007-10-01 12:12                           ` Patrick
  2007-10-01 12:25                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 31+ messages in thread
From: Patrick @ 2007-10-01 12:12 UTC (permalink / raw)
  To: 'xenomai-core'

[-- Attachment #1: Type: text/plain, Size: 1532 bytes --]

Hi,

I installed Xenomai 2.3.4 for ARM on a 2.6.20 kernel. The prepare-kernel
script worked well.

When compiling my kernel, I get the error messages below:

  CC      kernel/xenomai/arch/generic/hal.o
  LD      kernel/xenomai/arch/generic/xeno_hal.o
  LD      kernel/xenomai/arch/generic/built-in.o
  LD      kernel/xenomai/arch/built-in.o
  CC      kernel/xenomai/nucleus/heap.o
  CC      kernel/xenomai/nucleus/intr.o
kernel/xenomai/nucleus/intr.c:60: warning: 'xnirqs' defined but not used
  CC      kernel/xenomai/nucleus/module.o
  CC      kernel/xenomai/nucleus/pod.o
  CC      kernel/xenomai/nucleus/synch.o
  CC      kernel/xenomai/nucleus/thread.o
  CC      kernel/xenomai/nucleus/timer.o
  CC      kernel/xenomai/nucleus/shadow.o
In file included from kernel/xenomai/nucleus/shadow.c:56:
include/asm/xenomai/bits/shadow.h: In function 'xnarch_local_syscall':
include/asm/xenomai/bits/shadow.h:180: error: 'struct <anonymous>' has no member named 'counter'
include/asm/xenomai/bits/shadow.h:181: error: 'struct <anonymous>' has no member named 'mask'
include/asm/xenomai/bits/shadow.h:182: error: 'struct <anonymous>' has no member named 'last_cnt'
include/asm/xenomai/bits/shadow.h:183: error: 'struct <anonymous>' has no member named 'tsc'
make[3]: *** [kernel/xenomai/nucleus/shadow.o] Error 1
make[2]: *** [kernel/xenomai/nucleus] Error 2
make[1]: *** [kernel/xenomai] Error 2
make: *** [kernel] Error 2

Is there a way to fix this problem?

Thanks in advance,

Patrick


[-- Attachment #2: Type: text/html, Size: 8690 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] ARM compiling error
  2007-10-01 12:12                           ` [Xenomai-core] ARM compiling error Patrick
@ 2007-10-01 12:25                             ` Gilles Chanteperdrix
       [not found]                               ` <200710011255.l91CtkvR000470@domain.hid>
  0 siblings, 1 reply; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-01 12:25 UTC (permalink / raw)
  To: Patrick; +Cc: xenomai-core

On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> Hi,
>
> I installed Xenomai 2.3.4 for ARM on a 2.6.20 kernel. The prepare-kernel
> script worked well.

Which ARM machine?
Which version of the ARM patch?
From your compilation log, I would say that you are using an I-pipe
patch which is older than Xenomai v2.3.4, so the general advice is to
use the I-pipe patch which comes with the version of Xenomai you use.

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <200710011255.l91CtkvR000470@domain.hid>]

* Re: [Xenomai-core] ARM compiling error
       [not found]                               ` <200710011255.l91CtkvR000470@domain.hid>
@ 2007-10-01 13:10                                 ` Gilles Chanteperdrix
  2007-10-01 13:43                                   ` Patrick
  0 siblings, 1 reply; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-01 13:10 UTC (permalink / raw)
  To: Patrick; +Cc: xenomai-core

On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> > On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> > >
> > > I install Xenomai 2.3.4 for ARM on 2.6.20 kernel. The prepare-kernel
> > > script
> > > work well.
> >
> > Which ARM machine ?
> > Which version of the ARM patch ?
> > From your compilation log, I would say that you are using an I-pipe
> > patch which is older than Xenomai v2.3.4, so the general advice is to
> > use the I-pipe patch which comes with the version of Xenomai you use.
>
> I use an ARM PXA270 and adeos-ipipe-2.6.20-arm-1.7-05.
>
> I use the latest stable version of Xenomai (2.3.4) and the latest version of
> ipipe for ARM (1.7).

The I-pipe patch which comes with Xenomai (2.3.4) is
adeos-ipipe-2.6.20-arm-1.7-06, you will find it in the
ksrc/arch/arm/patches subdirectory of Xenomai sources. If you call
prepare-kernel without any argument it will pick the right patch.

Please do send your replies to the mailing list.

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] ARM compiling error
  2007-10-01 13:10                                 ` Gilles Chanteperdrix
@ 2007-10-01 13:43                                   ` Patrick
  2007-10-01 14:29                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 31+ messages in thread
From: Patrick @ 2007-10-01 13:43 UTC (permalink / raw)
  To: 'xenomai-core'


I have already tried this (with arm-1.7-05 from adeos website and arm-1.7-06
from xenomai/ksrc/arch/arm/patches) but the result is the same.

Thanks for your help

Sorry for direct answers ;-)


-----Original Message-----
From: Gilles Chanteperdrix [mailto:gilles.chanteperdrix@xenomai.org]
Sent: Monday, 1 October 2007 15:11
To: Patrick
Cc: xenomai-core
Subject: Re: [Xenomai-core] ARM compiling error

On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> > On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> > >
> > > I install Xenomai 2.3.4 for ARM on 2.6.20 kernel. The prepare-kernel
> > > script
> > > work well.
> >
> > Which ARM machine ?
> > Which version of the ARM patch ?
> > From your compilation log, I would say that you are using an I-pipe
> > patch which is older than Xenomai v2.3.4, so the general advice is to
> > use the I-pipe patch which comes with the version of Xenomai you use.
>
> I use an ARM PXA270 and adeos-ipipe-2.6.20-arm-1.7-05.
>
> I use the latest stable version of Xenomai (2.3.4) and the latest version of
> ipipe for ARM (1.7).

The I-pipe patch which comes with Xenomai (2.3.4) is
adeos-ipipe-2.6.20-arm-1.7-06, you will find it in the
ksrc/arch/arm/patches subdirectory of Xenomai sources. If you call
prepare-kernel without any argument it will pick the right patch.

Please do send your replies to the mailing list.

-- 
                                               Gilles Chanteperdrix



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] ARM compiling error
  2007-10-01 13:43                                   ` Patrick
@ 2007-10-01 14:29                                     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-01 14:29 UTC (permalink / raw)
  To: Patrick; +Cc: xenomai-core

On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> > -----Original Message-----
> > From: Gilles Chanteperdrix [mailto:gilles.chanteperdrix@xenomai.org]
> > Sent: Monday, 1 October 2007 15:11
> > To: Patrick
> > Cc: xenomai-core
> > Subject: Re: [Xenomai-core] ARM compiling error
> > > On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> > > > On 10/1/07, Patrick <kpa_info@domain.hid> wrote:
> > > > >
> > > > > I install Xenomai 2.3.4 for ARM on 2.6.20 kernel. The prepare-kernel
> > > > > script
> > > > > work well.
> > > >
> > > > Which ARM machine ?
> > > > Which version of the ARM patch ?
> > > > From your compilation log, I would say that you are using an I-pipe
> > > > patch which is older than Xenomai v2.3.4, so the general advice is to
> > > > use the I-pipe patch which comes with the version of Xenomai you use.
> > >
> > > I use an ARM PXA270 and adeos-ipipe-2.6.20-arm-1.7-05.
> > >
> > > I use the latest stable version of Xenomai (2.3.4) and the latest version of
> > > ipipe for ARM (1.7).
> >
> > The I-pipe patch which comes with Xenomai (2.3.4) is
> > adeos-ipipe-2.6.20-arm-1.7-06, you will find it in the
> > ksrc/arch/arm/patches subdirectory of Xenomai sources. If you call
> > prepare-kernel without any argument it will pick the right patch.
> >
> > Please do send your replies to the mailing list.
>
> I have already tried this (with arm-1.7-05 from adeos website and arm-1.7-06
> from xenomai/ksrc/arch/arm/patches) but the result is the same.

I have just tried to build Xenomai-2.3.4 for PXA 270 (LogicPD PXA270
Card Engine Development Platform), and it builds flawlessly. So, I
suggest you erase everything, and restart from clean sources, use
I-pipe for ARM version 1.7-06, and nothing else.

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-01  9:32                       ` Jan Kiszka
  2007-10-01  9:38                         ` Gilles Chanteperdrix
@ 2007-10-01 12:42                         ` Labozzetta, Saverio
  2007-10-01 13:32                           ` Labozzetta, Saverio
  1 sibling, 1 reply; 31+ messages in thread
From: Labozzetta, Saverio @ 2007-10-01 12:42 UTC (permalink / raw)
  To: Jan Kiszka, Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 5077 bytes --]





>-----Original Message-----
>From: xenomai-core-bounces@domain.hid on behalf of Jan Kiszka
>Sent: Mon 2007-10-01 11:32 AM
>To: Gilles Chanteperdrix
>Cc: xenomai-core
>Subject: Re: [Xenomai-core] crashing 2.6.22
> 
>Gilles Chanteperdrix wrote:
>> On 10/1/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>> Gilles Chanteperdrix wrote:
>>>> On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>>> Philippe Gerum wrote:
>>>>>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> Philippe Gerum wrote:
>>>>>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
>>>>>>> ...
>>>>>>>>>>  And a third
>>>>>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
>>>>>>>>>> box reboots. :(
>>>>>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
>>>>>>>> Yes, but I suspect this is just a symptom of some severe memory
>>>>>>>> corruption that (also?) hits I-pipe data structures. I just put in some
>>>>>>>> different instrumentation, and that warning is gone, the box just hangs
>>>>>>>> hard at a different point. Very unfriendly.
>>>>>>> Hah! Got some crash log by hacking a raw printk-to-uart:
>>>>>>>
>>>>>>> [...]
>>>>>>> <6>Xenomai: starting RTDM services.
>>>>>>> <6>NET: Registered protocol family 10
>>>>>>> <6>lo: Disabled Privacy Extensions
>>>>>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
>>>>>>> <3>        into a service reserved for domain 'Linux' and below.
>>>>>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
>>>>>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
>>>>>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
>>>>>>> Call Trace:
>>>>>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
>>>>>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
>>>>>>>  [<c0105fc3>] show_stack+0x33/0x40
>>>>>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
>>>>>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
>>>>>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
>>>>>>>  [<c0131e02>] notify_die+0x32/0x40
>>>>>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
>>>>>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>>>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>>>>> Wow. Why that?
>>>>>>
>>>>>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
>>>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>>>>> And this? We should not get any exception over an IPI3 handler. I guess
>>>>>> the double fault may be explained by this root cause.
>>>>>>
>>>>>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
>>>>>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
>>>>>> Our LAPIC timer vector. Are you running full modular or statically btw?
>>>>> Fully modular. Compiling the nucleus in makes the lock-up move to
>>>>> another, once again invisible spot.
>>>>>
>>>>> I nailed down the fault address in the scenario above. It's in the
>>>>> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
>>>>> losing module text pages over time? This function must have been
>>>>> executed before as the timer was armed while I collected the
>>>>> /proc/modules and then triggered the crash.
>>>> There is a pending issue about vmalloced areas, which I completely forgot:
>>>> https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
>>>>
>>> Would this explain my problems which are already visible without any
>>> Xenomai application running (and also without unloading the modules
>>> again, to answer Philippe's question)? Hell, I would love to find the
>>> reason here, debugging this stuff stopped being fun a long time ago...
>> 
>> It would explain bugs involving a race between task creation and
>> vmalloc/ioremap. But the bug would only happen with Xenomai tasks
>> running,
>
>I don't need to start any Xenomai task to trigger the problem.
>
>> otherwise, the vmalloced/ioremaped area would be mapped lazily as usual.
>
>I guess module text pages are not mapped lazily, otherwise quite a lot 
>of things would have fallen apart much earlier, right?

AFAIK, once a module is inserted its text pages are part of the kernel, so
they have to stay reliably available as long as the services it offers are
registered. It is the insertion function that allocates the memory, writes
the module text into it and makes it part of the kernel, so it is kept in
main memory.

  Saverio

>
>Jan
>
>-- 
>Siemens AG, Corporate Technology, CT SE 2
>Corporate Competence Center Embedded Linux
>



This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient,  you are not authorized to read, print, retain, copy, disseminate,  distribute, or use this message or any part thereof. If you receive this  message in error, please notify the sender immediately and delete all  copies of this message.

[-- Attachment #2: Type: text/html, Size: 7716 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-01 12:42                         ` [Xenomai-core] crashing 2.6.22 Labozzetta, Saverio
@ 2007-10-01 13:32                           ` Labozzetta, Saverio
  2007-10-02  9:04                             ` [Xenomai-help] rt_task crash kernel Patrick
  0 siblings, 1 reply; 31+ messages in thread
From: Labozzetta, Saverio @ 2007-10-01 13:32 UTC (permalink / raw)
  To: Labozzetta, Saverio, Jan Kiszka, Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 5824 bytes --]





-----Original Message-----
From: xenomai-core-bounces@domain.hid on behalf of Labozzetta, Saverio
Sent: Mon 2007-10-01 2:42 PM
To: Jan Kiszka; Gilles Chanteperdrix
Cc: xenomai-core
Subject: Re: [Xenomai-core] crashing 2.6.22
 




>>-----Original Message-----
>>From: xenomai-core-bounces@domain.hid on behalf of Jan Kiszka
>>Sent: Mon 2007-10-01 11:32 AM
>>To: Gilles Chanteperdrix
>>Cc: xenomai-core
>>Subject: Re: [Xenomai-core] crashing 2.6.22
>> 
>>Gilles Chanteperdrix wrote:
>>> On 10/1/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
>>>>>>>> Jan Kiszka wrote:
>>>>>>>>> Philippe Gerum wrote:
>>>>>>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
>>>>>>>> ...
>>>>>>>>>>>  And a third
>>>>>>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
>>>>>>>>>>> box reboots. :(
>>>>>>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
>>>>>>>>> Yes, but I suspect this is just a symptom of some severe memory
>>>>>>>>> corruption that (also?) hits I-pipe data structures. I just put in some
>>>>>>>>> different instrumentation, and that warning is gone, the box just hangs
>>>>>>>>> hard at a different point. Very unfriendly.
>>>>>>>> Hah! Got some crash log by hacking a raw printk-to-uart:
>>>>>>>>
>>>>>>>> [...]
>>>>>>>> <6>Xenomai: starting RTDM services.
>>>>>>>> <6>NET: Registered protocol family 10
>>>>>>>> <6>lo: Disabled Privacy Extensions
>>>>>>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
>>>>>>>> <3>        into a service reserved for domain 'Linux' and below.
>>>>>>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
>>>>>>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
>>>>>>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
>>>>>>>> Call Trace:
>>>>>>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
>>>>>>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
>>>>>>>>  [<c0105fc3>] show_stack+0x33/0x40
>>>>>>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
>>>>>>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
>>>>>>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
>>>>>>>>  [<c0131e02>] notify_die+0x32/0x40
>>>>>>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
>>>>>>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>>>>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>>>>>> Wow. Why that?
>>>>>>>
>>>>>>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
>>>>>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>>>>>> And this? We should not get any exception over an IPI3 handler. I guess
>>>>>>> the double fault may be explained by this root cause.
>>>>>>>
>>>>>>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
>>>>>>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
>>>>>>> Our LAPIC timer vector. Are you running full modular or statically btw?
>>>>>> Fully modular. Compiling the nucleus in makes the lock-up move to
>>>>>> another, once again invisible spot.
>>>>>
>>>>>> I nailed down the fault address in the scenario above. It's in the
>>>>>> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
>>>>>> losing module text pages over time? This function must have been
>>>>>> executed before as the timer was armed while I collected the
>>>>>> /proc/modules and then triggered the crash.
>>>>> There is a pending issue about vmalloced areas, which I completely forgot:
>>>>> https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
>>>>>
>>>> Would this explain my problems which are already visible without any
>>>> Xenomai application running (and also without unloading the modules
>>>> again, to answer Philippe's question)? Hell, I would love to find the
>>>> reason here, debugging this stuff stopped being fun a long time ago...
>>> 
>>> It would explain bugs involving a race between task creation and
>>> vmalloc/ioremap. But the bug would only happen with Xenomai tasks
>>> running,
>>
>>I don't need to start any Xenomai task to trigger the problem.
>>
>>> otherwise, the vmalloced/ioremaped area would be mapped lazily as usual.
>>
>>I guess module text pages are not mapped lazily, otherwise quite a lot 
>>of things would have fallen apart much earlier, right?
>
> AFAIK, once a module is inserted its text pages are part of the kernel, so
>they have to stay reliably available as long as the services it offers are
>registered. It is the insertion function that allocates the memory, writes
>the module text into it and makes it part of the kernel, so it is kept in
>main memory.
>

I've been silly: the pages are allocated directly in kernel space (GFP_KERNEL),
and some kernel tables hold pointers to the functions the module offers, that
is, addresses inside those pages...

BTW: sorry about the nasty copyright-abuse notice that follows; it is
automatically attached to any mail I send outside. Sorry also for the filthy
email client I use, but I have not found another way since they put Exchange on...

>  Saverio
>
>>
>>Jan
>>
>>-- 
>>Siemens AG, Corporate Technology, CT SE 2
>>Corporate Competence Center Embedded Linux
>>




This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient,  you are not authorized to read, print, retain, copy, disseminate,  distribute, or use this message or any part thereof. If you receive this  message in error, please notify the sender immediately and delete all  copies of this message.

[-- Attachment #2: Type: text/html, Size: 8834 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Xenomai-help] rt_task crash kernel
  2007-10-01 13:32                           ` Labozzetta, Saverio
@ 2007-10-02  9:04                             ` Patrick
  2007-10-02  9:11                               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 31+ messages in thread
From: Patrick @ 2007-10-02  9:04 UTC (permalink / raw)
  To: xenomai


[-- Attachment #1.1: Type: text/plain, Size: 926 bytes --]

Hi,

I'm using Xenomai 2.3.4 with a 2.6.20 kernel on an ARM PXA270 board.
The installation completed successfully.

Now I'm trying to insert a small test module (source code is attached), but
the kernel crashes on rt_task_start().
I have tried the same module on an x86 machine and there is no problem.

I have tried with and without rt_task_set_periodic(), in both aperiodic and
periodic timer mode, but the result is the same.

Do you have any idea?

Regards


[-- Attachment #1.2: Type: text/html, Size: 4351 bytes --]

[-- Attachment #2: myModule_m.c --]
[-- Type: text/plain, Size: 1550 bytes --]

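/*******************************************************************
 * isrtimer_m.c
 *******************************************************************/

#include <linux/module.h>   /* MODULE_LICENSE, module entry points */
#include <linux/init.h>     /* __init, __exit */
#include <linux/kernel.h>   /* printk */

#include <native/task.h>
#include <native/timer.h>   /* rt_timer_set_mode, rt_timer_spin */
#include <native/intr.h>
#include <native/alarm.h>

#define STACK_SIZE         8192        /* default stack size */
#define MS                 1000000     /* 1 ms expressed in ns */
#define TIMER_PERIODIC     1           /* 1: periodic timer, 0: aperiodic timer */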

RT_TASK myTask;
RT_ALARM myAlarm;

int err; /* holds the last error code */


/* cleanup_module() */
void __exit cleanup_module (void) {

  printk("#irqtimer: cleanup\n");

  rt_task_delete(&myTask);

  printk("Bye bye!\n");
}

/* Periodic task */
void periodic(void *cookie) {
   while(1) {
      printk("start 20ms\n");
      rt_timer_spin(20 * MS);   /* busy-wait for 20 ms */
      printk("end 20ms\n");
      rt_task_wait_period(NULL);
   }
}

/* init_module() */
int __init init_module (void) {

  printk("#test module: starting...\n");

  
  /* Timer initialization */
  if (TIMER_PERIODIC)
    err = rt_timer_set_mode(MS);   /* periodic mode, one tick will be equal to 1 ms */ 
  else
    err = rt_timer_set_mode(TM_ONESHOT);

  if (err != 0) {
    printk("Error timer: %d\n", err);
    return err;   /* propagate the real error code rather than -1 */
  }


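  /* Create the task */
  err = rt_task_create(&myTask, "taskCharge", STACK_SIZE, 10, 0);
  if (err != 0) {
    printk("Error task create: %d\n", err);
    return err;
  }

  /*
   * Disabled in this variant of the test. Note that without it,
   * rt_task_wait_period() returns -EWOULDBLOCK at once and the task
   * never sleeps.
   */
  //rt_task_set_periodic(&myTask,TM_NOW,50*(TIMER_PERIODIC?1:MS));
  err = rt_task_start(&myTask, periodic, NULL);
  if (err != 0) {
    printk("Error task start: %d\n", err);
    rt_task_delete(&myTask);
    return err;
  }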

  return 0;
}

MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-help] rt_task crash kernel
  2007-10-02  9:04                             ` [Xenomai-help] rt_task crash kernel Patrick
@ 2007-10-02  9:11                               ` Gilles Chanteperdrix
       [not found]                                 ` <200710020940.l929ep0p021831@domain.hid>
  0 siblings, 1 reply; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-02  9:11 UTC (permalink / raw)
  To: Patrick; +Cc: xenomai

On 10/2/07, Patrick <kpa_info@domain.hid> wrote:
>
>
>
>
> Hi,
>
>
>
> I'm using Xenomai 2.3.4 with 2.6.20 kernel on an ARM PXA270 board.
>
> The installation was executed successfully.
>
>
>
> Now I'm trying to insert a small test module (source code is attached) but the kernel crashes on rt_task_start().
>
> I have tried the same module on another x86 machine and there is no problem.
>
> I have tried with or without rt_task_set_periodic in aperiodic and periodic timer mode but the result is the same.
>
>
>
> Do you have an idea?

Do you observe the same behaviour with the latency test?

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <200710020940.l929ep0p021831@domain.hid>]

* Re: [Xenomai-core] [Xenomai-help] rt_task crash kernel
       [not found]                                 ` <200710020940.l929ep0p021831@domain.hid>
@ 2007-10-02  9:46                                   ` Gilles Chanteperdrix
  0 siblings, 0 replies; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-02  9:46 UTC (permalink / raw)
  To: Patrick; +Cc: xenomai-core

On 10/2/07, Patrick <kpa_info@domain.hid> wrote:
>
>
>
>
> Yes, I have tried with bin/latency user space application and the system
> crashes.

latency appears to crash on ARM when launched with the default 100 us
period; the system is just too busy. Do you observe the crash with a 1 ms
period?
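
To put a number on "too busy": the share of CPU a periodic task takes is
roughly its per-cycle cost divided by its period. Below is a minimal sketch
in plain C (not Xenomai code). The function name is made up for
illustration, and the 80 us per-cycle cost used in the note after the code
is a hypothetical figure for a slow ARM board, not a measurement.

```c
#include <stdint.h>

/*
 * Percentage of CPU time consumed by a periodic job that needs busy_ns
 * of work every period_ns. A result of 100 or more means the job cannot
 * finish before its next release, so lower-priority work (Linux
 * included) gets starved.
 */
static unsigned int cpu_load_percent(uint64_t busy_ns, uint64_t period_ns)
{
    if (period_ns == 0)
        return 100;   /* released back-to-back: treat as fully busy */
    return (unsigned int)((busy_ns * 100u) / period_ns);
}
```

With the hypothetical 80 us per cycle, cpu_load_percent(80000, 100000) gives
80 for the default 100 us period, while cpu_load_percent(80000, 1000000)
gives 8 for a 1 ms period. For the test module, a 20 ms spin in a 50 ms
period (the commented-out rt_task_set_periodic() call, if enabled) works
out to cpu_load_percent(20000000, 50000000), that is 40.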

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-01  9:04                 ` Gilles Chanteperdrix
  2007-10-01  9:17                   ` Jan Kiszka
@ 2007-10-08  7:33                   ` Jan Kiszka
  2007-10-08  8:45                     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2007-10-08  7:33 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3399 bytes --]

Gilles Chanteperdrix wrote:
> On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Philippe Gerum wrote:
>>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>>> Philippe Gerum wrote:
>>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
>>>> ...
>>>>>>>  And a third
>>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
>>>>>>> box reboots. :(
>>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
>>>>> Yes, but I suspect this is just a symptom of some severe memory
>>>>> corruption that (also?) hits I-pipe data structures. I just put in some
>>>>> different instrumentation, and that warning is gone, the box just hangs
>>>>> hard at a different point. Very unfriendly.
>>>> Hah! Got some crash log by hacking a raw printk-to-uart:
>>>>
>>>> [...]
>>>> <6>Xenomai: starting RTDM services.
>>>> <6>NET: Registered protocol family 10
>>>> <6>lo: Disabled Privacy Extensions
>>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
>>>> <3>        into a service reserved for domain 'Linux' and below.
>>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
>>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
>>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
>>>> Call Trace:
>>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
>>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
>>>>  [<c0105fc3>] show_stack+0x33/0x40
>>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
>>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
>>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
>>>>  [<c0131e02>] notify_die+0x32/0x40
>>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
>>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>> Wow. Why that?
>>>
>>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
>>>>  [<c02dfaeb>] error_code+0x6f/0x7c
>>> And this? We should not get any exception over an IPI3 handler. I guess
>>> the double fault may be explained by this root cause.
>>>
>>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
>>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
>>> Our LAPIC timer vector. Are you running full modular or statically btw?
>> Fully modular. Compiling the nucleus in makes the lock-up move to
>> another, once again invisible spot.
>>
>> I nailed down the fault address in the scenario above. It's in the
>> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
>> losing module text pages over time? This function must have been
>> executed before as the timer was armed while I collected the
>> /proc/modules and then triggered the crash.
> 
> There is a pending issue about vmalloced areas, which I completely forgot:
> https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
> 

Ooops. By reading all my mails, I would have avoided reinventing this
wheel on my own. Your patch is almost what I posted yesterday to fix the
vmalloc issue.

Looks like we no longer need the last hunk of it on recent kernels, right?

Jan

PS: We should really consider using bug trackers for Adeos and Xenomai!
I have a few (minor) patches hanging around as well, but things quickly
get lost when bigger problems pop up.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-08  7:33                   ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
@ 2007-10-08  8:45                     ` Gilles Chanteperdrix
  2007-10-09  9:11                       ` Philippe Gerum
  0 siblings, 1 reply; 31+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-08  8:45 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On 10/8/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> Gilles Chanteperdrix wrote:
> > On 9/30/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >> Philippe Gerum wrote:
> >>> On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
> >>>> Jan Kiszka wrote:
> >>>>> Philippe Gerum wrote:
> >>>>>> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> >>>> ...
> >>>>>>>  And a third
> >>>>>>> one only gives me "Detected illicit call from domain Xenomai" before the
> >>>>>>> box reboots. :(
> >>>>>> Grmff... Do you run with your smp_processor_id() instrumentation in?
> >>>>> Yes, but I suspect this is just a symptom of some severe memory
> >>>>> corruption that (also?) hits I-pipe data structures. I just put in some
> >>>>> different instrumentation, and that warning is gone, the box just hangs
> >>>>> hard at a different point. Very unfriendly.
> >>>> Hah! Got some crash log by hacking a raw printk-to-uart:
> >>>>
> >>>> [...]
> >>>> <6>Xenomai: starting RTDM services.
> >>>> <6>NET: Registered protocol family 10
> >>>> <6>lo: Disabled Privacy Extensions
> >>>> <6>ADDRCONF(NETDEV_UP): eth0: link is not ready
> >>>> <3>I-pipe: Detected illicit call from domain 'Xenomai'
> >>>> <3>        into a service reserved for domain 'Linux' and below.
> >>>>        f3a6bc18 00000000 00000000 c05dad6c f3a6bc3c c0105fc3 c03513c7 c05dc100
> >>>>        00000009 f3a6bc54 c01479cb c03592f8 c0357ae2 c035e069 f3a6bc88 f3a6bc70
> >>>>        c0127224 c0111df8 00000000 f3a6bd74 00000000 f3a6bd74 f3a6bc80 c012727f
> >>>> Call Trace:
> >>>>  [<c010520f>] show_trace_log_lvl+0x1f/0x40
> >>>>  [<c01052e1>] show_stack_log_lvl+0xb1/0xe0
> >>>>  [<c0105fc3>] show_stack+0x33/0x40
> >>>>  [<c01479cb>] ipipe_check_context+0x7b/0x90
> >>>>  [<c0127224>] __atomic_notifier_call_chain+0x24/0x60
> >>>>  [<c012727f>] atomic_notifier_call_chain+0x1f/0x30
> >>>>  [<c0131e02>] notify_die+0x32/0x40
> >>>>  [<c0105d29>] do_invalid_op+0x59/0xa0
> >>>>  [<c0111d0b>] __ipipe_handle_exception+0x7b/0x144
> >>>>  [<c02dfaeb>] error_code+0x6f/0x7c
> >>> Wow. Why that?
> >>>
> >>>>  [<c0111d13>] __ipipe_handle_exception+0x83/0x144
> >>>>  [<c02dfaeb>] error_code+0x6f/0x7c
> >>> And this? We should not get any exception over an IPI3 handler. I guess
> >>> the double fault may be explained by this root cause.
> >>>
> >>>>  [<c01117df>] __ipipe_handle_irq+0x4f/0x140
> >>>>  [<c0104c5e>] ipipe_ipi3+0x26/0x40
> >>> Our LAPIC timer vector. Are you running full modular or statically btw?
> >> Fully modular. Compiling the nucleus in makes the lock-up move to
> >> another, once again invisible spot.
> >>
> >> I nailed down the fault address in the scenario above. It's in the
> >> nucleus module, at the first byte of xntimer_tick_aperiodic. Are we
> >> losing module text pages over time? This function must have been
> >> executed before as the timer was armed while I collected the
> >> /proc/modules and then triggered the crash.
> >
> > There is a pending issue about vmalloced areas, which I completely forgot:
> > https://mail.gna.org/public/xenomai-core/2007-02/msg00138.html
> >
>
> Ooops. By reading all my mails, I would have avoided reinventing this
> wheel on my own. Your patch is almost what I posted yesterday to fix the
> vmalloc issue.
>
> Looks like we no longer need the last hunk of it on recent kernels, right?

Yes, it fixes an issue which was fixed a long time ago.

> Jan
>
> PS: We should really consider using bug trackers for Adeos and Xenomai!
> I have a few (minor) patches hanging around as well, but things quickly
> get lost when bigger problems pop up.

We have bug trackers; the point is to think about using them.

-- 
                                               Gilles Chanteperdrix


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22
  2007-10-08  8:45                     ` Gilles Chanteperdrix
@ 2007-10-09  9:11                       ` Philippe Gerum
  0 siblings, 0 replies; 31+ messages in thread
From: Philippe Gerum @ 2007-10-09  9:11 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

On Mon, 2007-10-08 at 10:45 +0200, Gilles Chanteperdrix wrote:
> > Ooops. By reading all my mails, I would have avoided reinventing this
> > wheel on my own. Your patch is almost what I posted yesterday to fix the
> > vmalloc issue.
> >
> > Looks like we no longer need the last hunk of it on recent kernels, right?
> 
> Yes, it fixes an issue which was fixed a long time ago.
> 
Yeah, my mistake. I postponed this patch and forgot to push it
forward again after the testing period in my cooker. Sorry about that.

> > Jan
> >
> > PS: We should really consider using bug trackers for Adeos and Xenomai!
> > I have a few (minor) patches hanging around as well, but things quickly
> > get lost when bigger problems pop up.
> 
> We have bug trackers; the point is to think about using them.

Indeed. We even have patch trackers we could activate. Either we use
them, or any patch sent for inclusion should be posted in a separate
mail to the -core list, with a [PATCH] header. I'm sometimes missing
them because they are part of a lengthy conversation, buried under lots
of mail I could fast-forward a bit aggressively.
> 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?)
  2007-09-30 11:42           ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Jan Kiszka
  2007-09-30 12:42             ` Philippe Gerum
@ 2007-09-30 19:45             ` Philippe Gerum
  1 sibling, 0 replies; 31+ messages in thread
From: Philippe Gerum @ 2007-09-30 19:45 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Sun, 2007-09-30 at 13:42 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Philippe Gerum wrote:
> >> On Sun, 2007-09-30 at 12:22 +0200, Jan Kiszka wrote:
> ...
> >>>  And a third
> >>> one only gives me "Detected illicit call from domain Xenomai" before the
> >>> box reboots. :(
> >> Grmff... Do you run with your smp_processor_id() instrumentation in?
> > 
> > Yes, but I suspect this is just a symptom of some severe memory
> > corruption that (also?) hits I-pipe data structures. I just put in some
> > different instrumentation, and that warning is gone, the box just hangs
> > hard at a different point. Very unfriendly.
> 
> Hah! Got some crash log by hacking a raw printk-to-uart:
> 

Btw, if it's based on fiddling with the 16550 directly as I imagine it
is, you may want to push me a patch. We already have this feature for
blackfin and powerpc (__ipipe_serial_debug(const char *fmt, ...)), and
it definitely makes sense to have it for x86* too.
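
For readers unfamiliar with the trick, here is a minimal sketch of what
driving a 16550 directly usually amounts to: poll the line status register
until the transmit holding register is empty, then write the byte. This is
not the I-pipe implementation of __ipipe_serial_debug(), nor Jan's hack. The
struct uart_io accessors, the function names and the mock are hypothetical,
added only so the sketch can be tried in user space; on x86 the accessors
would wrap inb()/outb() on a port base such as 0x3f8 (COM1).

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 16550 register layout (offsets from the port base). */
#define UART_THR       0      /* transmit holding register (write) */
#define UART_LSR       5      /* line status register (read) */
#define UART_LSR_THRE  0x20   /* THR empty: ready for the next byte */

/* Hypothetical port accessors, so the same code can run against a mock. */
struct uart_io {
    uint8_t (*in)(uint16_t port);
    void (*out)(uint8_t val, uint16_t port);
};

/* Busy-wait until the transmitter is free, then push one byte. */
static void raw_uart_putc(const struct uart_io *io, uint16_t base, char c)
{
    while (!(io->in(base + UART_LSR) & UART_LSR_THRE))
        ;   /* polling only: no interrupts, no locks, no scheduler */
    io->out((uint8_t)c, base + UART_THR);
}

/* Write a NUL-terminated string, expanding "\n" to "\r\n". */
static void raw_uart_puts(const struct uart_io *io, uint16_t base,
                          const char *s)
{
    for (; *s; s++) {
        if (*s == '\n')
            raw_uart_putc(io, base, '\r');
        raw_uart_putc(io, base, *s);
    }
}

/* Mock UART for user-space experiments: always ready, records output. */
static char mock_tx[64];
static size_t mock_len;

static uint8_t mock_in(uint16_t port)
{
    (void)port;
    return UART_LSR_THRE;
}

static void mock_out(uint8_t val, uint16_t port)
{
    (void)port;
    if (mock_len < sizeof(mock_tx) - 1)
        mock_tx[mock_len++] = (char)val;
}

static const struct uart_io mock_io = { mock_in, mock_out };
```

Calling raw_uart_puts(&mock_io, 0x3f8, "hi\n") leaves the four bytes
"hi\r\n" in mock_tx. Since the path is pure polling, it does not depend on
interrupts or on the regular printk machinery, which is why this kind of
output can still get through when the box is about to hang.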

-- 
Philippe.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?
  2007-09-30 11:00         ` Jan Kiszka
  2007-09-30 11:42           ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Jan Kiszka
@ 2007-09-30 19:52           ` Philippe Gerum
  1 sibling, 0 replies; 31+ messages in thread
From: Philippe Gerum @ 2007-09-30 19:52 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Sun, 2007-09-30 at 13:00 +0200, Jan Kiszka wrote:
> > Do you work with 2.6.22 baseline, or the latest stable update?
> 
> 2.6.22.7 and .23-rc8. Both hang hard when I play with the backlight keys
> of my Fujitsu-Siemens Lifebook,

Hey, shameless plug, you have been caught. :o)

> > Grmff... Do you run with your smp_processor_id() instrumentation in?
> 
> Yes, but I suspect this is just a symptom of some severe memory
> corruption that (also?) hits I-pipe data structures.

Unlikely, but a timing-dependent issue may explain this behaviour as
well.

>  I just put in some
> different instrumentation, and that warning is gone, the box just hangs
> hard at a different point. Very unfriendly.
> 
> Jan
> 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2007-10-09  9:11 UTC | newest]

Thread overview: 31+ messages
2007-09-25 14:09 [Xenomai-help] Non-APIC setup broken for 2.4-SVN? Jan Kiszka
2007-09-25 14:25 ` Leopold Palomo-Avellaneda
2007-09-25 21:37 ` Philippe Gerum
2007-09-26  9:31   ` Jan Kiszka
2007-09-30 10:22     ` Jan Kiszka
2007-09-30 10:52       ` Philippe Gerum
2007-09-30 11:00         ` Jan Kiszka
2007-09-30 11:42           ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Jan Kiszka
2007-09-30 12:42             ` Philippe Gerum
2007-09-30 15:31               ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
2007-09-30 20:04                 ` Philippe Gerum
2007-10-01  9:04                 ` Gilles Chanteperdrix
2007-10-01  9:17                   ` Jan Kiszka
2007-10-01  9:23                     ` Gilles Chanteperdrix
2007-10-01  9:32                       ` Jan Kiszka
2007-10-01  9:38                         ` Gilles Chanteperdrix
2007-10-01 12:12                           ` [Xenomai-core] ARM compiling error Patrick
2007-10-01 12:25                             ` Gilles Chanteperdrix
     [not found]                               ` <200710011255.l91CtkvR000470@domain.hid>
2007-10-01 13:10                                 ` Gilles Chanteperdrix
2007-10-01 13:43                                   ` Patrick
2007-10-01 14:29                                     ` Gilles Chanteperdrix
2007-10-01 12:42                         ` [Xenomai-core] crashing 2.6.22 Labozzetta, Saverio
2007-10-01 13:32                           ` Labozzetta, Saverio
2007-10-02  9:04                             ` [Xenomai-help] rt_task crash kernel Patrick
2007-10-02  9:11                               ` Gilles Chanteperdrix
     [not found]                                 ` <200710020940.l929ep0p021831@domain.hid>
2007-10-02  9:46                                   ` [Xenomai-core] " Gilles Chanteperdrix
2007-10-08  7:33                   ` [Xenomai-core] crashing 2.6.22 Jan Kiszka
2007-10-08  8:45                     ` Gilles Chanteperdrix
2007-10-09  9:11                       ` Philippe Gerum
2007-09-30 19:45             ` [Xenomai-core] crashing 2.6.22 (was: [Xenomai-help] Non-APIC setup broken for 2.4-SVN?) Philippe Gerum
2007-09-30 19:52           ` [Xenomai-help] Non-APIC setup broken for 2.4-SVN? Philippe Gerum
