[Xenomai-help] rt_queue with multiple listeners

* [Xenomai-help] rt_queue with multiple listeners
@ 2007-10-08 13:03 Petr Cervenka
  2008-02-06 14:09 ` [Xenomai-help] FPU not available Petr Cervenka
  0 siblings, 1 reply; 14+ messages in thread
From: Petr Cervenka @ 2007-10-08 13:03 UTC (permalink / raw)
  To: xenomai

Hello,
I would like to use an rt_queue (or a similar mechanism) in a kind of broadcast mode. I need one queue with multiple listeners, and every listener must receive every queue element. Some of the listeners may still be busy when rt_queue_send() is called, so I need a fixed reference count that has to be reached before an element is removed from the queue. One solution would be a separate queue for every listener, but I hope someone knows a better one. Could you help me with this?
Petr Cervenka
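
The "fixed reference count" idea can be sketched in plain C. This is only an illustration under stated assumptions: it uses C11 atomics and malloc() instead of any Xenomai service, and struct bcast_msg, bcast_alloc() and bcast_release() are hypothetical names, not part of the rt_queue API. Each message starts with a count equal to the number of listeners and is freed by whichever listener releases it last.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical broadcast message: one shared payload plus a countdown
 * of the listeners that still have to release it. */
struct bcast_msg {
    atomic_int pending;     /* listeners that have not released yet */
    size_t len;             /* payload length in bytes */
    unsigned char data[];   /* payload, shared by all listeners */
};

static atomic_int bcast_freed;  /* demo counter: messages actually freed */

/* Allocate a message that must be released by exactly nr_listeners readers. */
static struct bcast_msg *bcast_alloc(const void *buf, size_t len, int nr_listeners)
{
    struct bcast_msg *m;

    assert(nr_listeners > 0);
    m = malloc(sizeof(*m) + len);
    if (!m)
        return NULL;
    atomic_init(&m->pending, nr_listeners);
    m->len = len;
    memcpy(m->data, buf, len);
    return m;
}

/* Called by each listener when it is done with the message.
 * Returns 1 if this call dropped the last reference and freed it. */
static int bcast_release(struct bcast_msg *m)
{
    if (atomic_fetch_sub(&m->pending, 1) == 1) {
        free(m);
        atomic_fetch_add(&bcast_freed, 1);
        return 1;
    }
    return 0;
}
```

With three listeners, the first two calls to bcast_release() return 0 and the third returns 1. How the pointer reaches each listener (for example one small per-listener queue carrying only pointers) is left open here.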

* [Xenomai-help] FPU not available
  2007-10-08 13:03 [Xenomai-help] rt_queue with multiple listeners Petr Cervenka
@ 2008-02-06 14:09 ` Petr Cervenka
  2008-02-06 14:45   ` Jan Kiszka
  2008-02-07 13:23   ` Gilles Chanteperdrix
  0 siblings, 2 replies; 14+ messages in thread
From: Petr Cervenka @ 2008-02-06 14:09 UTC (permalink / raw)
  To: xenomai-help

Hello.
Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
Is there anything we can check or change?
Petr Cervenka


[ 1132.862102] ------------[ cut here ]------------
[ 1132.862110] kernel BUG at kernel/ipipe/core.c:321!
[ 1132.862112] invalid opcode: 0000 [2] PREEMPT SMP
[ 1132.862115] CPU 0
[ 1132.862117] Modules linked in: rt_r8169 rtpacket rtnet rfcomm l2cap bluetooth ppdev container ac sbs sbshc dock battery lp irtty_sir sir_dev psmouse irda serio_raw parport_pc parport crc_ccitt pcspkr k8temp shpchp pci_hotplug i2c_nforce2 button i2c_core af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod sata_nv forcedeth ata_generic ehci_hcd ohci_hcd libata amd74xx ide_core scsi_mod usbcore fan fuse
[ 1132.862148] Pid: 5802, comm: REG_TASK_2056 Tainted: G      D 2.6.24-adeos #2
[ 1132.862150] RIP: 0010:[<ffffffff80277adc>]  [<ffffffff80277adc>] __ipipe_restore_root+0x3c/0x50
[ 1132.862159] RSP: 0000:ffff81003b74ff00  EFLAGS: 00010002
[ 1132.862161] RAX: ffffffff80674aa0 RBX: ffffffff80674aa0 RCX: ffff81008099b000
[ 1132.862163] RDX: ffff81008099b000 RSI: 0000000000418ed6 RDI: 0000000000000000
[ 1132.862165] RBP: 00000000400903e0 R08: 0000000080400140 R09: 00000000007e883f
[ 1132.862168] R10: 0000000000000000 R11: 0000000000000206 R12: ffff81003b74ff58
[ 1132.862170] R13: 0000000000000007 R14: ffffffff8066f7c0 R15: 0000000000000001
[ 1132.862172] FS:  0000000040091950(0063) GS:ffffffff805ee000(0000) knlGS:0000000000000000
[ 1132.862174] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1132.862176] CR2: 00002aaab0bbe028 CR3: 000000003b7c6000 CR4: 00000000000006e0
[ 1132.862178] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1132.862180] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1132.862183] Process REG_TASK_2056 (pid: 5802, threadinfo ffff81003b74e000, task ffff81003be78790)
[ 1132.862184] Stack:  ffffffff8022728b 0000000000000296 0000000000000000 0000000000000000
[ 1132.862189]  0000000000000000 00000000400903e0 00000000007f2bb8 0000000040090380
[ 1132.862193]  00000000007f2a50 0000000000000001 ffffffff8049c103 0000000000000001
[ 1132.862197] Call Trace:
[ 1132.862203]  [<ffffffff8022728b>] __ipipe_handle_exception+0x16b/0x210
[ 1132.862210]  [<ffffffff8049c103>] error_sti+0x1e/0x52
[ 1132.862218]
[ 1132.862219]
[ 1132.862219] Code: 0f 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[ 1132.862229] RIP  [<ffffffff80277adc>] __ipipe_restore_root+0x3c/0x50
[ 1132.862233]  RSP <ffff81003b74ff00>
[ 1132.862238] ---[ end trace a3cc53b342d61517 ]---



* Re: [Xenomai-help] FPU not available
  2008-02-06 14:09 ` [Xenomai-help] FPU not available Petr Cervenka
@ 2008-02-06 14:45   ` Jan Kiszka
  2008-02-07 12:22     ` Petr Cervenka
  2008-02-07 13:23   ` Gilles Chanteperdrix
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2008-02-06 14:45 UTC (permalink / raw)
  To: Petr Cervenka; +Cc: xenomai-help

Petr Cervenka wrote:
> Hello.
> Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
> Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
> Is there anything we can check or change?
> Petr Cervenka
> 
> 
> [ 1132.862102] ------------[ cut here ]------------
> [ 1132.862110] kernel BUG at kernel/ipipe/core.c:321!

Uah, that should no longer happen. What ipipe version are you using?

If you are already on latest ipipe-2.6.24-rc6-x86-2.0-02, please switch
on IPIPE_DEBUG and the tracer, specifically IPIPE_TRACE_MCOUNT. Then try
to trigger the issue and post the full kernel dump. Also, if you happen
to have a self-contained test case for this, we would happily take it.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-help] FPU not available
  2008-02-06 14:45   ` Jan Kiszka
@ 2008-02-07 12:22     ` Petr Cervenka
  2008-02-07 13:16       ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Petr Cervenka @ 2008-02-07 12:22 UTC (permalink / raw)
  To: jan.kiszka, grugh; +Cc: xenomai

>
>Petr Cervenka wrote:
>> Hello.
>> Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>> Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>> Is there anything we can check or change?
>> Petr Cervenka
>> 
>> 
>> [ 1132.862102] ------------[ cut here ]------------
>> [ 1132.862110] kernel BUG at kernel/ipipe/core.c:321!
>
>Uah, that should no longer happen. What ipipe version are you using?

I was using ipipe-2.6.24-rc6-x86-2.0-01, so I tried ipipe-2.6.24-rc6-x86-2.0-02.
I was unable to compile the kernel because of this error:
  CC      arch/x86/kernel/ipipe.o
arch/x86/kernel/ipipe.c: In function __ipipe_preempt_schedule_irq:
arch/x86/kernel/ipipe.c:588: error: implicit declaration of function preempt_schedule_irq
make[1]: *** [arch/x86/kernel/ipipe.o] Error 1
make: *** [arch/x86/kernel] Error 2 

So I changed the file preempt.h:
--- old/include/linux/preempt.h	2008-02-06 17:29:37.000000000 +0100
+++ new/include/linux/preempt.h	2008-02-06 17:54:51.000000000 +0100
@@ -28,6 +28,8 @@
 
 asmlinkage void preempt_schedule(void);
 
+asmlinkage void preempt_schedule_irq(void);
+
 #define preempt_disable() \
 do { \
 	ipipe_check_context(ipipe_root_domain); \ 

>If you are already on latest ipipe-2.6.24-rc6-x86-2.0-02, please switch
>on IPIPE_DEBUG and the tracer, specifically IPIPE_TRACE_MCOUNT. Then try
>to trigger the issue and post the full kernel dump. Also, if you happen
>to have a self-contained test case for this, we would happily take it.
>

When I enabled IPIPE_DEBUG and the tracer, the problem disappeared. But when they are switched off, it is the same as before.
Also, temporarily switching to secondary mode helps.

>Thanks,
>Jan
>
>-- 
>Siemens AG, Corporate Technology, CT SE 2
>Corporate Competence Center Embedded Linux
>



* Re: [Xenomai-help] FPU not available
  2008-02-07 12:22     ` Petr Cervenka
@ 2008-02-07 13:16       ` Jan Kiszka
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kiszka @ 2008-02-07 13:16 UTC (permalink / raw)
  To: Petr Cervenka; +Cc: xenomai

Petr Cervenka wrote:
>> Petr Cervenka wrote:
>>> Hello.
>>> Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>>> Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>>> Is there anything we can check or change?
>>> Petr Cervenka
>>>
>>>
>>> [ 1132.862102] ------------[ cut here ]------------
>>> [ 1132.862110] kernel BUG at kernel/ipipe/core.c:321!
>> Uah, that should no longer happen. What ipipe version are you using?
> 
> I was using ipipe-2.6.24-rc6-x86-2.0-01, so I tried ipipe-2.6.24-rc6-x86-2.0-02.
> I was unable to compile the kernel because of this error:
>   CC      arch/x86/kernel/ipipe.o
> arch/x86/kernel/ipipe.c: In function __ipipe_preempt_schedule_irq:
> arch/x86/kernel/ipipe.c:588: error: implicit declaration of function preempt_schedule_irq
> make[1]: *** [arch/x86/kernel/ipipe.o] Error 1
> make: *** [arch/x86/kernel] Error 2 
> 
> So I changed the file preempt.h:
> --- old/include/linux/preempt.h	2008-02-06 17:29:37.000000000 +0100
> +++ new/include/linux/preempt.h	2008-02-06 17:54:51.000000000 +0100
> @@ -28,6 +28,8 @@
>  
>  asmlinkage void preempt_schedule(void);
>  
> +asmlinkage void preempt_schedule_irq(void);
> +
>  #define preempt_disable() \
>  do { \
>  	ipipe_check_context(ipipe_root_domain); \ 
> 

Yes, that fix is already in git and will be delivered with the next
ipipe version.

>> If you are already on latest ipipe-2.6.24-rc6-x86-2.0-02, please switch
>> on IPIPE_DEBUG and the tracer, specifically IPIPE_TRACE_MCOUNT. Then try
>> to trigger the issue and post the full kernel dump. Also, if you happen
>> to have a self-contained test case for this, we would happily take it.
>>
> 
> When I enabled IPIPE_DEBUG and the tracer, the problem disappeared. But when they are switched off, it is the same as before.
> Also, temporarily switching to secondary mode helps.
> 

Some evil race may be happening here, and the tracer may prevent it from
occurring (or make it far less probable). You have SMP on, but do you
also run on more than one CPU?

What is weird:

__ipipe_handle_exception()
	if (unlikely(!ipipe_root_domain_p)) {
		...
		/* Switch to root so that Linux can handle the fault cleanly. */
		ipipe_current_domain = ipipe_root_domain;

But then later in this function:

	local_irq_restore(flags);
	=> __ipipe_restore_root()
		BUG_ON(!ipipe_root_domain_p);

Hmm, the latter BUG_ON is not armed when CONFIG_IPIPE_DEBUG_CONTEXT is
enabled, which is the default (y) with CONFIG_IPIPE_DEBUG. Could you check
that it is off when doing the test with the tracer? Still weird, though.

Again, a test case to reproduce the problem is welcome, too.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-help] FPU not available
  2008-02-06 14:09 ` [Xenomai-help] FPU not available Petr Cervenka
  2008-02-06 14:45   ` Jan Kiszka
@ 2008-02-07 13:23   ` Gilles Chanteperdrix
  2008-02-07 13:45     ` Jan Kiszka
  1 sibling, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2008-02-07 13:23 UTC (permalink / raw)
  To: Petr Cervenka; +Cc: xenomai-help

On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
> Hello.
>  Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>  Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>  Is there anything we can check or change?
>  Petr Cervenka

I do not know if this is related to the issue you are facing, but the
first FPU fault of a thread running in primary mode may be handled by
Xenomai without switching to secondary mode. So, maybe the fault
epilogue implicitly expects Xenomai to have switched the fault to
secondary mode and uses some secondary mode services such as
ipipe_restore_root, whereas the thread never left primary mode.

-- 
                                               Gilles Chanteperdrix


* Re: [Xenomai-help] FPU not available
  2008-02-07 13:23   ` Gilles Chanteperdrix
@ 2008-02-07 13:45     ` Jan Kiszka
  2008-02-07 14:02       ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2008-02-07 13:45 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Petr Cervenka, xenomai-help

Gilles Chanteperdrix wrote:
> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>> Hello.
>>  Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>>  Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>>  Is there anything we can check or change?
>>  Petr Cervenka
> 
> I do not know if this is related to the issue you are facing, but the
> first FPU fault of a thread running in primary mode may be handled by
> Xenomai without switching to secondary mode. So, maybe the fault
> epilogue implicitly expects Xenomai to have switched the fault to
> secondary mode and uses some secondary mode services such as
> ipipe_restore_root, whereas the thread never left primary mode.
> 

Good point! That is probably this path (and not the one I stared at):

__ipipe_handle_exception()
	...
	if (unlikely(ipipe_trap_notify(vector, regs))) {
		local_irq_restore(flags);
		return 1;
	}

That needs some more thought...

Petr, confirming our assumptions with the help of the tracer is still
valuable!

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-help] FPU not available
  2008-02-07 13:45     ` Jan Kiszka
@ 2008-02-07 14:02       ` Jan Kiszka
  2008-02-07 14:35         ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2008-02-07 14:02 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Petr Cervenka, xenomai-help

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>> Hello.
>>>  Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>>>  Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>>>  Is there anything we can check or change?
>>>  Petr Cervenka
>> I do not know if this is related to the issue you are facing, but the
>> first FPU fault of a thread running in primary mode may be handled by
>> Xenomai without switching to secondary mode. So, maybe the fault
>> epilogue implicitly expects Xenomai to have switched the fault to
>> secondary mode and uses some secondary mode services such as
>> ipipe_restore_root, whereas the thread never left primary mode.
>>
> 
> Good point! That is probably this path (and not the one I stared at):
> 
> __ipipe_handle_exception()
> 	...
> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
> 		local_irq_restore(flags);
> 		return 1;
> 	}
> 
> That needs some more thought...

Looking at the whole __ipipe_handle_exception, the problem is related to
 the early, context-independent __ipipe_stall_root(). Can we postpone
this safely after having called any potential high-stage hooks for this
exception, and then only if the callee migrated the thread to the root
domain? Or is there a need to have the root domain stalled across the
post-fault migration?

In the latter case, we would have to fiddle with the stall bits directly
instead of calling local_irq_restore - not just to work around the
BUG_ON, but also to avoid sync'ing root over potentially stalled
non-root domains...

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-help] FPU not available
  2008-02-07 14:02       ` Jan Kiszka
@ 2008-02-07 14:35         ` Philippe Gerum
  2008-02-07 14:56           ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2008-02-07 14:35 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Petr Cervenka, xenomai-help

Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>>> Hello.
>>>>  Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>>>>  Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>>>>  Is there anything we can check or change?
>>>>  Petr Cervenka
>>> I do not know if this is related to the issue you are facing, but the
>>> first FPU fault of a thread running in primary mode may be handled by
>>> Xenomai without switching to secondary mode. So, maybe the fault
>>> epilogue implicitly expects Xenomai to have switched the fault to
>>> secondary mode and uses some secondary mode services such as
>>> ipipe_restore_root, whereas the thread never left primary mode.
>>>
>> Good point! That is probably this path (and not the one I stared at):
>>
>> __ipipe_handle_exception()
>> 	...
>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>> 		local_irq_restore(flags);
>> 		return 1;
>> 	}
>>
>> That needs some more thought...
> 
> Looking at the whole __ipipe_handle_exception, the problem is related to
>  the early, context-independent __ipipe_stall_root(). Can we postpone
> this safely after having called any potential high-stage hooks for this
> exception, and then only if the callee migrated the thread to the root
> domain? Or is there a need to have the root domain stalled across the
> post-fault migration?
>

Someone from the root domain may want to get notified of the exceptions
occurring in that domain too, in which case we may not postpone the
virtual mask fixup after the notifier invocation, otherwise we would
call the handler with a broken interrupt state.

> In the latter case, we would have to fiddle with the stall bits directly
> instead of calling local_irq_restore - not just to work around the
> BUG_ON, but also to avoid sync'ing root over potentially stalled
> non-root domains...
> 

This used to be done by ipipe_restore_pipeline_nosync() in older
patches, but that helper disappeared with the flat log refactoring. We
do need to resurrect something similar in order to reset the stall bit
without calling the syncer when taking the fast exit path after
ipipe_trap_notify().

> Jan
> 


-- 
Philippe.


* Re: [Xenomai-help] FPU not available
  2008-02-07 14:35         ` Philippe Gerum
@ 2008-02-07 14:56           ` Jan Kiszka
  2008-02-08 12:41             ` Petr Cervenka
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2008-02-07 14:56 UTC (permalink / raw)
  To: rpm; +Cc: Petr Cervenka, xenomai-help

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>>>> Hello.
>>>>>  Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>>>>>  Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>>>>>  Is there anything we can check or change?
>>>>>  Petr Cervenka
>>>> I do not know if this is related to the issue you are facing, but the
>>>> first FPU fault of a thread running in primary mode may be handled by
>>>> Xenomai without switching to secondary mode. So, maybe the fault
>>>> epilogue implicitly expects Xenomai to have switched the fault to
>>>> secondary mode and uses some secondary mode services such as
>>>> ipipe_restore_root, whereas the thread never left primary mode.
>>>>
>>> Good point! That is probably this path (and not the one I stared at):
>>>
>>> __ipipe_handle_exception()
>>> 	...
>>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>>> 		local_irq_restore(flags);
>>> 		return 1;
>>> 	}
>>>
>>> That needs some more thought...
>> Looking at the whole __ipipe_handle_exception, the problem is related to
>>  the early, context-independent __ipipe_stall_root(). Can we postpone
>> this safely after having called any potential high-stage hooks for this
>> exception, and then only if the callee migrated the thread to the root
>> domain? Or is there a need to have the root domain stalled across the
>> post-fault migration?
>>
> 
> Someone from the root domain may want to get notified of the exceptions
> occurring in that domain too, in which case we may not postpone the
> virtual mask fixup after the notifier invocation, otherwise we would
> call the handler with a broken interrupt state.
> 
>> In the latter case, we would have to fiddle with the stall bits directly
>> instead of calling local_irq_restore - not just to work around the
>> BUG_ON, but also to avoid sync'ing root over potentially stalled
>> non-root domains...
>>
> 
> This used to be done by ipipe_restore_pipeline_nosync() in older
> patches, but that helper disappeared with the flat log refactoring. We
> do need to resurrect something similar in order to reset the stall bit
> without calling the syncer when taking the fast exit path after
> ipipe_trap_notify().

Hmm, so it could be fairly simple in fact:

--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -755,7 +755,9 @@ int __ipipe_handle_exception(struct pt_r
 #endif /* CONFIG_KGDB */
 
 	if (unlikely(ipipe_trap_notify(vector, regs))) {
-		local_irq_restore(flags);
+		if (!flags)
+			__clear_bit(IPIPE_STALL_FLAG,
+				    &ipipe_root_cpudom_var(status));
 		return 1;
 	}
 
Petr, ready to try?
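
For readers following along, the intent of the patch can be modelled outside the kernel. This is a sketch under assumptions, not the real implementation: MODEL_STALL_FLAG stands in for IPIPE_STALL_FLAG, struct root_status_model replaces the per-CPU root status word, and sync_calls merely counts where the real code would replay pending interrupts. The point is the difference between a full restore, which may run the syncer, and the patched fast exit, which only clears the bit.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_STALL_FLAG 0  /* bit number, stand-in for IPIPE_STALL_FLAG */

struct root_status_model {
    unsigned long status;   /* bit MODEL_STALL_FLAG set = root stalled */
    int sync_calls;         /* how often the modelled syncer ran */
};

/* Full restore: sets or clears the stall bit and, when unstalling,
 * runs the (modelled) syncer that replays pending interrupts. */
static void model_restore_sync(struct root_status_model *r, unsigned long flags)
{
    assert(r != NULL);
    if (flags) {
        r->status |= 1UL << MODEL_STALL_FLAG;
    } else {
        r->status &= ~(1UL << MODEL_STALL_FLAG);
        r->sync_calls++;
    }
}

/* Patched fast exit: undo the early stall if the root domain was not
 * stalled on entry, and never call the syncer. */
static void model_restore_nosync(struct root_status_model *r, unsigned long flags)
{
    assert(r != NULL);
    if (!flags)
        r->status &= ~(1UL << MODEL_STALL_FLAG);
}
```

When the saved flags are zero, model_restore_nosync() clears the bit and sync_calls stays at 0; when they are non-zero, the stall bit is left as it is.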
 
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-help] FPU not available
  2008-02-07 14:56           ` Jan Kiszka
@ 2008-02-08 12:41             ` Petr Cervenka
  2008-02-08 13:17               ` Philippe Gerum
  2008-02-08 13:18               ` Philippe Gerum
  0 siblings, 2 replies; 14+ messages in thread
From: Petr Cervenka @ 2008-02-08 12:41 UTC (permalink / raw)
  To: jan.kiszka; +Cc: xenomai


>Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>>>>> Hello.
>>>>>>  Recently, we switched to newer distribution of linux (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, linux kernel 2.6.24, x86_64 architecture, ...).
>>>>>>  No we have problem, that in one of our tasks we are sometimes not able to use floating point operations (under very specific circumstances) . In such case, that task crashes immediately, but rest of the application runs "normaly". Output from dmesg is attached to this message. Task was created with T_FPU flag.
>>>>>>  Is there anything we can check or change?
>>>>>>  Petr Cervenka
>>>>> I do not know if this is related to the issue you are facing, but the
>>>>> first FPU fault of a thread running in primary mode may be handled by
>>>>> Xenomai without switching to secondary mode. So, maybe the fault
>>>>> epilogue implicitely expects Xenomai to have switched the fault to
>>>>> secondary mode and use some secondary mode services such as
>>>>> ipipe_restore_root, whereas the thread never leaved primary mode.
>>>>>
>>>> Good point! That is probably this path (and not the one I starred on):
>>>>
>>>> __ipipe_handle_exception()
>>>> 	...
>>>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>>>> 		local_irq_restore(flags);
>>>> 		return 1;
>>>> 	}
>>>>
>>>> That needs some more thoughts...
>>> Looking at the whole __ipipe_handle_exception, the problem is related to
>>>  the early, context-independent __ipipe_stall_root(). Can we postpone
>>> this safely after having called any potential high-stage hooks for this
>>> exception, and then only if the callee migrated the thread to the root
>>> domain? Or is there a need to have the root domain stalled across the
>>> post-fault migration?
>>>
>> 
>> Someone from the root domain may want to get notified of the exceptions
>> occurring in that domain too, in which case we may not postpone the
>> virtual mask fixup after the notifier invocation, otherwise we would
>> call the handler with a broken interrupt state.
>> 
>>> In the latter case, we would have to fiddle with the stall bits directly
>>> instead of calling local_irq_restore - not just to work around the
>>> BUG_ON, but also to avoid sync'ing root over potentially stalled
>>> non-root domains...
>>>
>> 
>> This used to be done by ipipe_restore_pipeline_nosync() in older
>> patches, but this one has disappeared after the flat log refactoring. We
>> indeed need to resurrect something alike in order to reset the stall bit
>> without calling the syncer, when taking the fast exit path after
>> ipipe_trap_notify().
>
>Hmm, so it could be fairly simple in fact:
>
>--- a/arch/x86/kernel/ipipe.c
>+++ b/arch/x86/kernel/ipipe.c
>@@ -755,7 +755,9 @@ int __ipipe_handle_exception(struct pt_r
> #endif /* CONFIG_KGDB */
> 
> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>-		local_irq_restore(flags);
>+		if (!flags)
>+			__clear_bit(IPIPE_STALL_FLAG,
>+				    &ipipe_root_cpudom_var(status));
> 		return 1;
> 	}
> 
>Petr, ready to try?
> 
I tried this patch and the problem (or the race condition) disappeared. ;-)
Is there an (easy) way to check whether the problem is really solved?

To your previous questions:
We use an Athlon64 X2 (2 cores, 64-bit) with Kubuntu 7.10 amd64.
We have 2 real-time userspace applications: a server that handles rtnet communication with special measuring hardware, and clients (1-4 instances) that do computation, configuration, Ethernet communication, etc. The server and the clients communicate via named rt_queues. A minimal "failing example" is probably impossible to produce.
Any attempt to enable IPIPE_DEBUG and the tracer makes the race condition disappear.

Thank you VERY MUCH for your help and support (all of you).
Petr

>Jan
>
>-- 
>Siemens AG, Corporate Technology, CT SE 2
>Corporate Competence Center Embedded Linux
>




* Re: [Xenomai-help] FPU not available
  2008-02-08 12:41             ` Petr Cervenka
@ 2008-02-08 13:17               ` Philippe Gerum
  2008-02-08 13:18               ` Philippe Gerum
  1 sibling, 0 replies; 14+ messages in thread
From: Philippe Gerum @ 2008-02-08 13:17 UTC (permalink / raw)
  To: Petr Cervenka; +Cc: jan.kiszka, xenomai

Petr Cervenka wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>>>>>> Hello.
>>>>>>>  Recently, we switched to newer distribution of linux (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, linux kernel 2.6.24, x86_64 architecture, ...).
>>>>>>>  No we have problem, that in one of our tasks we are sometimes not able to use floating point operations (under very specific circumstances) . In such case, that task crashes immediately, but rest of the application runs "normaly". Output from dmesg is attached to this message. Task was created with T_FPU flag.
>>>>>>>  Is there anything we can check or change?
>>>>>>>  Petr Cervenka
>>>>>> I do not know if this is related to the issue you are facing, but the
>>>>>> first FPU fault of a thread running in primary mode may be handled by
>>>>>> Xenomai without switching to secondary mode. So, maybe the fault
>>>>>> epilogue implicitely expects Xenomai to have switched the fault to
>>>>>> secondary mode and use some secondary mode services such as
>>>>>> ipipe_restore_root, whereas the thread never leaved primary mode.
>>>>>>
>>>>> Good point! That is probably this path (and not the one I starred on):
>>>>>
>>>>> __ipipe_handle_exception()
>>>>> 	...
>>>>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>>>>> 		local_irq_restore(flags);
>>>>> 		return 1;
>>>>> 	}
>>>>>
>>>>> That needs some more thoughts...
>>>> Looking at the whole __ipipe_handle_exception, the problem is related to
>>>>  the early, context-independent __ipipe_stall_root(). Can we postpone
>>>> this safely after having called any potential high-stage hooks for this
>>>> exception, and then only if the callee migrated the thread to the root
>>>> domain? Or is there a need to have the root domain stalled across the
>>>> post-fault migration?
>>>>
>>> Someone from the root domain may want to get notified of the exceptions
>>> occurring in that domain too, in which case we may not postpone the
>>> virtual mask fixup after the notifier invocation, otherwise we would
>>> call the handler with a broken interrupt state.
>>>
>>>> In the latter case, we would have to fiddle with the stall bits directly
>>>> instead of calling local_irq_restore - not just to work around the
>>>> BUG_ON, but also to avoid sync'ing root over potentially stalled
>>>> non-root domains...
>>>>
>>> This used to be done by ipipe_restore_pipeline_nosync() in older
>>> patches, but this one has disappeared after the flat log refactoring. We
>>> indeed need to resurrect something alike in order to reset the stall bit
>>> without calling the syncer, when taking the fast exit path after
>>> ipipe_trap_notify().
>> Hmm, so it could be fairly simple in fact:
>>
>> --- a/arch/x86/kernel/ipipe.c
>> +++ b/arch/x86/kernel/ipipe.c
>> @@ -755,7 +755,9 @@ int __ipipe_handle_exception(struct pt_r
>> #endif /* CONFIG_KGDB */
>>
>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>> -		local_irq_restore(flags);
>> +		if (!flags)
>> +			__clear_bit(IPIPE_STALL_FLAG,
>> +				    &ipipe_root_cpudom_var(status));
>> 		return 1;
>> 	}
>>
>> Petr, ready to try?
>>
> I tried this patch and the problem (or the race condition) disappeared. ;-)
> Is there any (easy) method to recognise if the problem was solved?
>

If the following warning pops up when running your app, then the patch
just saved your day too.

diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
index ce24db7..8b034cd 100644
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -759,6 +759,7 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)

 	if (unlikely(ipipe_trap_notify(vector, regs))) {
 		if (!flags)
+			WARN_ON(!ipipe_root_domain_p);
 			__clear_bit(IPIPE_STALL_FLAG,
 				    &ipipe_root_cpudom_var(status));
 		return 1;

> To your previous questions:
> We use Athlon64 X2 (2 cores, 64-bit), kubuntu 7.10 amd64.
> We have 2 real-time userspace applications: some kind of server for rtnet communication with special measuring hardware, and clients (1-4 instances) for some computing, configuration, ethernet comunication, etc. Comunication between server and clients is via named rt_queues.. Any "failing example" is perhaps impossible.
> Any attempt with IPIPE_DEBUG and tracer removes the race condition.
> 
> Thank you VERY MUCH for you help and support (all of you).
> Petr
> 
>> Jan
>>
>> -- 
>> Siemens AG, Corporate Technology, CT SE 2
>> Corporate Competence Center Embedded Linux
>>
> 
> 
> 


-- 
Philippe.



* Re: [Xenomai-help] FPU not available
  2008-02-08 12:41             ` Petr Cervenka
  2008-02-08 13:17               ` Philippe Gerum
@ 2008-02-08 13:18               ` Philippe Gerum
  2008-02-08 15:27                 ` Petr Cervenka
  1 sibling, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2008-02-08 13:18 UTC (permalink / raw)
  To: Petr Cervenka; +Cc: jan.kiszka, xenomai

Petr Cervenka wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>>>>>> Hello.
>>>>>>>  Recently, we switched to newer distribution of linux (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, linux kernel 2.6.24, x86_64 architecture, ...).
>>>>>>>  No we have problem, that in one of our tasks we are sometimes not able to use floating point operations (under very specific circumstances) . In such case, that task crashes immediately, but rest of the application runs "normaly". Output from dmesg is attached to this message. Task was created with T_FPU flag.
>>>>>>>  Is there anything we can check or change?
>>>>>>>  Petr Cervenka
>>>>>> I do not know if this is related to the issue you are facing, but the
>>>>>> first FPU fault of a thread running in primary mode may be handled by
>>>>>> Xenomai without switching to secondary mode. So, maybe the fault
>>>>>> epilogue implicitely expects Xenomai to have switched the fault to
>>>>>> secondary mode and use some secondary mode services such as
>>>>>> ipipe_restore_root, whereas the thread never leaved primary mode.
>>>>>>
>>>>> Good point! That is probably this path (and not the one I starred on):
>>>>>
>>>>> __ipipe_handle_exception()
>>>>> 	...
>>>>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>>>>> 		local_irq_restore(flags);
>>>>> 		return 1;
>>>>> 	}
>>>>>
>>>>> That needs some more thoughts...
>>>> Looking at the whole __ipipe_handle_exception, the problem is related to
>>>>  the early, context-independent __ipipe_stall_root(). Can we postpone
>>>> this safely after having called any potential high-stage hooks for this
>>>> exception, and then only if the callee migrated the thread to the root
>>>> domain? Or is there a need to have the root domain stalled across the
>>>> post-fault migration?
>>>>
>>> Someone from the root domain may want to get notified of the exceptions
>>> occurring in that domain too, in which case we may not postpone the
>>> virtual mask fixup after the notifier invocation, otherwise we would
>>> call the handler with a broken interrupt state.
>>>
>>>> In the latter case, we would have to fiddle with the stall bits directly
>>>> instead of calling local_irq_restore - not just to work around the
>>>> BUG_ON, but also to avoid sync'ing root over potentially stalled
>>>> non-root domains...
>>>>
>>> This used to be done by ipipe_restore_pipeline_nosync() in older
>>> patches, but this one has disappeared after the flat log refactoring. We
>>> indeed need to resurrect something alike in order to reset the stall bit
>>> without calling the syncer, when taking the fast exit path after
>>> ipipe_trap_notify().
>> Hmm, so it could be fairly simple in fact:
>>
>> --- a/arch/x86/kernel/ipipe.c
>> +++ b/arch/x86/kernel/ipipe.c
>> @@ -755,7 +755,9 @@ int __ipipe_handle_exception(struct pt_r
>> #endif /* CONFIG_KGDB */
>>
>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>> -		local_irq_restore(flags);
>> +		if (!flags)
>> +			__clear_bit(IPIPE_STALL_FLAG,
>> +				    &ipipe_root_cpudom_var(status));
>> 		return 1;
>> 	}
>>
>> Petr, ready to try?
>>
> I tried this patch and the problem (or the race condition) disappeared. ;-)
> Is there any (easy) method to recognise if the problem was solved?
> 

This one won't break the whole thing...

diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
index ce24db7..af9d4c4 100644
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -758,6 +758,7 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 #endif /* CONFIG_KGDB */

 	if (unlikely(ipipe_trap_notify(vector, regs))) {
+		WARN_ON(!ipipe_root_domain_p);
 		if (!flags)
 			__clear_bit(IPIPE_STALL_FLAG,
 				    &ipipe_root_cpudom_var(status));
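
A guess at why the earlier version of this debug patch (sent a minute before) could "break the whole thing": there the WARN_ON line was added directly under `if (!flags)` without braces, so in C it becomes the whole body of the `if`, and the `__clear_bit()` that follows runs unconditionally. The sketch below reproduces that shape in plain user-space C. `toy_state`, `toy_warn_on()` and both `exit_path_*` functions are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state; not kernel code. */
struct toy_state {
	bool stalled;  /* models the root stall bit */
	int warnings;  /* counts how often the warning fired */
};

/* Models WARN_ON(): record a warning when the condition holds. */
static void toy_warn_on(struct toy_state *s, bool cond)
{
	if (cond)
		s->warnings++;
}

/* Shape of the first debug patch: the warning is the only statement
 * guarded by the if, so the stall bit is cleared on every call. */
static void exit_path_unbraced(struct toy_state *s, unsigned long flags,
			       bool in_root)
{
	if (!flags)
		toy_warn_on(s, !in_root);
	s->stalled = false;
}

/* Shape of the second debug patch: the warning sits before the if,
 * so the clear stays conditional on the saved flags. */
static void exit_path_hoisted(struct toy_state *s, unsigned long flags,
			      bool in_root)
{
	toy_warn_on(s, !in_root);
	if (!flags)
		s->stalled = false;
}
```

Calling `exit_path_unbraced()` with nonzero `flags` (root was stalled on entry) still clears `stalled`, which is exactly the state the original fix meant to preserve; `exit_path_hoisted()` leaves it set.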

> To your previous questions:
> We use Athlon64 X2 (2 cores, 64-bit), kubuntu 7.10 amd64.
> We have 2 real-time userspace applications: some kind of server for rtnet communication with special measuring hardware, and clients (1-4 instances) for some computing, configuration, ethernet comunication, etc. Comunication between server and clients is via named rt_queues.. Any "failing example" is perhaps impossible.
> Any attempt with IPIPE_DEBUG and tracer removes the race condition.
> 
> Thank you VERY MUCH for you help and support (all of you).
> Petr
> 
>> Jan
>>
>> -- 
>> Siemens AG, Corporate Technology, CT SE 2
>> Corporate Competence Center Embedded Linux
>>
> 
> 
> 


-- 
Philippe.



* Re: [Xenomai-help] FPU not available
  2008-02-08 13:18               ` Philippe Gerum
@ 2008-02-08 15:27                 ` Petr Cervenka
  0 siblings, 0 replies; 14+ messages in thread
From: Petr Cervenka @ 2008-02-08 15:27 UTC (permalink / raw)
  To: rpm; +Cc: jan.kiszka, xenomai


______________________________________________________________
> From: rpm@xenomai.org
> To: Petr Cervenka <grugh@domain.hid>
> CC: jan.kiszka@domain.hid, xenomai@xenomai.org
> Date: 08.02.2008 14:25
> Subject: Re: [Xenomai-help] FPU not available
>
>Petr Cervenka wrote:
>>> Philippe Gerum wrote:
>>>> Jan Kiszka wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Gilles Chanteperdrix wrote:
>>>>>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>>>>>>> Hello.
>>>>>>>>  Recently, we switched to newer distribution of linux (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, linux kernel 2.6.24, x86_64 architecture, ...).
>>>>>>>>  No we have problem, that in one of our tasks we are sometimes not able to use floating point operations (under very specific circumstances) . In such case, that task crashes immediately, but rest of the application runs "normaly". Output from dmesg is attached to this message. Task was created with T_FPU flag.
>>>>>>>>  Is there anything we can check or change?
>>>>>>>>  Petr Cervenka
>>>>>>> I do not know if this is related to the issue you are facing, but the
>>>>>>> first FPU fault of a thread running in primary mode may be handled by
>>>>>>> Xenomai without switching to secondary mode. So, maybe the fault
>>>>>>> epilogue implicitely expects Xenomai to have switched the fault to
>>>>>>> secondary mode and use some secondary mode services such as
>>>>>>> ipipe_restore_root, whereas the thread never leaved primary mode.
>>>>>>>
>>>>>> Good point! That is probably this path (and not the one I starred on):
>>>>>>
>>>>>> __ipipe_handle_exception()
>>>>>> 	...
>>>>>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>>>>>> 		local_irq_restore(flags);
>>>>>> 		return 1;
>>>>>> 	}
>>>>>>
>>>>>> That needs some more thoughts...
>>>>> Looking at the whole __ipipe_handle_exception, the problem is related to
>>>>>  the early, context-independent __ipipe_stall_root(). Can we postpone
>>>>> this safely after having called any potential high-stage hooks for this
>>>>> exception, and then only if the callee migrated the thread to the root
>>>>> domain? Or is there a need to have the root domain stalled across the
>>>>> post-fault migration?
>>>>>
>>>> Someone from the root domain may want to get notified of the exceptions
>>>> occurring in that domain too, in which case we may not postpone the
>>>> virtual mask fixup after the notifier invocation, otherwise we would
>>>> call the handler with a broken interrupt state.
>>>>
>>>>> In the latter case, we would have to fiddle with the stall bits directly
>>>>> instead of calling local_irq_restore - not just to work around the
>>>>> BUG_ON, but also to avoid sync'ing root over potentially stalled
>>>>> non-root domains...
>>>>>
>>>> This used to be done by ipipe_restore_pipeline_nosync() in older
>>>> patches, but this one has disappeared after the flat log refactoring. We
>>>> indeed need to resurrect something alike in order to reset the stall bit
>>>> without calling the syncer, when taking the fast exit path after
>>>> ipipe_trap_notify().
>>> Hmm, so it could be fairly simple in fact:
>>>
>>> --- a/arch/x86/kernel/ipipe.c
>>> +++ b/arch/x86/kernel/ipipe.c
>>> @@ -755,7 +755,9 @@ int __ipipe_handle_exception(struct pt_r
>>> #endif /* CONFIG_KGDB */
>>>
>>> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>>> -		local_irq_restore(flags);
>>> +		if (!flags)
>>> +			__clear_bit(IPIPE_STALL_FLAG,
>>> +				    &ipipe_root_cpudom_var(status));
>>> 		return 1;
>>> 	}
>>>
>>> Petr, ready to try?
>>>
>> I tried this patch and the problem (or the race condition) disappeared. ;-)
>> Is there any (easy) method to recognise if the problem was solved?
>> 
>
>This one won't break the whole thing...
>
>diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
>index ce24db7..af9d4c4 100644
>--- a/arch/x86/kernel/ipipe.c
>+++ b/arch/x86/kernel/ipipe.c
>@@ -758,6 +758,7 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
> #endif /* CONFIG_KGDB */
>
> 	if (unlikely(ipipe_trap_notify(vector, regs))) {
>+		WARN_ON(!ipipe_root_domain_p);
> 		if (!flags)
> 			__clear_bit(IPIPE_STALL_FLAG,
> 				    &ipipe_root_cpudom_var(status));
>
I applied the "WARN_ON" patch and got the kernel BUG again.
The dmesg output is attached; I hope it helps a little more this time.
It looks as if the warning had just started to print when the BUG hit (if I read the trace correctly).

>> To your previous questions:
>> We use Athlon64 X2 (2 cores, 64-bit), kubuntu 7.10 amd64.
>> We have 2 real-time userspace applications: some kind of server for rtnet communication with special measuring hardware, and clients (1-4 instances) for some computing,  configuration, ethernet comunication, etc. Comunication between server and clients is via named rt_queues.. Any "failing example" is perhaps impossible.
>> Any attempt with IPIPE_DEBUG and tracer removes the race condition.
>> 
>> Thank you VERY MUCH for you help and support (all of you).
>> Petr
>> 
>>> Jan
>>>
>>> -- 
>>> Siemens AG, Corporate Technology, CT SE 2
>>> Corporate Competence Center Embedded Linux
>>>
>> 
>> 
>> 
>
>
>-- 
>Philippe.
>

[   52.570242] WARNING: at arch/x86/kernel/ipipe.c:758 __ipipe_handle_exception()
[   52.570249] Pid: 4758, comm: REG_TASK_2056 Not tainted 2.6.24-adeos #4
[   52.570251] 
[   52.570251] Call Trace:
[   52.570283] ------------[ cut here ]------------
[   52.570318] kernel BUG at kernel/ipipe/core.c:321!
[   52.570351] invalid opcode: 0000 [1] PREEMPT SMP 
[   52.570449] CPU 0 
[   52.570499] Modules linked in: rt_r8169 rtpacket rtnet rfcomm l2cap bluetooth ppdev container ac sbs sbshc dock battery lp irtty_sir sir_dev irda psmouse parport_pc parport crc_ccitt serio_raw k8temp pcspkr shpchp pci_hotplug button i2c_nforce2 i2c_core af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod sata_nv ata_generic forcedeth libata amd74xx scsi_mod ide_core ehci_hcd ohci_hcd usbcore fan fuse
[   52.571610] Pid: 4758, comm: REG_TASK_2056 Not tainted 2.6.24-adeos #4
[   52.571645] RIP: 0010:[<ffffffff80278e47>]  [<ffffffff80278e47>] __ipipe_restore_root+0x47/0x50
[   52.571714] RSP: 0000:ffff81003e16bd88  EFLAGS: 00010002
[   52.571747] RAX: ffffffff8067caa0 RBX: 00000009e4457f0f RCX: 0000000000000003
[   52.571782] RDX: ffff810080993000 RSI: ffffffff8022743f RDI: 0000000000000001
[   52.571816] RBP: ffff81003e16bd88 R08: ffff810001008420 R09: 0000000000000004
[   52.571851] R10: ffff81003e16bda8 R11: 0000000000000000 R12: 0000000000000001
[   52.571886] R13: ffff81003e16a000 R14: ffff81003e16bffd R15: 0000000000000000
[   52.571921] FS:  0000000040091950(0063) GS:ffffffff805f5000(0000) knlGS:0000000000000000
[   52.571963] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   52.571997] CR2: 00002b6d57fb623d CR3: 000000003a0c5000 CR4: 00000000000006e0
[   52.572031] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   52.572066] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   52.572101] Process REG_TASK_2056 (pid: 4758, threadinfo ffff81003e16a000, task ffff810035432790)
[   52.572144] Stack:  ffff81003e16bdb8 ffffffff8023af74 ffff81003e16be98 ffffffff806777a8
[   52.572288]  ffff810080993000 ffff81003e16a000 ffff81003e16bdc8 ffffffff80273809
[   52.572411]  ffff81003e16bde8 ffffffff8027383e ffffffff8022743f ffff81003e16bf00
[   52.572509] Call Trace:
[   52.572569]  [<ffffffff8023af74>] cpu_clock+0x84/0xa0
[   52.572604]  [<ffffffff80273809>] get_timestamp+0x9/0x10
[   52.572638]  [<ffffffff8027383e>] touch_softlockup_watchdog+0x2e/0x40
[   52.572675]  [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[   52.572711]  [<ffffffff80220c5a>] touch_nmi_watchdog+0x1a/0x80
[   52.572746]  [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[   52.572783]  [<ffffffff8020def1>] print_trace_address+0x11/0x20
[   52.572818]  [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[   52.572853]  [<ffffffff8020d8ab>] dump_trace+0x10b/0x2c0
[   52.572890]  [<ffffffff80413d18>] exception_event+0x48/0x60
[   52.572925]  [<ffffffff8020daa3>] show_trace+0x43/0x60
[   52.572960]  [<ffffffff8020e1aa>] dump_stack+0x6a/0x80
[   52.572995]  [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[   52.573033]  [<ffffffff804a2c73>] error_sti+0x1e/0x52
[   52.573071] 
[   52.573100] 
[   52.573100] Code: 0f 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 
[   52.573595] RIP  [<ffffffff80278e47>] __ipipe_restore_root+0x47/0x50
[   52.573650]  RSP <ffff81003e16bd88>
[   52.573690] ---[ end trace c09fed11ada7a064 ]---
[   52.573723] note: REG_TASK_2056[4758] exited with preempt_count 1 




end of thread, other threads:[~2008-02-08 15:27 UTC | newest]

Thread overview: 14+ messages
2007-10-08 13:03 [Xenomai-help] rt_queue with multiple listeners Petr Cervenka
2008-02-06 14:09 ` [Xenomai-help] FPU not available Petr Cervenka
2008-02-06 14:45   ` Jan Kiszka
2008-02-07 12:22     ` Petr Cervenka
2008-02-07 13:16       ` Jan Kiszka
2008-02-07 13:23   ` Gilles Chanteperdrix
2008-02-07 13:45     ` Jan Kiszka
2008-02-07 14:02       ` Jan Kiszka
2008-02-07 14:35         ` Philippe Gerum
2008-02-07 14:56           ` Jan Kiszka
2008-02-08 12:41             ` Petr Cervenka
2008-02-08 13:17               ` Philippe Gerum
2008-02-08 13:18               ` Philippe Gerum
2008-02-08 15:27                 ` Petr Cervenka
