* [Xenomai-help] Interrupts lost during sleep / unblock cycles
@ 2007-11-27 18:44 Kyle Howell
2007-11-27 19:29 ` Gilles Chanteperdrix
0 siblings, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-27 18:44 UTC (permalink / raw)
To: xenomai
I have been debugging a stall problem for a couple of days, and I think
I've put together enough info to check with the pros. Everything below
was experienced on a P4 (Celeron) running 2.6.20 / Xenomai 2.3.4. I've
also reproduced it on 2.6.19.7 / 2.3.1. A quick test *did not* reproduce
this problem on a Core2 running x86_64 2.6.22.9 / 2.4RC3.
I've reduced the problem to a fairly simple example below:
The Overview:
- Running a single real-time process with one standard thread and one RT
task
- The RT task loops on a 1sec rt_task_sleep
- The standard thread loops on nanosleep(10msec) and rt_task_unblock of
the RT task.
- When an unrelated interrupt arrives at the wrong time, the entire
system will hang until the 1sec task_sleep expires.
- After resuming, everything runs normally until another interrupt lands
at the wrong moment.
Here's the source code to the simple program:
/////////////////// Start taskTest.c ////////////////////
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <native/types.h>
#include <native/task.h>
#include <native/timer.h>
#include <nucleus/trace.h>
typedef unsigned long long UINT64;
RT_TASK rtTask;
char rtName[16] = "RtThread";
#define MAX_DELAY (50000)
void RtThread(void *arg);
int main ( int argc, char *argv[])
{
    int result;
    struct timespec req;
    struct timeval tv;
    UINT64 oldTime, newTime, diffTime; // Times in microseconds
    mlockall(MCL_CURRENT | MCL_FUTURE);
    result = rt_task_create(&rtTask, rtName, 0, 10, 0);
    result = rt_task_start(&rtTask, RtThread, NULL);
    req.tv_sec = 0;
    req.tv_nsec = 10000000; // 10 msec
    gettimeofday(&tv, NULL);
    oldTime = ((UINT64)tv.tv_sec * 1000000) + tv.tv_usec;
    while(1)
    {
        xntrace_special(1,1);
        rt_task_unblock(&rtTask);
        /* Try to sleep for 10msec */
        nanosleep(&req, NULL);
        xntrace_special(2,2);
        /* Check how much time has really passed */
        gettimeofday(&tv, NULL);
        newTime = ((UINT64)tv.tv_sec * 1000000) + tv.tv_usec;
        diffTime = newTime - oldTime;
        if(diffTime > MAX_DELAY)
        {
            printf("%llu - Diff time too large - %llu\n",newTime/1000000,
                   diffTime);
        }
        // Do it again
        oldTime = newTime;
    }
}

void RtThread(void *arg)
{
    RTIME startTsc, endTsc;
    printf("RtThread alive!\n");
    // Nothing but 1sec sleeps
    while(1)
    {
        startTsc = rt_timer_tsc();
        xntrace_special(4,4);
        rt_task_sleep(1000000000LL);
        xntrace_special(8,8);
        endTsc = rt_timer_tsc();
        // Capture this event
        if(rt_timer_tsc2ns(endTsc - startTsc) > 100000000)
            xntrace_user_freeze(endTsc - startTsc, 0);
    }
    printf("RtThread quitting!\n");
}
/////////////////// End taskTest.c ////////////////////
This program produces output like the following:
# ./taskTest
RtThread alive!
1042677082 - Diff time too large - 1000308
1042677120 - Diff time too large - 1000631
1042677129 - Diff time too large - 1000406
1042677133 - Diff time too large - 1000395
etc...
Those are *one second* delays for the 10msec nanosleep to return.
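As a rough check on those numbers, here is a minimal sketch (not part of the original post) of the microsecond conversion used in taskTest.c and of the resulting overshoot. The helper names timeval_to_us and sleep_overshoot_us are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Convert a struct timeval to microseconds, the same arithmetic
 * taskTest.c applies to the result of gettimeofday(). */
static uint64_t timeval_to_us(const struct timeval *tv)
{
    assert(tv != NULL);
    return (uint64_t)tv->tv_sec * 1000000u + (uint64_t)tv->tv_usec;
}

/* How far a sleep ran past the requested duration, in microseconds.
 * Returns 0 when the sleep did not overrun. */
static uint64_t sleep_overshoot_us(uint64_t elapsed_us, uint64_t requested_us)
{
    return elapsed_us > requested_us ? elapsed_us - requested_us : 0;
}
```

With the first reported value, sleep_overshoot_us(1000308, 10000) gives 990308, so the 10 msec nanosleep came back roughly 990 msec late, which matches the 1 sec rt_task_sleep period.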
Here's a capture from the I-Pipe tracer (note the user trace-points):
: +func -999696 0.157 __ipipe_syscall_root+0xa
(system_call+0x29)
: +func -999696 0.169 __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| +begin 0x80000001 -999695 0.172 __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999695 0.195 __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
: + func -999695 0.154 hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
: + func -999695 0.139 xnshadow_sys_trace+0xb
(hisyscall_event+0x185)
: + (0x01) 0x00000001 -999695 0.172 xnshadow_sys_trace+0x69
(hisyscall_event+0x185)
:| + begin 0x80000001 -999695 0.169 __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| +end 0x80000001 -999694 0.206 __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
:| +begin 0x80000001 -999694 0.161 __ipipe_syscall_root+0xeb
(system_call+0x29)
:| +end 0x80000001 -999694 0.214 __ipipe_syscall_root+0xf7
(system_call+0x29)
: #func -999694 0.169 __ipipe_unstall_iret_root+0x9
(restore_nocheck_notrace+0x0)
:| #begin 0x80000000 -999694 0.172
__ipipe_unstall_iret_root+0x7c (restore_nocheck_notrace+0x0)
:| +end 0x8000000d -999693 0.713
__ipipe_unstall_iret_root+0x2c (restore_nocheck_notrace+0x0)
: +func -999693 0.154 __ipipe_syscall_root+0xa
(system_call+0x29)
: +func -999693 0.165 __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| +begin 0x80000001 -999692 0.169 __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999692 0.195 __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
: + func -999692 0.210 hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
: + func -999692 0.184 __rt_task_unblock+0xc
(hisyscall_event+0x185)
: + func -999692 0.221
__copy_from_user_ll_nozero+0xa (__rt_task_unblock+0x3e)
: + func -999691 0.184 xnregistry_fetch+0x9
(__rt_task_unblock+0x46)
:| + begin 0x80000000 -999691 0.229 xnregistry_fetch+0x80
(__rt_task_unblock+0x46)
:| # func -999691 0.210
__ipipe_restore_pipeline_head+0xa (xnregistry_fetch+0x5f)
:| + end 0x80000000 -999691 0.191
__ipipe_restore_pipeline_head+0x4c (xnregistry_fetch+0x5f)
: + func -999691 0.176 rt_task_unblock+0xa
(__rt_task_unblock+0x54)
:| + begin 0x80000000 -999690 0.214 rt_task_unblock+0x68
(__rt_task_unblock+0x54)
:| # func -999690 0.184 xnpod_unblock_thread+0xb
(rt_task_unblock+0x72)
:| # func -999690 0.195 xnpod_resume_thread+0xe
(xnpod_unblock_thread+0x84)
:| # [ 868] RtThrea 10 -999690 0.187 xnpod_resume_thread+0x56
(xnpod_unblock_thread+0x84)
:| # func -999690 0.293 xntimer_do_stop_aperiodic+0xe
(xnpod_resume_thread+0x237)
:| # func -999689 0.180 xnpod_schedule+0xe
(rt_task_unblock+0x77)
:| # [ 867] taskTes -1 -999689 0.447 xnpod_schedule+0x90
(rt_task_unblock+0x77)
:| # func -999689 0.687 __switch_to+0xe
(xnpod_schedule+0x46e)
:| # [ 868] RtThrea 10 -999688 0.635 xnpod_schedule+0x546
(xnpod_suspend_thread+0x189)
:| # func -999687 0.229
__ipipe_restore_pipeline_head+0xa (xnpod_suspend_thread+0xa6)
:| + end 0x80000000 -999687 0.469
__ipipe_restore_pipeline_head+0x4c (xnpod_suspend_thread+0xa6)
:| + begin 0x80000001 -999687 0.274 __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999686 0.943 __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
**************** This ethernet interrupt appears to trigger the problem
*****************
:| + begin 0xffffffee -999686 0.191 common_interrupt+0x29
(__ipipe_dispatch_event+0x151)
:| + func -999685 0.435 __ipipe_handle_irq+0xe
(common_interrupt+0x2e)
:| + func -999685 0.398 __ipipe_ack_irq+0x8
(__ipipe_handle_irq+0x149)
:| + func -999685 0.251 __ipipe_ack_fasteoi_irq+0x8
(__ipipe_ack_irq+0x19)
:| + func -999684 0.221 __ipipe_walk_pipeline+0xe
(__ipipe_handle_irq+0x7b)
:| + end 0xffffffee -999684+ 1.476 common_interrupt+0x38
(__ipipe_dispatch_event+0x151)
: + func -999683 0.191 __ipipe_syscall_root+0xa
(system_call+0x29)
: + func -999682 0.191 __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| + begin 0x80000001 -999682 0.206 __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999682 0.214 __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
: + func -999682 0.221 hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
: + func -999682 0.195 xnshadow_sys_trace+0xb
(hisyscall_event+0x185)
: + (0x08) 0x00000008 -999681 0.202 xnshadow_sys_trace+0x69
(hisyscall_event+0x185)
:| + begin 0x80000001 -999681 0.199 __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999681 0.931 __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
: + func -999680 0.150 __ipipe_syscall_root+0xa
(system_call+0x29)
: + func -999680 0.165 __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| + begin 0x80000001 -999680 0.169 __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999680 0.184 __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
: + func -999679 0.187 hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
: + func -999679 0.150 xnshadow_sys_trace+0xb
(hisyscall_event+0x185)
: + (0x04) 0x00000004 -999679 0.176 xnshadow_sys_trace+0x69
(hisyscall_event+0x185)
:| + begin 0x80000001 -999679 0.195 __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999679 0.736 __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
: + func -999678 0.150 __ipipe_syscall_root+0xa
(system_call+0x29)
: + func -999678 0.165 __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| + begin 0x80000001 -999678 0.172 __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| + end 0x80000001 -999677 0.184 __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
: + func -999677 0.244 hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
: + func -999677 0.184 __rt_task_sleep+0xb
(hisyscall_event+0x185)
: + func -999677 0.330
__copy_from_user_ll_nozero+0xa (__rt_task_sleep+0x1a)
: + func -999676 0.195 rt_task_sleep+0xe
(__rt_task_sleep+0x25)
: + func -999676 0.191 xnpod_suspend_thread+0xe
(rt_task_sleep+0x73)
:| + begin 0x80000000 -999676 0.244 xnpod_suspend_thread+0x13f
(rt_task_sleep+0x73)
:| # func -999676 0.480
xntimer_do_start_aperiodic+0xe (xnpod_suspend_thread+0x173)
:| # func -999675 0.202 xnpod_schedule+0xe
(xnpod_suspend_thread+0x189)
:| # [ 868] RtThrea 10 -999675 0.417 xnpod_schedule+0x90
(xnpod_suspend_thread+0x189)
:| # func -999675 0.567 __switch_to+0xe
(xnpod_schedule+0x46e)
************** Hey! Shouldn't that stalled interrupt have been serviced
here? **************
:| # [ 867] taskTes -1 -999674 0.608 xnpod_schedule+0x546
(rt_task_unblock+0x77)
:| # func -999674 0.225
__ipipe_restore_pipeline_head+0xa (rt_task_unblock+0x57)
:| + end 0x80000000 -999673 0.353
__ipipe_restore_pipeline_head+0x4c (rt_task_unblock+0x57)
:| + begin 0x80000001 -999673 0.281 __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| +end 0x80000001 -999673 0.353 __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
:| +begin 0x80000001 -999672 0.199 __ipipe_syscall_root+0xeb
(system_call+0x29)
:| +end 0x80000001 -999672 0.274 __ipipe_syscall_root+0xf7
(system_call+0x29)
: #func -999672 0.210 __ipipe_unstall_iret_root+0x9
(restore_nocheck_notrace+0x0)
:| #begin 0x80000000 -999672 0.210
__ipipe_unstall_iret_root+0x7c (restore_nocheck_notrace+0x0)
:| +end 0x8000000d -999671 0.969
__ipipe_unstall_iret_root+0x2c (restore_nocheck_notrace+0x0)
: +func -999670 0.293 __ipipe_syscall_root+0xa
(sysenter_past_esp+0x46)
: +func -999670 0.184 sys_nanosleep+0xc
(sysenter_past_esp+0x6e)
: +func -999670 0.176 copy_from_user+0xb
(sys_nanosleep+0x1e)
: +func -999670 0.326 __copy_from_user_ll+0xa
(copy_from_user+0x35)
: +func -999670 0.172 hrtimer_nanosleep+0xe
(sys_nanosleep+0x5c)
: +func -999669 0.225 hrtimer_init+0xb
(hrtimer_nanosleep+0x22)
: +func -999669 0.202 do_nanosleep+0xd
(hrtimer_nanosleep+0x41)
: +func -999669 0.210 hrtimer_init_sleeper+0x8
(do_nanosleep+0x1d)
: +func -999669 0.184 hrtimer_start+0xe
(do_nanosleep+0x4d)
: #func -999669 0.232 ktime_get+0xd
(hrtimer_start+0xbc)
: #func -999668 0.172 ktime_get_ts+0xa
(ktime_get+0x17)
: #func -999668 0.172 getnstimeofday+0xe
(ktime_get_ts+0x19)
: #func -999668 0.229 read_tsc+0x8
(getnstimeofday+0x34)
: #func -999668 0.214 set_normalized_timespec+0x8
(ktime_get_ts+0x40)
: #func -999667 0.229 enqueue_hrtimer+0xb
(hrtimer_start+0x67)
: #func -999667 0.202 rb_insert_color+0xe
(enqueue_hrtimer+0x56)
: #func -999667 0.202 __ipipe_restore_root+0x8
(hrtimer_start+0x6f)
: #func -999667 0.232 __ipipe_unstall_root+0x8
(__ipipe_restore_root+0x1b)
:| #begin 0x80000000 -999667 0.266 __ipipe_unstall_root+0x41
(__ipipe_restore_root+0x1b)
:| +end 0x80000000 -999666 0.263 __ipipe_unstall_root+0x33
(__ipipe_restore_root+0x1b)
: +func -999666 0.199 __sched_text_start+0xe
(do_nanosleep+0x52)
: +func -999666 0.417 sched_clock+0xa
(__sched_text_start+0x85)
: #func -999665 0.206 deactivate_task+0x9
(__sched_text_start+0x35c)
: #func -999665 0.319 dequeue_task+0xa
(deactivate_task+0x1e)
:| #begin 0x80000000 -999665 0.232 __sched_text_start+0x414
(do_nanosleep+0x52)
:| #func -999665 0.300 __switch_to+0xe
(__sched_text_start+0x2e7)
:| #end 0x80000000 -999664 0.202 __sched_text_start+0x403
(cpu_idle+0x64)
: #func -999664 0.217 __ipipe_unstall_root+0x8
(__sched_text_start+0x37b)
:| #begin 0x80000000 -999664 0.180 __ipipe_unstall_root+0x41
(__sched_text_start+0x37b)
:| +end 0x80000000 -999664 0.278 __ipipe_unstall_root+0x33
(__sched_text_start+0x37b)
: +func -999664 0.214 ipipe_suspend_domain+0xe
(cpu_idle+0x44)
:| +begin 0x80000001 -999663 0.236 ipipe_suspend_domain+0x9d
(cpu_idle+0x44)
:| +end 0x80000001 -999663 0.214 ipipe_suspend_domain+0xac
(cpu_idle+0x44)
: +func -999663 0.169 mwait_idle+0x8
(cpu_idle+0x46)
: +func -999663 0.165 __ipipe_unstall_root+0x8
(mwait_idle+0xd)
:| +begin 0x80000000 -999663 0.225 __ipipe_unstall_root+0x41
(mwait_idle+0xd)
:| +end 0x80000000 -999662 0.184 __ipipe_unstall_root+0x33
(mwait_idle+0xd)
: +func -999662! 999575.071 mwait_idle_with_hints+0xb
(mwait_idle+0x16)
************ WHOOPS! No Linux timer irqs, no ethernet irqs, no nothing
***************
:| +begin 0xffffff16 -87 0.195 ipipe_ipi3+0x2e
(mwait_idle_with_hints+0x3e)
:| +func -87 0.360 __ipipe_handle_irq+0xe
(ipipe_ipi3+0x33)
:| +func -86 0.229 __ipipe_ack_apic+0x8
(__ipipe_handle_irq+0xa6)
:| +func -86 0.526 __ipipe_dispatch_wired+0xb
(__ipipe_handle_irq+0x62)
:| # func -86 0.364 xnintr_clock_handler+0x8
(__ipipe_dispatch_wired+0x85)
:| # func -85 0.326 xnintr_irq_handler+0xe
(xnintr_clock_handler+0x17)
:| # func -85 0.172 xnpod_announce_tick+0x8
(xnintr_irq_handler+0x3b)
:| # func -85 0.661 xntimer_do_tick_aperiodic+0xe
(xnpod_announce_tick+0xf)
:| +func -84 0.244 __ipipe_walk_pipeline+0xe
(__ipipe_handle_irq+0x7b)
:| + func -84 0.187 ipipe_suspend_domain+0xe
(__ipipe_walk_pipeline+0xa0)
:| + func -84 0.248 __ipipe_sync_stage+0xe
(ipipe_suspend_domain+0x63)
:| # end 0x80000000 -84 0.244 __ipipe_sync_stage+0x13c
(ipipe_suspend_domain+0x63)
: # func -83 0.184 shield_handler+0x8
(__ipipe_sync_stage+0xbf)
: # func -83 0.199 __ipipe_schedule_irq+0xe
(shield_handler+0x19)
:| # begin 0x80000001 -83 0.409 __ipipe_schedule_irq+0xdf
(shield_handler+0x19)
:| # end 0x80000001 -82 0.251 __ipipe_schedule_irq+0x10f
(shield_handler+0x19)
:| # begin 0x80000000 -82 0.319 __ipipe_sync_stage+0x130
(ipipe_suspend_domain+0x63)
:| +func -82 0.206 __ipipe_sync_stage+0xe
(ipipe_suspend_domain+0x63)
:| #end 0x80000000 -82 0.240 __ipipe_sync_stage+0x13c
(ipipe_suspend_domain+0x63)
*********** Ahh.. Now we're gonna take care of that old interrupt
***********
: #func -81 0.221 do_IRQ+0xd
(__ipipe_sync_stage+0x16f)
: #func -81 0.296 handle_fasteoi_irq+0xb
(do_IRQ+0x47)
: #func -81 0.210 handle_IRQ_event+0xe
(handle_fasteoi_irq+0x6d)
: #func -81 0.169 __ipipe_unstall_root+0x8
(handle_IRQ_event+0x65)
:| #begin 0x80000000 -81 0.176 __ipipe_unstall_root+0x41
(handle_IRQ_event+0x65)
:| +end 0x80000000 -80 0.221 __ipipe_unstall_root+0x33
(handle_IRQ_event+0x65)
: +func -80 0.338 usb_hcd_irq+0x9
(handle_IRQ_event+0x30)
: +func -80+ 1.074 uhci_irq+0xe (usb_hcd_irq+0x2a)
: +func -79 0.326 rtl8139_interrupt+0xe
(handle_IRQ_event+0x30)
: +func -78 0.950 ioread16+0x8
(rtl8139_interrupt+0x3d)
: +func -77 0.172 iowrite16+0x8
(rtl8139_interrupt+0xe4)
: +func -77 0.845 ioread16+0x8
(rtl8139_interrupt+0xeb)
: +func -76 0.176 __netif_rx_schedule+0xa
(rtl8139_interrupt+0xf3)
: #func -76 0.172 __ipipe_restore_root+0x8
(__netif_rx_schedule+0x6b)
: #func -76 0.206 __ipipe_unstall_root+0x8
(__ipipe_restore_root+0x1b)
:| #begin 0x80000000 -76 0.180 __ipipe_unstall_root+0x41
(__ipipe_restore_root+0x1b)
:| +end 0x80000000 -76 0.278 __ipipe_unstall_root+0x33
(__ipipe_restore_root+0x1b)
: #func -75 0.195 note_interrupt+0xe
(handle_fasteoi_irq+0xb3)
: #func -75 0.943 ack_ioapic_quirk_irq+0xa
(handle_fasteoi_irq+0x91)
:| #begin 0xffffffff -74 0.236 common_interrupt+0x29
(handle_fasteoi_irq+0xa4)
:| #func -74 0.248 __ipipe_handle_irq+0xe
(common_interrupt+0x2e)
:| #func -74 0.225 __ipipe_ack_irq+0x8
(__ipipe_handle_irq+0x149)
:| #func -74 0.169 __ipipe_ack_edge_irq+0x8
(__ipipe_ack_irq+0x19)
:| #func -73 0.214 ack_ioapic_irq+0x8
(__ipipe_ack_edge_irq+0xe)
:| #func -73 0.187 __ipipe_walk_pipeline+0xe
(__ipipe_handle_irq+0x7b)
:| +*func -73 0.172 ipipe_suspend_domain+0xe
(__ipipe_walk_pipeline+0xa0)
:| +*func -73 0.187 __ipipe_sync_stage+0xe
(ipipe_suspend_domain+0x63)
:| #*end 0x80000000 -73 0.199 __ipipe_sync_stage+0x13c
(ipipe_suspend_domain+0x63)
: #*func -72 0.154 shield_handler+0x8
(__ipipe_sync_stage+0xbf)
: #*func -72 0.172 __ipipe_schedule_irq+0xe
(shield_handler+0x19)
:| #*begin 0x80000001 -72 0.206 __ipipe_schedule_irq+0xdf
(shield_handler+0x19)
:| #*end 0x80000001 -72 0.244 __ipipe_schedule_irq+0x10f
(shield_handler+0x19)
:| #*begin 0x80000000 -72 0.353 __ipipe_sync_stage+0x130
(ipipe_suspend_domain+0x63)
:| #end 0xffffffff -71 0.462 common_interrupt+0x38
(handle_fasteoi_irq+0xa4)
: #func -71 0.210 irq_exit+0x8 (do_IRQ+0x4c)
Etc...
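For anyone wanting to locate such gaps in a long capture automatically, the following is a minimal sketch (not from the original post) that pulls the time and delay columns out of one tracer line. It assumes the line layout seen in the capture above: a negative time stamp, optionally followed by a '+' or '!' flag, then a decimal delay. The function name parse_trace_delay is hypothetical, and lines with non-negative time stamps are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Extract the time and delay columns from one tracer line.
 * Returns 1 and fills *time_us / *delay_us on success, 0 if the line
 * carries no such pair (e.g. a wrapped "(parent+0x..)" continuation). */
static int parse_trace_delay(const char *line, long *time_us, double *delay_us)
{
    const char *p = line;

    assert(line != NULL && time_us != NULL && delay_us != NULL);

    while (*p) {
        while (*p && isspace((unsigned char)*p))
            p++;                                /* skip blanks */
        if (*p == '-' && isdigit((unsigned char)p[1])) {
            char *end;
            long t = strtol(p, &end, 10);       /* candidate time stamp */

            if (*end == '+' || *end == '!')
                end++;                          /* optional delay flag */
            if (isspace((unsigned char)*end)) {
                const char *q = end;
                char *dend;
                double d;

                while (*q && isspace((unsigned char)*q))
                    q++;
                d = strtod(q, &dend);
                /* The delay column always has a decimal point; a bare
                 * integer here means we matched a priority such as -1. */
                if (dend != q && memchr(q, '.', (size_t)(dend - q)) != NULL) {
                    *time_us = t;
                    *delay_us = d;
                    return 1;
                }
            }
        }
        while (*p && !isspace((unsigned char)*p))
            p++;                                /* skip this token */
    }
    return 0;
}
```

Fed the mwait_idle_with_hints line from the capture, this yields a time of -999662 and a delay of 999575.071, the one-second hole in question. Scanning a whole capture for the largest delay is then a simple loop over its lines.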
So I seem to see what's causing the problem now, but surfing the code to
find the bug might take a while. Anyone out there already know what this
one is? Let me know if you need more info. Also, I'd be interested to
hear if anyone else can reproduce this on their system.
Thanks,
Kyle Howell
=================================
This email and any files transmitted with it are
confidential and intended solely for the use of the
named recipient or recipients. If you have received
this email in error please notify the sender
immediately.
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-27 18:44 [Xenomai-help] Interrupts lost during sleep / unblock cycles Kyle Howell
@ 2007-11-27 19:29 ` Gilles Chanteperdrix
2007-11-27 19:53 ` Kyle Howell
0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-27 19:29 UTC (permalink / raw)
To: Kyle Howell; +Cc: xenomai
Kyle Howell wrote:
> I have been debugging a stall problem for a couple of days, and I think
> I've put together enough info to check with the pros. Everything below
> was experienced on a P4 (Celeron) running 2.6.20 / Xenomai 2.3.4. I've
> also reproduced it on 2.6.19.7 / 2.3.1. A quick test *did not* reproduce
> this problem on a Core2 running x86_64 2.6.22.9 / 2.4RC3.
>
> I've reduced the problem to a fairly simple example below:
>
> The Overview:
> - Running a single real-time process with one standard thread and one RT
> task
> - The RT task loops on a 1sec rt_task_sleep
> - The standard thread loops on nanosleep(10msec) and rt_task_unblock of
> the RT task.
> - When an unrelated interrupt arrives at the wrong time, the entire
> system will hang until the 1sec task_sleep expires.
> - After resuming, everything runs normally until another interrupt lands
> at the wrong moment.

Do you observe the same behaviour without the interrupt shield?
-- 
Gilles Chanteperdrix.
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-27 19:29 ` Gilles Chanteperdrix
@ 2007-11-27 19:53 ` Kyle Howell
2007-11-27 20:08 ` Gilles Chanteperdrix
0 siblings, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-27 19:53 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
> Do you observe the same behaviour without the interrupt shield?

It doesn't appear so. I'll have to let it run longer to be 100% sure,
but the usual stressing isn't causing the problem. That's not expected
behavior with the interrupt shield, is it?
-- 
Kyle Howell
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-27 19:53 ` Kyle Howell
@ 2007-11-27 20:08 ` Gilles Chanteperdrix
2007-11-27 20:21 ` Kyle Howell
2007-11-28 4:59 ` Kyle Howell
0 siblings, 2 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-27 20:08 UTC (permalink / raw)
To: Kyle Howell; +Cc: xenomai
Kyle Howell wrote:
> > Do you observe the same behaviour without the interrupt shield?
>
> It doesn't appear so. I'll have to let it run longer to be 100% sure,
> but the usual stressing isn't causing the problem. That's not expected
> behavior with the interrupt shield, is it?

No, it is not an expected behavior.
-- 
Gilles Chanteperdrix.
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-27 20:08 ` Gilles Chanteperdrix
@ 2007-11-27 20:21 ` Kyle Howell
2007-11-27 21:10 ` Gilles Chanteperdrix
2007-11-28 4:59 ` Kyle Howell
1 sibling, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-27 20:21 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
> > It doesn't appear so. I'll have to let it run longer to be 100% sure,
> > but the usual stressing isn't causing the problem. That's not expected
> > behavior with the interrupt shield, is it?
>
> No, it is not an expected behavior.

Well, that's good news. Still no stalls without the IShield, so that's
certainly narrowed it down.

Another note: I'm currently using the IPipe 1.8-08 that is packaged with
Xenomai 2.3.4. Do you expect a change if I grab 1.10-12 or 1.11-00?
Are the Xenomai releases tightly coupled to a particular I-Pipe version
(had problems with this in the past)?

BTW, thanks for being so responsive. Just focusing on the IShield has
already been a tremendous help.
-- 
Kyle Howell
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-27 20:21 ` Kyle Howell
@ 2007-11-27 21:10 ` Gilles Chanteperdrix
2007-11-30 16:41 ` Philippe Gerum
0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-27 21:10 UTC (permalink / raw)
To: Kyle Howell; +Cc: xenomai
Kyle Howell wrote:
> Well, that's good news. Still no stalls without the IShield, so that's
> certainly narrowed it down.
>
> Another note: I'm currently using the IPipe 1.8-08 that is packaged with
> Xenomai 2.3.4. Do you expect a change if I grab 1.10-12 or 1.11-00? Are
> the Xenomai releases tightly coupled to a particular I-Pipe version (had
> problems with this in the past)?

Usually, a Xenomai version is compatible with past I-pipe releases. But
you should expect problems using a new I-pipe release with an older
version of Xenomai.
-- 
Gilles Chanteperdrix.
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-27 21:10 ` Gilles Chanteperdrix
@ 2007-11-30 16:41 ` Philippe Gerum
2007-11-30 17:03 ` Gilles Chanteperdrix
0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2007-11-30 16:41 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Gilles Chanteperdrix wrote:
> Kyle Howell wrote:
> > Another note: I'm currently using the IPipe 1.8-08 that is packaged with
> > Xenomai 2.3.4. Do you expect a change if I grab 1.10-12 or 1.11-00? Are
> > the Xenomai releases tightly coupled to a particular I-Pipe version (had
> > problems with this in the past)?
>
> Usually, a Xenomai version is compatible with past I-pipe releases. But
> you should expect problems using a new I-pipe release with an older
> version of Xenomai.
>

To be more specific about this: we try really, really, really, awfully
and painfully hard to keep recent I-pipe patches compatible with
(reasonably) older Xenomai releases. No kidding. You may have noticed
that the I-pipe API has been quite stable over time for that particular
reason, and when we have to break it, there is most often some built-in
compat code.

The rule of thumb is: if the fully patched kernel compiles properly
(I-pipe + Xenomai), then this should work, for two reasons: first,
externally visible changes in some I-pipe release usually come with
wrappers to please older code, and second, we are careful in not
changing the semantics of existing calls even in subtle ways without
also forcing a syntactical change to make sure the issue is noticed
downstream, or at least provide a sane wrapper. In the former case, a
compilation error should warn you, at least.

Sometimes the generic part of the interrupt pipelining engine has to be
changed (e.g. recent "flat log" update), and this may have consequences
on the arch-dep I-pipe core interfaced with it, but in such a case,
problems have to be solved at I-pipe level, and should not leak to the
Xenomai space anyway.

In the x86 case, we have a particular situation due to mainline being
largely in a state of flux wrt some of its core layers since ages. As a
result of this:

- post-2.6.20 kernels won't work with Xenomai 2.3.x, because the Linux
clock/timer infrastructure has changed dramatically since then, in a way
that required a significant refactoring of the core Xenomai code for
x86, i.e. no wrapping possible. For this reason, there has been no
I-pipe support for 2.6.21/x86, and we directly jumped to 2.6.22/x86.
Said differently, supporting the new generic clock event layer required
significant surgery in both the I-pipe and Xenomai code.

- the latest I-pipe patch for 2.6.23/x86 broke the Adeos API,
specifically regarding the very recent ipipe_request_tickdev() service
- which depends on the above mainline change - but since you can't use
2.6.23 with Xenomai 2.3.x, this should not be a big deal for existing
production setups. OTOH, v2.4-rc7 and on will still accept older
kernels, even if you may want to run them preferably over 2.6.23 and
beyond. Other archs were not impacted since this service is only defined
for x86 for now.

- Because some people may not want to upgrade to 2.6.22+, most
improvements and fixes available with the latest I-pipe releases for
recent kernels have been backported to 2.6.20/x86. This patch will work
with both 2.3.x and 2.4 Xenomai releases. The same goes for powerpc32.

Sometimes, backward compatibility is not a sane option though. For
instance, the ipipe_tune_timer() service has been removed months ago
from newer patches with no replacement, because it put the burden of
managing periodic timing on the shoulders of the I-pipe, albeit this
should be the client code's business only. This caused hairy code to be
needed in order to port the I-pipe to other archs, with no actual
upside, since managing periodic timing is way more efficient when done
from the upper layers, e.g. Xenomai.

HTH,
-- 
Philippe.
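The x86 pairings described in the message above can be restated as a small lookup. This is only a sketch of what the message says, not an official support matrix: the function name x86_combo and the enum are invented here, it covers nothing beyond kernels 2.6.20 to 2.6.23 and Xenomai 2.3.x / 2.4, and it does not model the remark that 2.6.23 needs v2.4-rc7 or later.

```c
#include <assert.h>

/* Verdict for a Linux 2.6.N / Xenomai 2.M.x pairing on x86. */
enum combo_verdict {
    COMBO_UNKNOWN = -1,  /* the message says nothing about this pairing */
    COMBO_NO = 0,
    COMBO_YES = 1
};

/* kernel_sublevel is N in Linux 2.6.N, xeno_minor is M in Xenomai 2.M.x.
 * Encodes only the statements made in the message; everything else is
 * reported as unknown rather than guessed. */
static enum combo_verdict x86_combo(int kernel_sublevel, int xeno_minor)
{
    assert(kernel_sublevel >= 0 && xeno_minor >= 0);

    if (kernel_sublevel < 20 || kernel_sublevel > 23)
        return COMBO_UNKNOWN;
    if (xeno_minor != 3 && xeno_minor != 4)
        return COMBO_UNKNOWN;

    if (kernel_sublevel == 21)
        return COMBO_NO;        /* no I-pipe support for 2.6.21/x86 */
    if (xeno_minor == 3)        /* 2.3.x: post-2.6.20 kernels won't work */
        return kernel_sublevel == 20 ? COMBO_YES : COMBO_NO;

    /* 2.4: backported 2.6.20 patch works, and so do 2.6.22 and later
     * (2.6.23 per the message needs v2.4-rc7 or later, not modeled). */
    return COMBO_YES;
}
```

For the setup in the original report (2.6.20 with Xenomai 2.3.4), x86_combo(20, 3) returns COMBO_YES, while moving that Xenomai release to 2.6.22 would return COMBO_NO.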
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-30 16:41 ` Philippe Gerum
@ 2007-11-30 17:03 ` Gilles Chanteperdrix
2007-11-30 17:23 ` Philippe Gerum
0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-30 17:03 UTC (permalink / raw)
To: rpm; +Cc: xenomai

On Nov 30, 2007 5:41 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> Gilles Chanteperdrix wrote:
> > Usually, a Xenomai version is compatible with past I-pipe releases. But
> > you should expect problems using a new I-pipe release with an older
> > version of Xenomai.
> >
>
> To be more specific about this: we try really, really, really, awfully
> and painfully hard to keep recent I-pipe patches compatible with
> (reasonably) older Xenomai releases. No kidding. You may have noticed
> that the I-pipe API has been quite stable over time for that particular
> reason, and when we have to break it, there is most often some built-in
> compat code.

Then, I really have a problem with the newer ARM I-pipe patches, the
ones that no longer shut irqs over the mm switch, because they will
probably compile with a Xenomai 2.2.x, but will not work properly.

The only way I see to work around this is to make the ARM patch depend
on a CONFIG_ symbol which would be set by the newer Xenomai.

--
Gilles Chanteperdrix

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-30 17:03 ` Gilles Chanteperdrix
@ 2007-11-30 17:23 ` Philippe Gerum
0 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2007-11-30 17:23 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai

Gilles Chanteperdrix wrote:
> On Nov 30, 2007 5:41 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> Gilles Chanteperdrix wrote:
>>> Usually, a Xenomai version is compatible with past I-pipe releases. But
>>> you should expect problems using a new I-pipe release with an older
>>> version of Xenomai.
>>>
>> To be more specific about this: we try really, really, really, awfully
>> and painfully hard to keep recent I-pipe patches compatible with
>> (reasonably) older Xenomai releases. No kidding. You may have noticed
>> that the I-pipe API has been quite stable over time for that particular
>> reason, and when we have to break it, there is most often some built-in
>> compat code.
>
> Then, I really have a problem with the newer ARM I-pipe patches, the
> ones that no longer shut irqs over the mm switch, because they will
> probably compile with a Xenomai 2.2.x, but will not work properly.
>

Well, this is the purpose of "reasonably older" in the sentence. 2.2.x
is already a bit far. 2.3.x is a more reasonable target, particularly
because a shiny new I-pipe patch without all core fixes that went over
time into an entire major Xenomai milestone + maintenance time would not
bring that much.

The mm change you mentioned requires core surgery in Xenomai to be
compatible, but this was not an API issue. I'm not sure such kind of
changes could ever be detected sanely in older code, since they don't
affect the external interfaces, but require both the I-pipe and Xenomai
cores to agree on interrupt management. This is a grey area, but not due
to API changes.

> The only way I see to work around this is to make the ARM patch depend
> on a CONFIG_ symbol which would be set by the newer Xenomai.
>

No, we can't do that. We have to admit that sometimes backward compat is
just not possible, unless we start doing really braindamaged things. I'm
making enough silly mistakes unwillingly without wanting to add more of
them deliberately...

--
Philippe.

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-27 20:08 ` Gilles Chanteperdrix
2007-11-27 20:21 ` Kyle Howell
@ 2007-11-28 4:59 ` Kyle Howell
2007-11-28 10:57 ` Philippe Gerum
1 sibling, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-28 4:59 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai

> > > > I have been debugging a stall problem for a couple of days, and I
> > > > think I've put together enough info to check with the pros.
> > > > Everything below was experienced on a P4 (Celeron) running 2.6.20 /
> > > > Xenomai 2.3.4. I've also reproduced it on 2.6.19.7 / 2.3.1. A quick
> > > > test *did not* reproduce this problem on a Core2 running x86_64
> > > > 2.6.22.9 / 2.4RC3.
> > > >
> > > > I've reduced the problem to a fairly simple example below:
> > > >
> > > > The Overview:
> > > > - Running a single real-time process with one standard thread and
> > > > one RT task
> > > > - The RT task loops on a 1sec rt_task_sleep
> > > > - The standard thread loops on nanosleep(10msec) and
> > > > rt_task_unblock of the RT task.
> > > > - When an unrelated interrupt arrives at the wrong time, the entire
> > > > system will hang until the 1sec task_sleep expires.
> > > > - After resuming, everything runs normally until another interrupt
> > > > lands at the wrong moment.
> > >
> > > Do you observe the same behaviour without the interrupt shield ?
> >
> > It doesn't appear so. I'll have to let it run longer to be 100% sure,
> > but the usual stressing isn't causing the problem. That's not expected
> > behavior with the interrupt shield, is it?
>
> No, it is not an expected behavior.
>

After considerable staring and code surfing, I think I have an idea of
what's happening. There are still enough parts of the code I don't fully
understand that I'm not positive, though. Check this theory out for me:

Flow of events when it works:
1. Process running in root domain.
2. Interrupt fires, IShield pending bit set.
3. ipipe_walk_pipeline calls IShield handler.
4. IShield propagates interrupt to root domain.
5. Root domain finishes restoring the APIC.
6. Everything continues as expected.
- or -
1. Process running in Xenomai domain.
2. Interrupt fires, IShield pending bit set.
3. ipipe_walk_pipeline resumes high-priority Xenomai domain.
4. Xenomai domain finishes and suspends.
5. ipipe_walk_pipeline calls IShield handler.
6. IShield propagates interrupt to root domain.
7. Root domain finishes restoring the APIC.
8. Everything continues as expected.

Flow of events when it fails:
1. Process running in root domain, makes syscall *requiring Xenomai
domain*.
2. Thread is temporarily promoted to Xenomai domain to execute syscall.
3. (Optional) Syscall results in another Xenomai task gaining control.
4. Interrupt fires, IShield pending bit set.
5. ipipe_walk_pipeline resumes high-priority Xenomai domain.
6. (Optional) Other Xenomai task completes, promoted syscall resumes.
7. Syscall returns to root domain, never calling ipipe_sync_pipeline on
IShield domain.
8. Root domain sleeps without ever restoring the APIC.
9. System hangs until event-timer fires for Xenomai task.
10. Xenomai task finishes and suspends.
11. ipipe_walk_pipeline calls IShield handler.
12. IShield propagates interrupt to root domain.
13. Root domain finishes restoring the APIC.
14. Everything continues as expected.

To put it in a sentence, it looks like there's a loop-hole where a
promoted syscall can get back to the root domain without the
intermediate domains being checked for pending interrupts. The propagate
logic in ipipe_dispatch_event *seems* like it would take care of this,
but I'm guessing I'm making bad assumptions about variables.

To force quick reproduction of this problem, run the previously posted
taskTest, and then create some heavy interrupt activity. The
rt_task_unblock call (from root) ends up being the ideal instigator for
this hole because it runs in the primary domain and doesn't return until
the unblocked task is done, but I'd imagine any primary-domain syscall
could occasionally reproduce it.

I may eventually be able to produce a patch for this, but I imagine
someone else out there is already familiar enough with the code to close
this up a lot faster. I certainly wouldn't be offended.

--
Kyle Howell

=================================
This email and any files transmitted with it are confidential and
intended solely for the use of the named recipient or recipients. If you
have received this email in error please notify the sender immediately.

^ permalink raw reply [flat|nested] 11+ messages in thread
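Editorial sketch: the failing sequence theorized in the message above can be mimicked with a toy three-stage pipeline (Xenomai, interrupt shield, root). This is not I-pipe code and does not claim to match its internals; every name below (toy_pipeline, toy_irq, toy_sync_from and so on) is hypothetical. It only shows how an interrupt logged at the shield stage would stay pending if the return to root checks root's own log and nothing above it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not I-pipe symbols. */
enum domain { DOM_XENOMAI = 0, DOM_ISHIELD = 1, DOM_ROOT = 2, DOM_COUNT = 3 };

struct toy_pipeline {
    enum domain current;      /* domain currently running */
    bool pending[DOM_COUNT];  /* one pending-IRQ bit per stage */
    int root_handled;         /* IRQs that finally reached Linux */
};

/* Play pending IRQs for every stage from 'start' down to root. */
static void toy_sync_from(struct toy_pipeline *p, enum domain start)
{
    assert(start < DOM_COUNT);
    for (int d = start; d < DOM_COUNT; d++) {
        if (!p->pending[d])
            continue;
        p->pending[d] = false;
        if (d == DOM_ROOT)
            p->root_handled++;         /* root handler runs */
        else
            p->pending[d + 1] = true;  /* propagate one stage down */
    }
}

/* An unrelated IRQ is logged at the shield stage; it is played at once
 * only if no higher-priority domain is running. */
static void toy_irq(struct toy_pipeline *p)
{
    p->pending[DOM_ISHIELD] = true;
    if (p->current >= DOM_ISHIELD)
        toy_sync_from(p, DOM_ISHIELD);
}

/* Normal exit from the Xenomai domain: every lower stage is walked. */
static void toy_xenomai_suspends(struct toy_pipeline *p)
{
    p->current = DOM_ROOT;
    toy_sync_from(p, DOM_ISHIELD);
}

/* Suspected hole: the promoted syscall lands back in root and only
 * root's own log is checked, so the shield stage is skipped. */
static void toy_syscall_returns_to_root(struct toy_pipeline *p)
{
    p->current = DOM_ROOT;
    toy_sync_from(p, DOM_ROOT);
}
```

Starting with current set to DOM_XENOMAI, calling toy_irq() and then toy_syscall_returns_to_root() leaves root_handled at 0 with the shield bit still set; only a later toy_xenomai_suspends() (the timer expiry in the report) delivers the interrupt to root.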
* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
2007-11-28 4:59 ` Kyle Howell
@ 2007-11-28 10:57 ` Philippe Gerum
0 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2007-11-28 10:57 UTC (permalink / raw)
To: Kyle Howell; +Cc: xenomai

Kyle Howell wrote:
> > > > > I have been debugging a stall problem for a couple of days, and I
> > > > > think I've put together enough info to check with the pros.
> > > > > Everything below was experienced on a P4 (Celeron) running 2.6.20 /
> > > > > Xenomai 2.3.4. I've also reproduced it on 2.6.19.7 / 2.3.1. A quick
> > > > > test *did not* reproduce this problem on a Core2 running x86_64
> > > > > 2.6.22.9 / 2.4RC3.
> > > > >
> > > > > I've reduced the problem to a fairly simple example below:
> > > > >
> > > > > The Overview:
> > > > > - Running a single real-time process with one standard thread and
> > > > > one RT task
> > > > > - The RT task loops on a 1sec rt_task_sleep
> > > > > - The standard thread loops on nanosleep(10msec) and
> > > > > rt_task_unblock of the RT task.
> > > > > - When an unrelated interrupt arrives at the wrong time, the entire
> > > > > system will hang until the 1sec task_sleep expires.
> > > > > - After resuming, everything runs normally until another interrupt
> > > > > lands at the wrong moment.
> > > >
> > > > Do you observe the same behaviour without the interrupt shield ?
> > >
> > > It doesn't appear so. I'll have to let it run longer to be 100% sure,
> > > but the usual stressing isn't causing the problem. That's not expected
> > > behavior with the interrupt shield, is it?
> >
> > No, it is not an expected behavior.
> >
>
> After considerable staring and code surfing, I think I have an idea of
> what's happening. There are still enough parts of the code I don't fully
> understand that I'm not positive, though. Check this theory out for me:
>
> Flow of events when it works:
> 1. Process running in root domain.
> 2. Interrupt fires, IShield pending bit set.
> 3. ipipe_walk_pipeline calls IShield handler.
> 4. IShield propagates interrupt to root domain.
> 5. Root domain finishes restoring the APIC.
> 6. Everything continues as expected.
> - or -
> 1. Process running in Xenomai domain.
> 2. Interrupt fires, IShield pending bit set.
> 3. ipipe_walk_pipeline resumes high-priority Xenomai domain.
> 4. Xenomai domain finishes and suspends.
> 5. ipipe_walk_pipeline calls IShield handler.
> 6. IShield propagates interrupt to root domain.
> 7. Root domain finishes restoring the APIC.
> 8. Everything continues as expected.
>
> Flow of events when it fails:
> 1. Process running in root domain, makes syscall *requiring Xenomai
> domain*.
> 2. Thread is temporarily promoted to Xenomai domain to execute syscall.
> 3. (Optional) Syscall results in another Xenomai task gaining control.
> 4. Interrupt fires, IShield pending bit set.
> 5. ipipe_walk_pipeline resumes high-priority Xenomai domain.
> 6. (Optional) Other Xenomai task completes, promoted syscall resumes.
> 7. Syscall returns to root domain, never calling ipipe_sync_pipeline on
> IShield domain.
> 8. Root domain sleeps without ever restoring the APIC.
> 9. System hangs until event-timer fires for Xenomai task.
> 10. Xenomai task finishes and suspends.
> 11. ipipe_walk_pipeline calls IShield handler.
> 12. IShield propagates interrupt to root domain.
> 13. Root domain finishes restoring the APIC.
> 14. Everything continues as expected.
>
> To put it in a sentence, it looks like there's a loop-hole where a
> promoted syscall can get back to the root domain without the
> intermediate domains being checked for pending interrupts.

Your analysis makes a lot of sense, even if I can't spot the loophole
immediately in the I-pipe code.

> The propagate
> logic in ipipe_dispatch_event *seems* like it would take care of this,

This routine is indeed where I would point my finger, as a first guess.
As you explained, it does look like an adverse effect of domain
migration taking some side path in the pipeline logic, which ends up
breaking the propagation of events. Normally, the interrupt shield
domain is never stalled, so such an issue could only pop up if this
domain is being bypassed somehow.

--
Philippe.

^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-11-30 17:23 UTC | newest]

Thread overview: 11+ messages
2007-11-27 18:44 [Xenomai-help] Interrupts lost during sleep / unblock cycles Kyle Howell
2007-11-27 19:29 ` Gilles Chanteperdrix
2007-11-27 19:53 ` Kyle Howell
2007-11-27 20:08 ` Gilles Chanteperdrix
2007-11-27 20:21 ` Kyle Howell
2007-11-27 21:10 ` Gilles Chanteperdrix
2007-11-30 16:41 ` Philippe Gerum
2007-11-30 17:03 ` Gilles Chanteperdrix
2007-11-30 17:23 ` Philippe Gerum
2007-11-28 4:59 ` Kyle Howell
2007-11-28 10:57 ` Philippe Gerum