* [Xenomai-help] Interrupts lost during sleep / unblock cycles
@ 2007-11-27 18:44 Kyle Howell
  2007-11-27 19:29 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-27 18:44 UTC (permalink / raw)
  To: xenomai

I have been debugging a stall problem for a couple of days, and I think
I've put together enough info to check with the pros. Everything below
was experienced on a P4 (Celeron) running 2.6.20 / Xenomai 2.3.4. I've
also reproduced it on 2.6.19.7 / 2.3.1. A quick test *did not* reproduce
this problem on a Core2 running x86_64 2.6.22.9 / 2.4RC3. 

I've reduced the problem to a fairly simple example below:

The Overview:
- Running a single real-time process with one standard thread and one RT
task
- The RT task loops on a 1sec rt_task_sleep
- The standard thread loops on nanosleep(10msec) and rt_task_unblock of
the RT task.
- When an unrelated interrupt arrives at the wrong time, the entire
system will hang until the 1sec task_sleep expires.
- After resuming, everything runs normally until another interrupt lands
at the wrong moment.

Here's the source code to the simple program:
/////////////////// Start taskTest.c //////////////////// 
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <native/types.h>
#include <native/task.h>
#include <native/timer.h>
#include <nucleus/trace.h>
 
typedef unsigned long long UINT64;
RT_TASK rtTask;
char rtName[16] = "RtThread";
 
#define MAX_DELAY (50000)
 
void RtThread(void *arg);
 
int main ( int argc, char *argv[])
{
  int result;
  struct timespec req;
  struct timeval tv;
  UINT64 oldTime, newTime, diffTime; // Times in microseconds
 
  mlockall(MCL_CURRENT | MCL_FUTURE);
 
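  /* Both calls return 0 on success and a negative error code on failure */
  result = rt_task_create(&rtTask, rtName, 0, 10, 0);
  if(result != 0)
  {
    printf("rt_task_create failed: %d\n", result);
    return 1;
  }
  result = rt_task_start(&rtTask, RtThread, NULL);
  if(result != 0)
  {
    printf("rt_task_start failed: %d\n", result);
    return 1;
  }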
 
  req.tv_sec = 0;
  req.tv_nsec = 10000000; // 10 msec
 
  gettimeofday(&tv, NULL);
  oldTime = ((UINT64)tv.tv_sec * 1000000) + tv.tv_usec;
 
  while(1)
  {
    xntrace_special(1,1);
    rt_task_unblock(&rtTask);
 
    /* Try to sleep for 10msec */
    nanosleep(&req, NULL);
    xntrace_special(2,2);
 
    /* Check how much time has really passed */
    gettimeofday(&tv, NULL);
    newTime = ((UINT64)tv.tv_sec * 1000000) + tv.tv_usec;
    diffTime = newTime - oldTime;
    if(diffTime > MAX_DELAY)
    {
      printf("%llu - Diff time too large - %llu\n",newTime/1000000,
diffTime);
    }
 
    // Do it again
    oldTime = newTime;
  }
 
}
 
void RtThread(void *arg)
{
  RTIME startTsc, endTsc;
  printf("RtThread alive!\n");
 
  // Nothing but 1sec sleeps
  while(1)
  {
    startTsc = rt_timer_tsc();
    xntrace_special(4,4);
    rt_task_sleep(1000000000LL);
    xntrace_special(8,8);
    endTsc = rt_timer_tsc();
    // Capture this event
    if(rt_timer_tsc2ns(endTsc - startTsc) > 100000000)
      xntrace_user_freeze(endTsc - startTsc, 0);
  }
 
  printf("RtThread quitting!\n");
}
/////////////////// End taskTest.c //////////////////// 

This program produces output like the following:
# ./taskTest
RtThread alive!
1042677082 - Diff time too large - 1000308
1042677120 - Diff time too large - 1000631
1042677129 - Diff time too large - 1000406
1042677133 - Diff time too large - 1000395
etc...

Those are *one second* delays for the 10msec nanosleep to return.
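
The overrun check on its own can be sketched without any Xenomai headers.
This is only a minimal sketch, assuming a POSIX system that provides
CLOCK_MONOTONIC. The helper names elapsed_us and sleep_overrun_us are
hypothetical and not part of taskTest.c, and the monotonic clock is swapped
in for gettimeofday so that wall-clock adjustments cannot skew the result:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Microseconds elapsed between two samples of the same clock. */
int64_t elapsed_us(const struct timespec *start, const struct timespec *end)
{
    int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
               + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
    return ns / 1000;
}

/*
 * Sleep for request_ns (must be below one second) and return how far the
 * call overran the request, in microseconds. A negative result means
 * nanosleep() was cut short, for example by a signal.
 */
int64_t sleep_overrun_us(long request_ns)
{
    struct timespec req = { .tv_sec = 0, .tv_nsec = request_ns };
    struct timespec before, after;

    clock_gettime(CLOCK_MONOTONIC, &before);
    nanosleep(&req, NULL);
    clock_gettime(CLOCK_MONOTONIC, &after);

    return elapsed_us(&before, &after) - request_ns / 1000;
}
```

Calling sleep_overrun_us(10000000) once per loop iteration measures only the
nanosleep call, whereas taskTest.c times the whole iteration, so the two
numbers are comparable but not identical. An overrun close to 1000000 would
correspond to the stalls shown above.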

Here's a capture from the I-Pipe tracer (note the user trace-points):
:    +func               -999696    0.157  __ipipe_syscall_root+0xa
(system_call+0x29)
:    +func               -999696    0.169  __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:|   +begin   0x80000001 -999695    0.172  __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999695    0.195  __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
:  +  func               -999695    0.154  hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
:  +  func               -999695    0.139  xnshadow_sys_trace+0xb
(hisyscall_event+0x185)
:  +  (0x01)  0x00000001 -999695    0.172  xnshadow_sys_trace+0x69
(hisyscall_event+0x185)
:| +  begin   0x80000001 -999695    0.169  __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:|   +end     0x80000001 -999694    0.206  __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
:|   +begin   0x80000001 -999694    0.161  __ipipe_syscall_root+0xeb
(system_call+0x29)
:|   +end     0x80000001 -999694    0.214  __ipipe_syscall_root+0xf7
(system_call+0x29)
:    #func               -999694    0.169  __ipipe_unstall_iret_root+0x9
(restore_nocheck_notrace+0x0)
:|   #begin   0x80000000 -999694    0.172
__ipipe_unstall_iret_root+0x7c (restore_nocheck_notrace+0x0)
:|   +end     0x8000000d -999693    0.713
__ipipe_unstall_iret_root+0x2c (restore_nocheck_notrace+0x0)
:    +func               -999693    0.154  __ipipe_syscall_root+0xa
(system_call+0x29)
:    +func               -999693    0.165  __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:|   +begin   0x80000001 -999692    0.169  __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999692    0.195  __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
:  +  func               -999692    0.210  hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
:  +  func               -999692    0.184  __rt_task_unblock+0xc
(hisyscall_event+0x185)
:  +  func               -999692    0.221
__copy_from_user_ll_nozero+0xa (__rt_task_unblock+0x3e)
:  +  func               -999691    0.184  xnregistry_fetch+0x9
(__rt_task_unblock+0x46)
:| +  begin   0x80000000 -999691    0.229  xnregistry_fetch+0x80
(__rt_task_unblock+0x46)
:| #  func               -999691    0.210
__ipipe_restore_pipeline_head+0xa (xnregistry_fetch+0x5f)
:| +  end     0x80000000 -999691    0.191
__ipipe_restore_pipeline_head+0x4c (xnregistry_fetch+0x5f)
:  +  func               -999691    0.176  rt_task_unblock+0xa
(__rt_task_unblock+0x54)
:| +  begin   0x80000000 -999690    0.214  rt_task_unblock+0x68
(__rt_task_unblock+0x54)
:| #  func               -999690    0.184  xnpod_unblock_thread+0xb
(rt_task_unblock+0x72)
:| #  func               -999690    0.195  xnpod_resume_thread+0xe
(xnpod_unblock_thread+0x84)
:| #  [  868] RtThrea 10 -999690    0.187  xnpod_resume_thread+0x56
(xnpod_unblock_thread+0x84)
:| #  func               -999690    0.293  xntimer_do_stop_aperiodic+0xe
(xnpod_resume_thread+0x237)
:| #  func               -999689    0.180  xnpod_schedule+0xe
(rt_task_unblock+0x77)
:| #  [  867] taskTes -1 -999689    0.447  xnpod_schedule+0x90
(rt_task_unblock+0x77)
:| #  func               -999689    0.687  __switch_to+0xe
(xnpod_schedule+0x46e)
:| #  [  868] RtThrea 10 -999688    0.635  xnpod_schedule+0x546
(xnpod_suspend_thread+0x189)
:| #  func               -999687    0.229
__ipipe_restore_pipeline_head+0xa (xnpod_suspend_thread+0xa6)
:| +  end     0x80000000 -999687    0.469
__ipipe_restore_pipeline_head+0x4c (xnpod_suspend_thread+0xa6)
:| +  begin   0x80000001 -999687    0.274  __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999686    0.943  __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
**************** This ethernet interrupt appears to trigger the problem
*****************
:| +  begin   0xffffffee -999686    0.191  common_interrupt+0x29
(__ipipe_dispatch_event+0x151)
:| +  func               -999685    0.435  __ipipe_handle_irq+0xe
(common_interrupt+0x2e)
:| +  func               -999685    0.398  __ipipe_ack_irq+0x8
(__ipipe_handle_irq+0x149)
:| +  func               -999685    0.251  __ipipe_ack_fasteoi_irq+0x8
(__ipipe_ack_irq+0x19)
:| +  func               -999684    0.221  __ipipe_walk_pipeline+0xe
(__ipipe_handle_irq+0x7b)
:| +  end     0xffffffee -999684+   1.476  common_interrupt+0x38
(__ipipe_dispatch_event+0x151)
:  +  func               -999683    0.191  __ipipe_syscall_root+0xa
(system_call+0x29)
:  +  func               -999682    0.191  __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| +  begin   0x80000001 -999682    0.206  __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999682    0.214  __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
:  +  func               -999682    0.221  hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
:  +  func               -999682    0.195  xnshadow_sys_trace+0xb
(hisyscall_event+0x185)
:  +  (0x08)  0x00000008 -999681    0.202  xnshadow_sys_trace+0x69
(hisyscall_event+0x185)
:| +  begin   0x80000001 -999681    0.199  __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999681    0.931  __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
:  +  func               -999680    0.150  __ipipe_syscall_root+0xa
(system_call+0x29)
:  +  func               -999680    0.165  __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| +  begin   0x80000001 -999680    0.169  __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999680    0.184  __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
:  +  func               -999679    0.187  hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
:  +  func               -999679    0.150  xnshadow_sys_trace+0xb
(hisyscall_event+0x185)
:  +  (0x04)  0x00000004 -999679    0.176  xnshadow_sys_trace+0x69
(hisyscall_event+0x185)
:| +  begin   0x80000001 -999679    0.195  __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999679    0.736  __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
:  +  func               -999678    0.150  __ipipe_syscall_root+0xa
(system_call+0x29)
:  +  func               -999678    0.165  __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x44)
:| +  begin   0x80000001 -999678    0.172  __ipipe_dispatch_event+0x19a
(__ipipe_syscall_root+0x44)
:| +  end     0x80000001 -999677    0.184  __ipipe_dispatch_event+0x18a
(__ipipe_syscall_root+0x44)
:  +  func               -999677    0.244  hisyscall_event+0xe
(__ipipe_dispatch_event+0xb5)
:  +  func               -999677    0.184  __rt_task_sleep+0xb
(hisyscall_event+0x185)
:  +  func               -999677    0.330
__copy_from_user_ll_nozero+0xa (__rt_task_sleep+0x1a)
:  +  func               -999676    0.195  rt_task_sleep+0xe
(__rt_task_sleep+0x25)
:  +  func               -999676    0.191  xnpod_suspend_thread+0xe
(rt_task_sleep+0x73)
:| +  begin   0x80000000 -999676    0.244  xnpod_suspend_thread+0x13f
(rt_task_sleep+0x73)
:| #  func               -999676    0.480
xntimer_do_start_aperiodic+0xe (xnpod_suspend_thread+0x173)
:| #  func               -999675    0.202  xnpod_schedule+0xe
(xnpod_suspend_thread+0x189)
:| #  [  868] RtThrea 10 -999675    0.417  xnpod_schedule+0x90
(xnpod_suspend_thread+0x189)
:| #  func               -999675    0.567  __switch_to+0xe
(xnpod_schedule+0x46e)
************** Hey! Shouldn't that stalled interrupt have been serviced
here? **************
:| #  [  867] taskTes -1 -999674    0.608  xnpod_schedule+0x546
(rt_task_unblock+0x77)
:| #  func               -999674    0.225
__ipipe_restore_pipeline_head+0xa (rt_task_unblock+0x57)
:| +  end     0x80000000 -999673    0.353
__ipipe_restore_pipeline_head+0x4c (rt_task_unblock+0x57)
:| +  begin   0x80000001 -999673    0.281  __ipipe_dispatch_event+0x17b
(__ipipe_syscall_root+0x44)
:|   +end     0x80000001 -999673    0.353  __ipipe_dispatch_event+0x14f
(__ipipe_syscall_root+0x44)
:|   +begin   0x80000001 -999672    0.199  __ipipe_syscall_root+0xeb
(system_call+0x29)
:|   +end     0x80000001 -999672    0.274  __ipipe_syscall_root+0xf7
(system_call+0x29)
:    #func               -999672    0.210  __ipipe_unstall_iret_root+0x9
(restore_nocheck_notrace+0x0)
:|   #begin   0x80000000 -999672    0.210
__ipipe_unstall_iret_root+0x7c (restore_nocheck_notrace+0x0)
:|   +end     0x8000000d -999671    0.969
__ipipe_unstall_iret_root+0x2c (restore_nocheck_notrace+0x0)
:    +func               -999670    0.293  __ipipe_syscall_root+0xa
(sysenter_past_esp+0x46)
:    +func               -999670    0.184  sys_nanosleep+0xc
(sysenter_past_esp+0x6e)
:    +func               -999670    0.176  copy_from_user+0xb
(sys_nanosleep+0x1e)
:    +func               -999670    0.326  __copy_from_user_ll+0xa
(copy_from_user+0x35)
:    +func               -999670    0.172  hrtimer_nanosleep+0xe
(sys_nanosleep+0x5c)
:    +func               -999669    0.225  hrtimer_init+0xb
(hrtimer_nanosleep+0x22)
:    +func               -999669    0.202  do_nanosleep+0xd
(hrtimer_nanosleep+0x41)
:    +func               -999669    0.210  hrtimer_init_sleeper+0x8
(do_nanosleep+0x1d)
:    +func               -999669    0.184  hrtimer_start+0xe
(do_nanosleep+0x4d)
:    #func               -999669    0.232  ktime_get+0xd
(hrtimer_start+0xbc)
:    #func               -999668    0.172  ktime_get_ts+0xa
(ktime_get+0x17)
:    #func               -999668    0.172  getnstimeofday+0xe
(ktime_get_ts+0x19)
:    #func               -999668    0.229  read_tsc+0x8
(getnstimeofday+0x34)
:    #func               -999668    0.214  set_normalized_timespec+0x8
(ktime_get_ts+0x40)
:    #func               -999667    0.229  enqueue_hrtimer+0xb
(hrtimer_start+0x67)
:    #func               -999667    0.202  rb_insert_color+0xe
(enqueue_hrtimer+0x56)
:    #func               -999667    0.202  __ipipe_restore_root+0x8
(hrtimer_start+0x6f)
:    #func               -999667    0.232  __ipipe_unstall_root+0x8
(__ipipe_restore_root+0x1b)
:|   #begin   0x80000000 -999667    0.266  __ipipe_unstall_root+0x41
(__ipipe_restore_root+0x1b)
:|   +end     0x80000000 -999666    0.263  __ipipe_unstall_root+0x33
(__ipipe_restore_root+0x1b)
:    +func               -999666    0.199  __sched_text_start+0xe
(do_nanosleep+0x52)
:    +func               -999666    0.417  sched_clock+0xa
(__sched_text_start+0x85)
:    #func               -999665    0.206  deactivate_task+0x9
(__sched_text_start+0x35c)
:    #func               -999665    0.319  dequeue_task+0xa
(deactivate_task+0x1e)
:|   #begin   0x80000000 -999665    0.232  __sched_text_start+0x414
(do_nanosleep+0x52)
:|   #func               -999665    0.300  __switch_to+0xe
(__sched_text_start+0x2e7)
:|   #end     0x80000000 -999664    0.202  __sched_text_start+0x403
(cpu_idle+0x64)
:    #func               -999664    0.217  __ipipe_unstall_root+0x8
(__sched_text_start+0x37b)
:|   #begin   0x80000000 -999664    0.180  __ipipe_unstall_root+0x41
(__sched_text_start+0x37b)
:|   +end     0x80000000 -999664    0.278  __ipipe_unstall_root+0x33
(__sched_text_start+0x37b)
:    +func               -999664    0.214  ipipe_suspend_domain+0xe
(cpu_idle+0x44)
:|   +begin   0x80000001 -999663    0.236  ipipe_suspend_domain+0x9d
(cpu_idle+0x44)
:|   +end     0x80000001 -999663    0.214  ipipe_suspend_domain+0xac
(cpu_idle+0x44)
:    +func               -999663    0.169  mwait_idle+0x8
(cpu_idle+0x46)
:    +func               -999663    0.165  __ipipe_unstall_root+0x8
(mwait_idle+0xd)
:|   +begin   0x80000000 -999663    0.225  __ipipe_unstall_root+0x41
(mwait_idle+0xd)
:|   +end     0x80000000 -999662    0.184  __ipipe_unstall_root+0x33
(mwait_idle+0xd)
:    +func               -999662! 999575.071  mwait_idle_with_hints+0xb
(mwait_idle+0x16)
************ WHOOPS! No Linux timer irqs, no ethernet irqs, no nothing
***************
:|   +begin   0xffffff16   -87    0.195  ipipe_ipi3+0x2e
(mwait_idle_with_hints+0x3e)
:|   +func                 -87    0.360  __ipipe_handle_irq+0xe
(ipipe_ipi3+0x33)
:|   +func                 -86    0.229  __ipipe_ack_apic+0x8
(__ipipe_handle_irq+0xa6)
:|   +func                 -86    0.526  __ipipe_dispatch_wired+0xb
(__ipipe_handle_irq+0x62)
:| #  func                 -86    0.364  xnintr_clock_handler+0x8
(__ipipe_dispatch_wired+0x85)
:| #  func                 -85    0.326  xnintr_irq_handler+0xe
(xnintr_clock_handler+0x17)
:| #  func                 -85    0.172  xnpod_announce_tick+0x8
(xnintr_irq_handler+0x3b)
:| #  func                 -85    0.661  xntimer_do_tick_aperiodic+0xe
(xnpod_announce_tick+0xf)
:|   +func                 -84    0.244  __ipipe_walk_pipeline+0xe
(__ipipe_handle_irq+0x7b)
:|  + func                 -84    0.187  ipipe_suspend_domain+0xe
(__ipipe_walk_pipeline+0xa0)
:|  + func                 -84    0.248  __ipipe_sync_stage+0xe
(ipipe_suspend_domain+0x63)
:|  # end     0x80000000   -84    0.244  __ipipe_sync_stage+0x13c
(ipipe_suspend_domain+0x63)
:   # func                 -83    0.184  shield_handler+0x8
(__ipipe_sync_stage+0xbf)
:   # func                 -83    0.199  __ipipe_schedule_irq+0xe
(shield_handler+0x19)
:|  # begin   0x80000001   -83    0.409  __ipipe_schedule_irq+0xdf
(shield_handler+0x19)
:|  # end     0x80000001   -82    0.251  __ipipe_schedule_irq+0x10f
(shield_handler+0x19)
:|  # begin   0x80000000   -82    0.319  __ipipe_sync_stage+0x130
(ipipe_suspend_domain+0x63)
:|   +func                 -82    0.206  __ipipe_sync_stage+0xe
(ipipe_suspend_domain+0x63)
:|   #end     0x80000000   -82    0.240  __ipipe_sync_stage+0x13c
(ipipe_suspend_domain+0x63)
*********** Ahh.. Now we're gonna take care of that old interrupt
***********
:    #func                 -81    0.221  do_IRQ+0xd
(__ipipe_sync_stage+0x16f)
:    #func                 -81    0.296  handle_fasteoi_irq+0xb
(do_IRQ+0x47)
:    #func                 -81    0.210  handle_IRQ_event+0xe
(handle_fasteoi_irq+0x6d)
:    #func                 -81    0.169  __ipipe_unstall_root+0x8
(handle_IRQ_event+0x65)
:|   #begin   0x80000000   -81    0.176  __ipipe_unstall_root+0x41
(handle_IRQ_event+0x65)
:|   +end     0x80000000   -80    0.221  __ipipe_unstall_root+0x33
(handle_IRQ_event+0x65)
:    +func                 -80    0.338  usb_hcd_irq+0x9
(handle_IRQ_event+0x30)
:    +func                 -80+   1.074  uhci_irq+0xe (usb_hcd_irq+0x2a)
:    +func                 -79    0.326  rtl8139_interrupt+0xe
(handle_IRQ_event+0x30)
:    +func                 -78    0.950  ioread16+0x8
(rtl8139_interrupt+0x3d)
:    +func                 -77    0.172  iowrite16+0x8
(rtl8139_interrupt+0xe4)
:    +func                 -77    0.845  ioread16+0x8
(rtl8139_interrupt+0xeb)
:    +func                 -76    0.176  __netif_rx_schedule+0xa
(rtl8139_interrupt+0xf3)
:    #func                 -76    0.172  __ipipe_restore_root+0x8
(__netif_rx_schedule+0x6b)
:    #func                 -76    0.206  __ipipe_unstall_root+0x8
(__ipipe_restore_root+0x1b)
:|   #begin   0x80000000   -76    0.180  __ipipe_unstall_root+0x41
(__ipipe_restore_root+0x1b)
:|   +end     0x80000000   -76    0.278  __ipipe_unstall_root+0x33
(__ipipe_restore_root+0x1b)
:    #func                 -75    0.195  note_interrupt+0xe
(handle_fasteoi_irq+0xb3)
:    #func                 -75    0.943  ack_ioapic_quirk_irq+0xa
(handle_fasteoi_irq+0x91)
:|   #begin   0xffffffff   -74    0.236  common_interrupt+0x29
(handle_fasteoi_irq+0xa4)
:|   #func                 -74    0.248  __ipipe_handle_irq+0xe
(common_interrupt+0x2e)
:|   #func                 -74    0.225  __ipipe_ack_irq+0x8
(__ipipe_handle_irq+0x149)
:|   #func                 -74    0.169  __ipipe_ack_edge_irq+0x8
(__ipipe_ack_irq+0x19)
:|   #func                 -73    0.214  ack_ioapic_irq+0x8
(__ipipe_ack_edge_irq+0xe)
:|   #func                 -73    0.187  __ipipe_walk_pipeline+0xe
(__ipipe_handle_irq+0x7b)
:|  +*func                 -73    0.172  ipipe_suspend_domain+0xe
(__ipipe_walk_pipeline+0xa0)
:|  +*func                 -73    0.187  __ipipe_sync_stage+0xe
(ipipe_suspend_domain+0x63)
:|  #*end     0x80000000   -73    0.199  __ipipe_sync_stage+0x13c
(ipipe_suspend_domain+0x63)
:   #*func                 -72    0.154  shield_handler+0x8
(__ipipe_sync_stage+0xbf)
:   #*func                 -72    0.172  __ipipe_schedule_irq+0xe
(shield_handler+0x19)
:|  #*begin   0x80000001   -72    0.206  __ipipe_schedule_irq+0xdf
(shield_handler+0x19)
:|  #*end     0x80000001   -72    0.244  __ipipe_schedule_irq+0x10f
(shield_handler+0x19)
:|  #*begin   0x80000000   -72    0.353  __ipipe_sync_stage+0x130
(ipipe_suspend_domain+0x63)
:|   #end     0xffffffff   -71    0.462  common_interrupt+0x38
(handle_fasteoi_irq+0xa4)
:    #func                 -71    0.210  irq_exit+0x8 (do_IRQ+0x4c)
Etc...

So I seem to see what's causing the problem now, but surfing the code to
find the bug might take a while. Anyone out there already know what this
one is? Let me know if you need more info. Also, I'd be interested to
hear if anyone else can reproduce this on their system.

Thanks,
Kyle Howell

=================================
This email and any files transmitted with it are
confidential and intended solely for the use of the
named recipient or recipients.  If you have received
this email in error please notify the sender
immediately.  



* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-27 18:44 [Xenomai-help] Interrupts lost during sleep / unblock cycles Kyle Howell
@ 2007-11-27 19:29 ` Gilles Chanteperdrix
  2007-11-27 19:53   ` Kyle Howell
  0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-27 19:29 UTC (permalink / raw)
  To: Kyle Howell; +Cc: xenomai

Kyle Howell wrote:
 > I have been debugging a stall problem for a couple of days, and I think
 > I've put together enough info to check with the pros. Everything below
 > was experienced on a P4 (Celeron) running 2.6.20 / Xenomai 2.3.4. I've
 > also reproduced it on 2.6.19.7 / 2.3.1. A quick test *did not* reproduce
 > this problem on a Core2 running x86_64 2.6.22.9 / 2.4RC3. 
 > 
 > I've reduced the problem to a fairly simple example below:
 > 
 > The Overview:
 > - Running a single real-time process with one standard thread and one RT
 > task
 > - The RT task loops on a 1sec rt_task_sleep
 > - The standard thread loops on nanosleep(10msec) and rt_task_unblock of
 > the RT task.
 > - When an unrelated interrupt arrives at the wrong time, the entire
 > system will hang until the 1sec task_sleep expires.
 > - After resuming, everything runs normally until another interrupt lands
 > at the wrong moment.

Do you observe the same behaviour without the interrupt shield ?

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-27 19:29 ` Gilles Chanteperdrix
@ 2007-11-27 19:53   ` Kyle Howell
  2007-11-27 20:08     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-27 19:53 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

> Do you observe the same behaviour without the interrupt shield ?

It doesn't appear so. I'll have to let it run longer to be 100% sure,
but the usual stressing isn't causing the problem. That's not expected
behavior with the interrupt shield, is it?

-- Kyle Howell




* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-27 19:53   ` Kyle Howell
@ 2007-11-27 20:08     ` Gilles Chanteperdrix
  2007-11-27 20:21       ` Kyle Howell
  2007-11-28  4:59       ` Kyle Howell
  0 siblings, 2 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-27 20:08 UTC (permalink / raw)
  To: Kyle Howell; +Cc: xenomai

Kyle Howell wrote:
 > > Do you observe the same behaviour without the interrupt shield ?
 > 
 > It doesn't appear so. I'll have to let it run longer to be 100% sure,
 > but the usual stressing isn't causing the problem. That's not expected
 > behavior with the interrupt shield, is it?

No, it is not an expected behavior.

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-27 20:08     ` Gilles Chanteperdrix
@ 2007-11-27 20:21       ` Kyle Howell
  2007-11-27 21:10         ` Gilles Chanteperdrix
  2007-11-28  4:59       ` Kyle Howell
  1 sibling, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-27 20:21 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

>  > > Do you observe the same behaviour without the interrupt shield ?
>  > 
>  > It doesn't appear so. I'll have to let it run longer to be 100% sure,
>  > but the usual stressing isn't causing the problem. That's not expected
>  > behavior with the interrupt shield, is it?
> 
> No, it is not an expected behavior.

Well, that's good news. Still no stalls without the IShield, so that's
certainly narrowed it down.

Another note: I'm currently using the IPipe 1.8-08 that is packaged with
Xenomai 2.3.4. Do you expect a change if I grab 1.10-12 or 1.11-00? Are
the Xenomai releases tightly coupled to a particular I-Pipe version (had
problems with this in the past)?

BTW, thanks for being so responsive. Just focusing on the IShield has
already been a tremendous help.

-- Kyle Howell




* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-27 20:21       ` Kyle Howell
@ 2007-11-27 21:10         ` Gilles Chanteperdrix
  2007-11-30 16:41           ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-27 21:10 UTC (permalink / raw)
  To: Kyle Howell; +Cc: xenomai

Kyle Howell wrote:
 > >  > > Do you observe the same behaviour without the interrupt shield ?
 > >  > 
 > >  > It doesn't appear so. I'll have to let it run longer to be 100% sure,
 > >  > but the usual stressing isn't causing the problem. That's not expected
 > >  > behavior with the interrupt shield, is it?
 > > 
 > > No, it is not an expected behavior.
 > 
 > Well, that's good news. Still no stalls without the IShield, so that's
 > certainly narrowed it down.
 > 
 > Another note: I'm currently using the IPipe 1.8-08 that is packaged with
 > Xenomai 2.3.4. Do you expect a change if I grab 1.10-12 or 1.11-00? Are
 > the Xenomai releases tightly coupled to a particular I-Pipe version (had
 > problems with this in the past)?

Usually, a Xenomai version is compatible with past I-pipe releases. But
you should expect problems using a new I-pipe release with an older
version of Xenomai.

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-27 20:08     ` Gilles Chanteperdrix
  2007-11-27 20:21       ` Kyle Howell
@ 2007-11-28  4:59       ` Kyle Howell
  2007-11-28 10:57         ` Philippe Gerum
  1 sibling, 1 reply; 11+ messages in thread
From: Kyle Howell @ 2007-11-28  4:59 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

>  > > Do you observe the same behaviour without the interrupt shield ?
>  > 
>  > It doesn't appear so. I'll have to let it run longer to be 
> 100% sure,
>  > but the usual stressing isn't causing the problem. That's 
> not expected
>  > behavior with the interrupt shield, is it?
> 
> No, it is not an expected behavior.
> 

After considerable staring and code surfing, I think I have an idea of
what's happening. There are still enough parts of the code I don't fully
understand that I'm not positive, though. Check this theory out for me:

Flow of events when it works:
1. Process running in root domain.
2. Interrupt fires, IShield pending bit set.
3. ipipe_walk_pipeline calls IShield handler.
4. IShield propagates interrupt to root domain.
5. Root domain finishes restoring the APIC.
6. Everything continues as expected.
 - or -
1. Process running in Xenomai domain.
2. Interrupt fires, IShield pending bit set.
3. ipipe_walk_pipeline resumes high-priority Xenomai domain.
4. Xenomai domain finishes and suspends.
5. ipipe_walk_pipeline calls IShield handler.
6. IShield propagates interrupt to root domain.
7. Root domain finishes restoring the APIC.
8. Everything continues as expected.

Flow of events when it fails:
1. Process running in root domain, makes syscall *requiring Xenomai
domain*.
2. Thread is temporarily promoted to Xenomai domain to execute syscall.
3. (Optional) Syscall results in another Xenomai task gaining control.
4. Interrupt fires, IShield pending bit set.
5. ipipe_walk_pipeline resumes high-priority Xenomai domain.
6. (Optional) Other Xenomai task completes, promoted syscall resumes.
7. Syscall returns to root domain, never calling ipipe_sync_pipeline on
IShield domain.
8. Root domain sleeps without ever restoring the APIC.
9. System hangs until event-timer fires for Xenomai task.
10. Xenomai task finishes and suspends.
11. ipipe_walk_pipeline calls IShield handler.
12. IShield propagates interrupt to root domain.
13. Root domain finishes restoring the APIC.
14. Everything continues as expected.

To put it in a sentence, it looks like there's a loop-hole where a
promoted syscall can get back to the root domain without the
intermediate domains being checked for pending interrupts. The propagate
logic in ipipe_dispatch_event *seems* like it would take care of this,
but I'm guessing I'm making bad assumptions about variables.

To force quick reproduction of this problem, run the previously posted
taskTest, and then create some heavy interrupt activity. The
rt_task_unblock call (from root) ends up being the ideal instigator for
this hole because it runs in the primary domain and doesn't return until
the unblocked task is done, but I'd imagine any primary-domain syscall
could occasionally reproduce it.
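
The suspected hole can be illustrated with a toy model. This is only a
sketch of the theory above, not I-pipe code: the three-entry domain array,
the toy_* names and the "check_intermediate" switch are all made up for
illustration, and the real pipeline logic is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: three domains ordered by priority, highest first. */
enum domain { DOM_XENOMAI, DOM_ISHIELD, DOM_ROOT, DOM_COUNT };

struct toy_pipeline {
    bool pending[DOM_COUNT];   /* one pending-IRQ bit per domain */
    int  root_handled;         /* IRQs that actually reached the root domain */
};

/* With the shield enabled, an incoming IRQ is logged in the shield first. */
static void toy_irq_arrives(struct toy_pipeline *p)
{
    p->pending[DOM_ISHIELD] = true;
}

/* Walk the pipeline downwards from 'start', syncing each domain in turn. */
static void toy_walk_from(struct toy_pipeline *p, enum domain start)
{
    assert(start < DOM_COUNT);
    for (int d = start; d < DOM_COUNT; d++) {
        if (!p->pending[d])
            continue;
        p->pending[d] = false;
        if (d == DOM_ISHIELD)
            p->pending[DOM_ROOT] = true;   /* shield propagates to root */
        else if (d == DOM_ROOT)
            p->root_handled++;             /* root handler runs */
    }
}

/*
 * A root thread is inside a primary-domain syscall when an IRQ fires, then
 * the syscall returns. Returns how many IRQs are still stuck in the toy
 * pipeline afterwards (hypothetical helper, for illustration only).
 */
static int toy_stuck_after_syscall(bool check_intermediate)
{
    struct toy_pipeline p = { { false, false, false }, 0 };

    toy_irq_arrives(&p);
    /* Suspected bug: the return path resumes root without visiting the shield. */
    toy_walk_from(&p, check_intermediate ? DOM_ISHIELD : DOM_ROOT);
    return 1 - p.root_handled;
}
```

In this model, toy_stuck_after_syscall(false) leaves one interrupt parked in
the shield until something else walks the pipeline, while
toy_stuck_after_syscall(true) delivers it to root right away.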

I may eventually be able to produce a patch for this, but I imagine
someone else out there is already familiar enough with the code to close
this up a lot faster. I certainly wouldn't be offended.

--
Kyle Howell




* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-28  4:59       ` Kyle Howell
@ 2007-11-28 10:57         ` Philippe Gerum
  0 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2007-11-28 10:57 UTC (permalink / raw)
  To: Kyle Howell; +Cc: xenomai

Kyle Howell wrote:
>>  > >  > I have been debugging a stall problem for a couple of days, and I think
>>  > >  > I've put together enough info to check with the pros. Everything below
>>  > >  > was experienced on a P4 (Celeron) running 2.6.20 / Xenomai 2.3.4. I've
>>  > >  > also reproduced it on 2.6.19.7 / 2.3.1. A quick test *did not* reproduce
>>  > >  > this problem on a Core2 running x86_64 2.6.22.9 / 2.4RC3.
>>  > >  >
>>  > >  > I've reduced the problem to a fairly simple example below:
>>  > >  >
>>  > >  > The Overview:
>>  > >  > - Running a single real-time process with one standard thread and one RT
>>  > >  > task
>>  > >  > - The RT task loops on a 1sec rt_task_sleep
>>  > >  > - The standard thread loops on nanosleep(10msec) and rt_task_unblock of
>>  > >  > the RT task.
>>  > >  > - When an unrelated interrupt arrives at the wrong time, the entire
>>  > >  > system will hang until the 1sec task_sleep expires.
>>  > >  > - After resuming, everything runs normally until another interrupt lands
>>  > >  > at the wrong moment.
>>  > >
>>  > > Do you observe the same behaviour without the interrupt shield ?
>>  >
>>  > It doesn't appear so. I'll have to let it run longer to be 100% sure,
>>  > but the usual stressing isn't causing the problem. That's not expected
>>  > behavior with the interrupt shield, is it?
>>
>> No, it is not an expected behavior.
>>
> 
> After considerable staring and code surfing, I think I have an idea of
> what's happening. There are still enough parts of the code I don't fully
> understand that I'm not positive, though. Check this theory out for me:
> 
> Flow of events when it works:
> 1. Process running in root domain.
> 2. Interrupt fires, IShield pending bit set.
> 3. ipipe_walk_pipeline calls IShield handler.
> 4. IShield propagates interrupt to root domain.
> 5. Root domain finishes restoring the APIC.
> 6. Everything continues as expected.
>  - or -
> 1. Process running in Xenomai domain.
> 2. Interrupt fires, IShield pending bit set.
> 3. ipipe_walk_pipeline resumes high-priority Xenomai domain.
> 4. Xenomai domain finishes and suspends.
> 5. ipipe_walk_pipeline calls IShield handler.
> 6. IShield propagates interrupt to root domain.
> 7. Root domain finishes restoring the APIC.
> 8. Everything continues as expected.
> 
> Flow of events when it fails:
> 1. Process running in root domain, makes syscall *requiring Xenomai
> domain*.
> 2. Thread is temporarily promoted to Xenomai domain to execute syscall.
> 3. (Optional) Syscall results in another Xenomai task gaining control.
> 4. Interrupt fires, IShield pending bit set.
> 5. ipipe_walk_pipeline resumes high-priority Xenomai domain.
> 6. (Optional) Other Xenomai task completes, promoted syscall resumes.
> 7. Syscall returns to root domain, never calling ipipe_sync_pipeline on
> IShield domain.
> 8. Root domain sleeps without ever restoring the APIC.
> 9. System hangs until event-timer fires for Xenomai task.
> 10. Xenomai task finishes and suspends.
> 11. ipipe_walk_pipeline calls IShield handler.
> 12. IShield propagates interrupt to root domain.
> 13. Root domain finishes restoring the APIC.
> 14. Everything continues as expected.
> 
> To put it in a sentence, it looks like there's a loop-hole where a
> promoted syscall can get back to the root domain without the
> intermediate domains being checked for pending interrupts.

Your analysis makes a lot of sense, even if I can't spot the loophole
immediately in the I-pipe code.

> The propagate
> logic in ipipe_dispatch_event *seems* like it would take care of this,

This routine is indeed where I would point my finger, as a first
guess. As you explained, it does look like an adverse effect of domain
migration taking some sideway in the pipeline logic, which ends up
breaking the propagation of events. Normally, the interrupt shield
domain is never stalled, so the only reason for such an issue to pop up
would be this domain being bypassed somehow.

-- 
Philippe.



* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-27 21:10         ` Gilles Chanteperdrix
@ 2007-11-30 16:41           ` Philippe Gerum
  2007-11-30 17:03             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2007-11-30 16:41 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Gilles Chanteperdrix wrote:
> Kyle Howell wrote:
>  > >  > >  > I have been debugging a stall problem for a couple of days, and I think
>  > >  > >  > I've put together enough info to check with the pros. Everything below
>  > >  > >  > was experienced on a P4 (Celeron) running 2.6.20 / Xenomai 2.3.4. I've
>  > >  > >  > also reproduced it on 2.6.19.7 / 2.3.1. A quick test *did not* reproduce
>  > >  > >  > this problem on a Core2 running x86_64 2.6.22.9 / 2.4RC3.
>  > >  > >  >
>  > >  > >  > I've reduced the problem to a fairly simple example below:
>  > >  > >  >
>  > >  > >  > The Overview:
>  > >  > >  > - Running a single real-time process with one standard thread and one RT
>  > >  > >  > task
>  > >  > >  > - The RT task loops on a 1sec rt_task_sleep
>  > >  > >  > - The standard thread loops on nanosleep(10msec) and rt_task_unblock of
>  > >  > >  > the RT task.
>  > >  > >  > - When an unrelated interrupt arrives at the wrong time, the entire
>  > >  > >  > system will hang until the 1sec task_sleep expires.
>  > >  > >  > - After resuming, everything runs normally until another interrupt lands
>  > >  > >  > at the wrong moment.
>  > >  > >
>  > >  > > Do you observe the same behaviour without the interrupt shield ?
>  > >  >
>  > >  > It doesn't appear so. I'll have to let it run longer to be 100% sure,
>  > >  > but the usual stressing isn't causing the problem. That's not expected
>  > >  > behavior with the interrupt shield, is it?
>  > >
>  > > No, it is not an expected behavior.
>  > 
>  > Well, that's good news. Still no stalls without the IShield, so that's
>  > certainly narrowed it down.
>  > 
>  > Another note: I'm currently using the IPipe 1.8-08 that is packaged with
>  > Xenomai 2.3.4. Do you expect a change if I grab 1.10-12 or 1.11-00? Are
>  > the Xenomai releases tightly coupled to a particular I-Pipe version (had
>  > problems with this in the past)?
> 
> Usually, a Xenomai version is compatible with past I-pipe releases. But
> you should expect problems using a new I-pipe release with an older
> version of Xenomai.
> 

To be more specific about this: we try really, really, really, awfully
and painfully hard to keep recent I-pipe patches compatible with
(reasonably) older Xenomai releases. No kidding. You may have noticed
that the I-pipe API has been quite stable over time for that particular
reason, and when we have to break it, there is most often some built-in
compat code.

The rule of thumb is: if the fully patched kernel compiles properly
(I-pipe + Xenomai), then this should work, for two reasons: first,
externally visible changes in some I-pipe release usually come with
wrappers to please older code, and second, we are careful not to
change the semantics of existing calls even in subtle ways without
also forcing a syntactical change to make sure the issue is noticed
downstream, or at least providing a sane wrapper. In the former case, a
compilation error should warn you, at least.
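
As a rough illustration of that rule (a sketch only: every hyp_* name
below is hypothetical and does not exist in the I-pipe API), a service
whose semantics change from microseconds to nanoseconds gets a new name,
and the old name survives as a thin wrapper with its old meaning:

```c
#include <assert.h>

/* Hypothetical state, only here so the effect of a call can be observed. */
static unsigned long long hyp_last_period_ns;

/* Hypothetical newer service: the period is now given in nanoseconds. */
static int hyp_set_period_ns(unsigned long long period_ns)
{
    if (period_ns == 0)
        return -1;                 /* reject a meaningless period */
    hyp_last_period_ns = period_ns;
    return 0;
}

/*
 * Compat wrapper: the old name keeps its old semantics (microseconds), so
 * older client code compiles unchanged and still behaves as before.
 */
static inline int hyp_set_period_us(unsigned long period_us)
{
    return hyp_set_period_ns((unsigned long long)period_us * 1000ULL);
}

/* An "old" caller, unaware of the change, still asks for 250 us. */
static void hyp_old_caller(void)
{
    int ret = hyp_set_period_us(250);

    assert(ret == 0);
    (void)ret;                     /* keep NDEBUG builds warning-free */
}
```

Had the unit changed under the old name instead, old callers would still
compile and silently ask for a period 1000 times too short; dropping the
wrapper altogether turns the mismatch into a compilation error.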

Sometimes the generic part of the interrupt pipelining engine has to be
changed (e.g. recent "flat log" update), and this may have consequences
on the arch-dep I-pipe core interfaced with it, but in such a case,
problems have to be solved at I-pipe level, and should not leak to the
Xenomai space anyway.

In the x86 case, we have a particular situation due to mainline having
been largely in a state of flux wrt some of its core layers for ages. As
a result of this:

- post-2.6.20 kernels won't work with Xenomai 2.3.x, because the Linux
clock/timer infrastructure has changed dramatically since then, in a way
that required a significant refactoring of the core Xenomai code for
x86, i.e. no wrapping possible. For this reason, there has been no
I-pipe support for 2.6.21/x86, and we directly jumped to 2.6.22/x86.
Said differently, supporting the new generic clock event layer required
significant surgery in both the I-pipe and Xenomai code.

- the latest I-pipe patch for 2.6.23/x86 broke the Adeos API,
specifically regarding the very recent ipipe_request_tickdev() service -
which depends on the above mainline change - but since you can't use
2.6.23 with Xenomai 2.3.x, this should not be a big deal for existing
production setups. OTOH, v2.4-rc7 and on will still accept older
kernels, even if you may want to run them preferably over 2.6.23 and
beyond. Other archs were not impacted since this service is only defined
for x86 for now.

- Because some people may not want to upgrade to 2.6.22+, most
improvements and fixes available with the latest I-pipe releases for
recent kernels have been backported to 2.6.20/x86. This patch will work
with both 2.3.x and 2.4 Xenomai releases. The same goes for powerpc32.

Sometimes, backward compatibility is not a sane option though. For
instance, the ipipe_tune_timer() service was removed months ago from
newer patches with no replacement, because it put the burden of
managing periodic timing on the shoulders of the I-pipe, although this
should be the client code's business only. This caused hairy code to be
needed in order to port the I-pipe to other archs, with no actual
upside, since managing periodic timing is way more efficient when done
from the upper layers, e.g. Xenomai.

HTH,

-- 
Philippe.




* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-30 16:41           ` Philippe Gerum
@ 2007-11-30 17:03             ` Gilles Chanteperdrix
  2007-11-30 17:23               ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-11-30 17:03 UTC (permalink / raw)
  To: rpm; +Cc: xenomai

On Nov 30, 2007 5:41 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> Gilles Chanteperdrix wrote:
> > Usually, a Xenomai version is compatible with past I-pipe releases. But
> > you should expect problems using a new I-pipe release with an older
> > version of Xenomai.
> >
>
> To be more specific about this: we try really, really, really, awfully
> and painfully hard to keep recent I-pipe patches compatible with
> (reasonably) older Xenomai releases. No kidding. You may have noticed
> that the I-pipe API has been quite stable over time for that particular
> reason, and when we have to break it, there is most often some built-in
> compat code.

Then, I really have a problem with the newer ARM I-pipe patches, the
ones that no longer shut irqs over the mm switch, because they will
probably compile with a Xenomai 2.2.x, but will not work properly.

The only way I see to work around this is to make the ARM patch depend
on a CONFIG_ symbol which would be set by the newer Xenomai.

-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-help] Interrupts lost during sleep / unblock cycles
  2007-11-30 17:03             ` Gilles Chanteperdrix
@ 2007-11-30 17:23               ` Philippe Gerum
  0 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2007-11-30 17:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Gilles Chanteperdrix wrote:
> On Nov 30, 2007 5:41 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> Gilles Chanteperdrix wrote:
>>> Usually, a Xenomai version is compatible with past I-pipe releases. But
>>> you should expect problems using a new I-pipe release with an older
>>> version of Xenomai.
>>>
>> To be more specific about this: we try really, really, really, awfully
>> and painfully hard to keep recent I-pipe patches compatible with
>> (reasonably) older Xenomai releases. No kidding. You may have noticed
>> that the I-pipe API has been quite stable over time for that particular
>> reason, and when we have to break it, there is most often some built-in
>> compat code.
> 
> Then, I really have a problem with the newer ARM I-pipe patches, the
> ones that no longer shut irqs over the mm switch, because they will
> probably compile with a Xenomai 2.2.x, but will not work properly.
>

Well, this is the purpose of "reasonably older" in the sentence. 2.2.x
is already a bit far. 2.3.x is a more reasonable target, particularly
because a shiny new I-pipe patch without all core fixes that went over
time into an entire major Xenomai milestone + maintenance time would not
bring that much. The mm change you mentioned requires core surgery in
Xenomai to be compatible, but this was not an API issue.

I'm not sure this kind of change could ever be detected sanely in older
code, since it doesn't affect the external interfaces, but requires both
the I-pipe and Xenomai cores to agree on interrupt management. This is a
grey area, but not due to API changes.

> The only way I see to work around this is to make the ARM patch depend
> on a CONFIG_ symbol which would be set by the newer Xenomai.
> 

No, we can't do that. We have to admit that sometimes backward compat is
just not possible, unless we start doing really brain-damaged things. I
already make enough silly mistakes unwittingly without wanting to add
more of them deliberately...

-- 
Philippe.



end of thread, other threads:[~2007-11-30 17:23 UTC | newest]

Thread overview: 11+ messages
2007-11-27 18:44 [Xenomai-help] Interrupts lost during sleep / unblock cycles Kyle Howell
2007-11-27 19:29 ` Gilles Chanteperdrix
2007-11-27 19:53   ` Kyle Howell
2007-11-27 20:08     ` Gilles Chanteperdrix
2007-11-27 20:21       ` Kyle Howell
2007-11-27 21:10         ` Gilles Chanteperdrix
2007-11-30 16:41           ` Philippe Gerum
2007-11-30 17:03             ` Gilles Chanteperdrix
2007-11-30 17:23               ` Philippe Gerum
2007-11-28  4:59       ` Kyle Howell
2007-11-28 10:57         ` Philippe Gerum
