* [Xenomai-core] hanging in Xenomai 2.5.5
@ 2010-10-16  5:43 Stefan Schaal
  2010-10-16  8:48 ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Schaal @ 2010-10-16  5:43 UTC (permalink / raw)
  To: xenomai

Hi everybody,

  here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.

We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive. 

Up to version 2.5.4, this worked fine.

With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus reboot is required.

The problem happens on BOTH our i386 machine (Dell 8-core, ubuntu 9.04, kernel 2.6.29.5) AND x86_64 machine (Dell 8-core, ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to the xenomai release 2.5.5 and higher.

No "dmesg" print-outs when this error occurs.

We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.

Best wishes,

-Stefan


* Re: [Xenomai-core] hanging in Xenomai 2.5.5
  2010-10-16  5:43 [Xenomai-core] hanging in Xenomai 2.5.5 Stefan Schaal
@ 2010-10-16  8:48 ` Philippe Gerum
  2010-12-25 17:56   ` Stefan Schaal
       [not found]   ` <A059C858-912A-4207-A834-411C50184622@domain.hid>
  0 siblings, 2 replies; 5+ messages in thread
From: Philippe Gerum @ 2010-10-16  8:48 UTC (permalink / raw)
  To: Stefan Schaal; +Cc: xenomai

On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
> Hi everybody,
> 
>   here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
> 
> We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive. 
> 
> Up to version 2.5.4, this worked fine.
> 
> With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus reboot is required.
> 
> The problems happens on BOTH our i386 machine (Dell 8-core, ubuntu 9.04, kernel 2.6.29.5) AND x86_64 machine (Dell 8 core, ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to the xenomai release 2.5.5 and higher.
> 
> No "dmesg" print-outs when this error occurs.
> 
> We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.
> 

$ cat /proc/xenomai/stat
$ cat /proc/xenomai/sched

when the threads hang would help.

Additionally, please clone the -stable repo from here:
git://git.xenomai.org/xenomai-2.5.git

then branch+build and test from these commits:

- 6a020f5 first; if the bug does not show up anymore, check the next one
- 5e7cfa5; if the bug is still there, try disabling
CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.

> Best wishes,
> 
> -Stefan

-- 
Philippe.





* Re: [Xenomai-core] hanging in Xenomai 2.5.5
  2010-10-16  8:48 ` Philippe Gerum
@ 2010-12-25 17:56   ` Stefan Schaal
       [not found]   ` <A059C858-912A-4207-A834-411C50184622@domain.hid>
  1 sibling, 0 replies; 5+ messages in thread
From: Stefan Schaal @ 2010-12-25 17:56 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 4600 bytes --]

Hi Philippe,

  thanks so much for your reply -- it took me a moment to get back to this problem. Here are some first observations:

1) the problem only occurs when I distribute the communicating processes over multiple cores -- in Xenomai 2.5.4, this has never been a problem.

2) The /proc/xenomai/stat output looks like:

CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
  0  0      0          2349229    0     00500080  100.0  ROOT/0
  1  0      0          20328410   0     00500080  100.0  ROOT/1
  2  0      0          1040321    0     00500080  100.0  ROOT/2
  3  0      0          445786     0     00500080  100.0  ROOT/3
  4  0      0          71162      0     00500080  100.0  ROOT/4
  5  0      0          0          0     00500080  100.0  ROOT/5
  6  0      0          0          0     00500080  100.0  ROOT/6
  7  0      0          0          0     00500080  100.0  ROOT/7
  1  3128   0          91261      0     00300182    0.0  sem1_task
  2  3166   0          90470      0     00300188    0.0  sem2_task
  3  3195   0          45237      0     00300182    0.0  sem3_task
  0  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  1  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  2  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  3  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  4  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  5  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  6  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  7  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  1  0      0          39326230   0     00000000    0.0  IRQ521: [timer]
  2  0      0          1641532    0     00000000    0.0  IRQ521: [timer]
  3  0      0          1258571    0     00000000    0.0  IRQ521: [timer]
  4  0      0          722843     0     00000000    0.0  IRQ521: [timer]
  5  0      0          780591     0     00000000    0.0  IRQ521: [timer]
  6  0      0          764817     0     00000000    0.0  IRQ521: [timer]
  7  0      0          385421     0     00000000    0.0  IRQ521: [timer]

The three communicating processes are sem1_task, sem2_task, sem3_task -- they are currently hanging with 0% CPU

3) The /proc/xenomai/sched output looks like:

CPU  PID    CLASS  PRI      TIMEOUT   TIMEBASE   STAT       NAME
  0  0      idle    -1      -         master     R          ROOT/0
  1  0      idle    -1      -         master     R          ROOT/1
  2  0      idle    -1      -         master     R          ROOT/2
  3  0      idle    -1      -         master     R          ROOT/3
  4  0      idle    -1      -         master     R          ROOT/4
  5  0      idle    -1      -         master     R          ROOT/5
  6  0      idle    -1      -         master     R          ROOT/6
  7  0      idle    -1      -         master     R          ROOT/7
  1  3128   rt      50      -         master     W          sem1_task
  2  3166   rt      50      -         master     R          sem2_task
  3  3195   rt      50      -         master     W          sem3_task

Interestingly, although sem2_task is reported as runnable (R), it does not actually run.


4) When I try to terminate the three processes, sem2_task hangs and I cannot kill it. Interestingly, if I start another program that does a similar semaphore communication, sem2_task is finally released. Indeed, when I start this other program, the three processes (sem1_task, sem2_task, sem3_task) start running again, until they hang again.


5) I appended the little test program I used -- it is called test_xeno_sem.c

I compile with:

gcc -o xtest -I/usr/xenomai/include -D_GNU_SOURCE -D_REENTRANT -Wall -pipe -D__XENO__ test_xeno_sem.c -L/usr/xenomai/lib -lnative -lxenomai -lpthread -lrt -lrtdk

To create three communicating processes on different cores, I execute:

terminal1>  xtest 1 1
terminal2>  xtest 2 1
terminal3>  xtest 3 1


To create three communicating processes on ONE core, I execute:

terminal1>  xtest 1 0
terminal2>  xtest 2 0
terminal3>  xtest 3 0


6) I haven't tested the other commits yet --  this comes next. But maybe the information above already tells you all you need to know.
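
The cross-check described in items 2) and 3) above (a thread listed as R in /proc/xenomai/sched while /proc/xenomai/stat reports 0.0 %CPU for it) could be automated roughly as follows. This is only a sketch in plain C: the struct and function names (sched_row, stat_row, parse_sched, parse_stat, runnable_but_idle) are made up for illustration, and the column layout is assumed to be exactly the one printed in the two tables above, which may differ in other Xenomai versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper types: one data row of each table shown above. */
struct sched_row {
    int cpu;
    int pid;
    char state[8];   /* STAT column of /proc/xenomai/sched, e.g. "R" or "W" */
    char name[32];
};

struct stat_row {
    int cpu;
    int pid;
    double cpu_pct;  /* %CPU column of /proc/xenomai/stat */
    char name[32];
};

/* Parse one data line of /proc/xenomai/sched (layout assumed from the
 * table above). Returns 1 on success, 0 for headers or malformed lines. */
int parse_sched(const char *line, struct sched_row *out)
{
    char cls[8], timeout[16], timebase[16];
    int pri;

    assert(line != NULL && out != NULL);
    return sscanf(line, "%d %d %7s %d %15s %15s %7s %31s",
                  &out->cpu, &out->pid, cls, &pri, timeout, timebase,
                  out->state, out->name) == 8;
}

/* Parse one data line of /proc/xenomai/stat (layout assumed from the
 * table above). Only the first word of the NAME column is kept. */
int parse_stat(const char *line, struct stat_row *out)
{
    unsigned long msw, csw, pf, status;

    assert(line != NULL && out != NULL);
    return sscanf(line, "%d %d %lu %lu %lu %lx %lf %31s",
                  &out->cpu, &out->pid, &msw, &csw, &pf, &status,
                  &out->cpu_pct, out->name) == 8;
}

/* A real thread (PID != 0) that the scheduler lists as runnable
 * but that is accounted no CPU time at all. */
int runnable_but_idle(const struct sched_row *s, const struct stat_row *t)
{
    return s->pid != 0 && s->pid == t->pid &&
           strchr(s->state, 'R') != NULL && t->cpu_pct <= 0.0;
}
```

Fed with the rows printed above, this check flags only sem2_task: sem1_task and sem3_task are in state W, and the ROOT threads are skipped because their PID is 0.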

Best wishes, and, as always, a thousand thanks for your kind help!

-Stefan

------------------------------------------- test_xeno_sem.c ------------------------------------------------------------------------


[-- Attachment #2: test_xeno_sem.c --]
[-- Type: application/octet-stream, Size: 2636 bytes --]

#include <stdio.h>
#include <sys/mman.h>
#include <native/timer.h>
#include <native/task.h>
#include <native/mutex.h>
#include <native/heap.h>
#include <native/sem.h>
#include <pthread.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <rtdk.h>


int
main(int argc, char**argv)

{
  int        rc;
  RT_SEM     sem1;
  RT_SEM     sem2;
  RT_SEM     sem3;
  RT_SEM    *sem_give,*sem_wait;
  char       sem1_name[]="sem1";
  char       sem2_name[]="sem2";
  char       sem3_name[]="sem3";
  long       del;
  int        multicore = 0;

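  if (argc < 2) {
    // argv[1] selects the task (1, 2 or 3); argv[2] = 1 spreads the tasks over cores
    printf("usage: %s <task 1|2|3> [multicore 0|1]\n", argv[0]);
    return 1;
  }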
    
  if (argc > 2) {
    sscanf(argv[2],"%d",&multicore);
  }
    
  // lock all of the pages currently and pages that become
  // mapped into the address space of the process
  mlockall(MCL_CURRENT | MCL_FUTURE);

  // establish semaphores
  rc = rt_sem_bind(&sem1,sem1_name,TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem1,sem1_name,0,S_FIFO);	
    if (rc) {
      printf("Error: rt_sem_create returned %d\n",rc);
      return 0;
    }
  }

  rc = rt_sem_bind(&sem2,sem2_name,TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem2,sem2_name,0,S_FIFO);	
    if (rc) {
      printf("Error: rt_sem_create returned %d\n",rc);
      return 0;
    }
  }

  rc = rt_sem_bind(&sem3,sem3_name,TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem3,sem3_name,0,S_FIFO);	
    if (rc) {
      printf("Error: rt_sem_create returned %d\n",rc);
      return 0;
    }
  }

  // build a three-stage ring: each task waits on its own semaphore
  // and signals the semaphore of the next task (1 -> 2 -> 3 -> 1)
  if (argv[1][0] == '1') {
    printf("wait for 1, give 2\n");
    sem_wait = &sem1;
    sem_give = &sem2;
    del = 1000000;
    rc=rt_task_shadow(NULL,"sem1_task",50,T_FPU|T_CPU(1*multicore));
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }
  } else if (argv[1][0] == '2') {
    printf("wait for 2, give 3\n");
    sem_wait = &sem2;
    sem_give = &sem3;
    del = 2000000;
    rc=rt_task_shadow(NULL,"sem2_task",50,T_FPU|T_CPU(2*multicore));
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }
  } else {
    printf("wait for 3, give 1\n");
    sem_wait = &sem3;
    sem_give = &sem1;
    del = 3000000;
    rc=rt_task_shadow(NULL,"sem3_task",50,T_FPU|T_CPU(3*multicore));
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }
  }

  rt_sem_v(sem_give);

  while (1) {

    rc = rt_sem_p(sem_wait,TM_INFINITE);
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }

    rt_task_sleep((RTIME) del);    

    rt_sem_v(sem_give);

  }


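  // not reached: the loop above only exits through the return on error.
  // If cleanup is ever made reachable, all three semaphores need deleting.
  rt_sem_delete(&sem1);
  rt_sem_delete(&sem2);
  rt_sem_delete(&sem3);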

  return 0;

}
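
For readers without a Xenomai installation, the token ring that the program above sets up (wait on your own semaphore, sleep, signal the next task's semaphore) can be sketched with POSIX unnamed semaphores and threads inside a single process. This is an illustration of the protocol only, under several assumptions: it uses sem_wait()/sem_post() instead of the Xenomai native calls, it does no CPU pinning and no sleeping, it runs a bounded number of rounds, and it does not reproduce the hang reported in this thread. The names ring_node, ring_worker and run_ring are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define RING_SIZE 3

/* Hypothetical per-thread state: which semaphore to wait on and which to signal. */
struct ring_node {
    sem_t *wait_on;
    sem_t *give_to;
    int rounds;
    int wakeups;   /* number of times this thread was released */
};

void *ring_worker(void *arg)
{
    struct ring_node *n = arg;
    int i;

    /* Mirrors the initial rt_sem_v(sem_give) before the loop. */
    sem_post(n->give_to);

    for (i = 0; i < n->rounds; i++) {
        /* Retry if the wait is interrupted by a signal. */
        while (sem_wait(n->wait_on) == -1 && errno == EINTR)
            ;
        n->wakeups++;
        sem_post(n->give_to);
    }
    return NULL;
}

/* Runs the three-stage ring for a fixed number of rounds per thread
 * and returns the total number of wakeups across all threads. */
int run_ring(int rounds)
{
    sem_t sems[RING_SIZE];
    pthread_t tids[RING_SIZE];
    struct ring_node nodes[RING_SIZE];
    int i, rc, total = 0;

    assert(rounds >= 0);

    for (i = 0; i < RING_SIZE; i++) {
        rc = sem_init(&sems[i], 0, 0);
        assert(rc == 0);
    }
    for (i = 0; i < RING_SIZE; i++) {
        nodes[i].wait_on = &sems[i];
        nodes[i].give_to = &sems[(i + 1) % RING_SIZE];
        nodes[i].rounds = rounds;
        nodes[i].wakeups = 0;
        rc = pthread_create(&tids[i], NULL, ring_worker, &nodes[i]);
        assert(rc == 0);
    }
    /* Join every thread before destroying any semaphore, since a
     * neighbour may still be posting to it. */
    for (i = 0; i < RING_SIZE; i++) {
        pthread_join(tids[i], NULL);
        total += nodes[i].wakeups;
    }
    for (i = 0; i < RING_SIZE; i++)
        sem_destroy(&sems[i]);

    (void)rc;
    return total;
}
```

Because every worker posts once before its first wait, each semaphore receives one more post than it has waits, so the ring cannot block in this bounded form; a correctly working setup should likewise keep all three tasks cycling indefinitely.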


* Re: [Xenomai-core] hanging in Xenomai 2.5.5
       [not found]         ` <1294242791.1828.6.camel@domain.hid>
@ 2011-01-06 22:05           ` Stefan Schaal
  2011-01-07 12:58             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Schaal @ 2011-01-06 22:05 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

Hi Philippe,

  thanks a lot for the hint. I configured my kernel from scratch, and got rid of the linux compile problems. I could thus verify that the commit you mentioned below DOES NOT have the problem I described, i.e., semaphores used by multiple processes which are running on different cores DID NOT hang anymore.

   Then I thought I would try to bisect the problem with git, and I pulled the latest version of the 2.5 repository. Interestingly, with the very latest commits, my problem has gone away. I confirmed this by switching back to Alexis' analogy branch, which I need for my development. This branch is not quite as up-to-date as the 2.5 branch, and the hanging problem still exists there. I merged the analogy branch with the latest 2.5 branch, and now nothing hangs anymore.

  I guess I will stop investigating at this point, unless the problem reappears.

Thanks so much for your help!

Best wishes,

-Stefan



On Jan 5, 2011, at 7:53, Philippe Gerum wrote:

> On Wed, 2011-01-05 at 07:41 -0800, Stefan Schaal wrote:
>> Hi Philippe,
>> 
>>  sorry, I must have mis-communicated. This was, of course, a xenomai commit that I tried, and the errors I sent you resulted when recompiling the linux kernel with this xenomai version.
>> 
> 
> Those errors are not related to Xenomai, they happen on basic linux
> code. Make sure to work from a fresh build tree, using a proper
> toolchain. It looks like something is severely broken in your build env.
> 
>> -Stefan
>> 
>> 
>> On Jan 5, 2011, at 6:07, Philippe Gerum wrote:
>> 
>>> On Sat, 2010-12-25 at 11:02 -0800, Stefan Schaal wrote:
>>>> 6a020f5
>>> 
>>> I don't see how these messages could be related to Xenomai. I was
>>> mentioning a Xenomai commit, not a linux one. You should reset to this
>>> commit:
>>> 
>>> commit 6a020f5a89955a42f1e03621ae6c63a587e9c75c
>>> Author: Philippe Gerum <rpm@xenomai.org>
>>> Date:   Sat Aug 28 13:04:45 2010 +0200
>>> 
>>>   nucleus, posix: use fast APC scheduling call
>>> 
>>> -- 
>>> Philippe.
>>> 
>>> 
>> 
> 
> -- 
> Philippe.
> 
> 




* Re: [Xenomai-core] hanging in Xenomai 2.5.5
  2011-01-06 22:05           ` Stefan Schaal
@ 2011-01-07 12:58             ` Gilles Chanteperdrix
  0 siblings, 0 replies; 5+ messages in thread
From: Gilles Chanteperdrix @ 2011-01-07 12:58 UTC (permalink / raw)
  To: Stefan Schaal; +Cc: xenomai

Stefan Schaal wrote:
> Hi Philippe,
> 
> thanks a lot for the hint. I configured my kernel from scratch, and
> got rid of the linux compile problems. I could thus verify that the
> commit you mentioned below DOES NOT have the problem I described,
> i.e., semaphores used by multiple processes which are running on
> different cores DID NOT hang anymore.
> 
> Then I thought I would try to bisect the problem with git, and I pulled
> the latest version of the 2.5 repository. Interestingly, with the
> very latest commits, my problem has gone away. I confirmed this by
> switching back to Alexis' analogy branch, which I need for my
> development. This branch is not quite as up-to-date as the 2.5
> branch, and the hanging problem still exists. I merged the analogy
> branch with the latest 2.5 branch, and now nothing hangs anymore.
> 
> I guess I will stop investigating at this point, unless the problem
> reappears.

2.5.6 should be out soon, which should allow you to avoid doing this.

But in the meantime, you can probably merge the two branches, they
should be fairly orthogonal.

-- 
					    Gilles.

