* [Xenomai-core] hanging in Xenomai 2.5.5
@ 2010-10-16 5:43 Stefan Schaal
2010-10-16 8:48 ` Philippe Gerum
0 siblings, 1 reply; 5+ messages in thread
From: Stefan Schaal @ 2010-10-16 5:43 UTC (permalink / raw)
To: xenomai
Hi everybody,
here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
We run multiple real-time processes, synchronized by semaphores, with interprocess communication through shared memory. Everything is implemented with the Xenomai real-time functions, with no mode switches. The processes are distributed over different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive.
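A rough illustration of what the T_CPU directive does, assuming the native skin encodes the requested CPU as a single bit in the top byte of the task mode word (bits 24 to 31, so up to 8 CPUs). That layout is an assumption to be checked against native/task.h; the SKETCH_ names and sketch_first_cpu() are hypothetical and not part of Xenomai.

```c
#include <stdint.h>

/* Assumed layout (verify in native/task.h): one affinity bit per CPU,
 * stored in bits 24..31 of the task mode word. */
#define SKETCH_T_CPU(cpu)  (UINT32_C(1) << (24 + ((cpu) & 7)))
#define SKETCH_T_CPUMASK   UINT32_C(0xff000000)

/* Return the lowest CPU number selected in a mode word, or -1 if no
 * affinity bit is set. */
static int sketch_first_cpu(uint32_t mode)
{
    uint32_t bits = (mode & SKETCH_T_CPUMASK) >> 24;

    for (int cpu = 0; cpu < 8; cpu++) {
        if (bits & (UINT32_C(1) << cpu))
            return cpu;
    }
    return -1;
}
```

Under this assumption, passing T_CPU(0) pins a task to CPU 0, and OR-ing several T_CPU() values would allow more than one CPU.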
Up to version 2.5.4, this worked fine.
With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- so a reboot is required.
The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to Xenomai release 2.5.5 and higher.
No "dmesg" print-outs when this error occurs.
We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.
Best wishes,
-Stefan
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
2010-10-16 5:43 [Xenomai-core] hanging in Xenomai 2.5.5 Stefan Schaal
@ 2010-10-16 8:48 ` Philippe Gerum
2010-12-25 17:56 ` Stefan Schaal
[not found] ` <A059C858-912A-4207-A834-411C50184622@domain.hid>
0 siblings, 2 replies; 5+ messages in thread
From: Philippe Gerum @ 2010-10-16 8:48 UTC (permalink / raw)
To: Stefan Schaal; +Cc: xenomai
On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
> Hi everybody,
>
> here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
>
> We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive.
>
> Up to version 2.5.4, this worked fine.
>
> With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus reboot is required.
>
> The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to Xenomai release 2.5.5 and higher.
>
> No "dmesg" print-outs when this error occurs.
>
> We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.
>
$ cat /proc/xenomai/stat
$ cat /proc/xenomai/sched
when the threads hang would help.
Additionally, please clone the -stable repo from there:
git://git.xenomai.org/xenomai-2.5.git
then branch+build and test from these commits:
- 6a020f5 first; if the bug does not show up anymore, check the next one
- 5e7cfa5; if the bug is still there, try disabling
CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.
> Best wishes,
>
> -Stefan
--
Philippe.
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
2010-10-16 8:48 ` Philippe Gerum
@ 2010-12-25 17:56 ` Stefan Schaal
[not found] ` <A059C858-912A-4207-A834-411C50184622@domain.hid>
1 sibling, 0 replies; 5+ messages in thread
From: Stefan Schaal @ 2010-12-25 17:56 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
[-- Attachment #1: Type: text/plain, Size: 4600 bytes --]
Hi Philippe,
thanks so much for your reply -- it took me a moment to get back to this problem. Here are some first observations:
1) the problem only occurs when I distribute the communicating processes over multiple cores -- in Xenomai 2.5.4, this has never been a problem.
2) The /proc/xenomai/stat looks like:
CPU  PID   MSW  CSW       PF  STAT       %CPU  NAME
0    0     0    2349229   0   00500080  100.0  ROOT/0
1    0     0    20328410  0   00500080  100.0  ROOT/1
2    0     0    1040321   0   00500080  100.0  ROOT/2
3    0     0    445786    0   00500080  100.0  ROOT/3
4    0     0    71162     0   00500080  100.0  ROOT/4
5    0     0    0         0   00500080  100.0  ROOT/5
6    0     0    0         0   00500080  100.0  ROOT/6
7    0     0    0         0   00500080  100.0  ROOT/7
1    3128  0    91261     0   00300182    0.0  sem1_task
2    3166  0    90470     0   00300188    0.0  sem2_task
3    3195  0    45237     0   00300182    0.0  sem3_task
0    0     0    0         0   00000000    0.0  IRQ56: Analogy device
1    0     0    0         0   00000000    0.0  IRQ56: Analogy device
2    0     0    0         0   00000000    0.0  IRQ56: Analogy device
3    0     0    0         0   00000000    0.0  IRQ56: Analogy device
4    0     0    0         0   00000000    0.0  IRQ56: Analogy device
5    0     0    0         0   00000000    0.0  IRQ56: Analogy device
6    0     0    0         0   00000000    0.0  IRQ56: Analogy device
7    0     0    0         0   00000000    0.0  IRQ56: Analogy device
1    0     0    39326230  0   00000000    0.0  IRQ521: [timer]
2    0     0    1641532   0   00000000    0.0  IRQ521: [timer]
3    0     0    1258571   0   00000000    0.0  IRQ521: [timer]
4    0     0    722843    0   00000000    0.0  IRQ521: [timer]
5    0     0    780591    0   00000000    0.0  IRQ521: [timer]
6    0     0    764817    0   00000000    0.0  IRQ521: [timer]
7    0     0    385421    0   00000000    0.0  IRQ521: [timer]
The three communicating processes are sem1_task, sem2_task, sem3_task -- they are currently hanging with 0% CPU
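For anyone who wants to watch for this state automatically, here is a minimal sketch of parsing one row of the /proc/xenomai/stat output shown above. It assumes the eight-column layout of this particular dump (other Xenomai versions may differ); struct stat_row, parse_stat_line() and is_idle_user_thread() are hypothetical helper names, and the "PID != 0 and 0% CPU" test is only a heuristic taken from this listing.

```c
#include <stdio.h>

/* One row of /proc/xenomai/stat, using the column names of the dump above. */
struct stat_row {
    int cpu;
    int pid;
    unsigned long msw;    /* MSW column (mode switches) */
    unsigned long csw;    /* CSW column (context switches) */
    unsigned long pf;     /* PF column (page faults) */
    unsigned int status;  /* STAT column, hexadecimal */
    double cpu_pct;       /* %CPU column */
    char name[64];        /* NAME column; may contain spaces */
};

/* Parse one data row. Returns 1 on success and 0 for the header line or
 * any line that does not have all eight fields. */
static int parse_stat_line(const char *line, struct stat_row *out)
{
    int n = sscanf(line, "%d %d %lu %lu %lu %x %lf %63[^\n]",
                   &out->cpu, &out->pid, &out->msw, &out->csw, &out->pf,
                   &out->status, &out->cpu_pct, out->name);
    return n == 8;
}

/* Heuristic only: a row with a non-zero PID (so not ROOT/n or an IRQ
 * handler) that currently shows 0% CPU. */
static int is_idle_user_thread(const struct stat_row *row)
{
    return row->pid != 0 && row->cpu_pct == 0.0;
}
```

Feeding each line of the file to parse_stat_line() and printing the rows for which is_idle_user_thread() is true would list sem1_task, sem2_task and sem3_task for the dump above.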
3) The /proc/xenomai/sched output looks like:
CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
0    0     idle   -1   -        master    R     ROOT/0
1    0     idle   -1   -        master    R     ROOT/1
2    0     idle   -1   -        master    R     ROOT/2
3    0     idle   -1   -        master    R     ROOT/3
4    0     idle   -1   -        master    R     ROOT/4
5    0     idle   -1   -        master    R     ROOT/5
6    0     idle   -1   -        master    R     ROOT/6
7    0     idle   -1   -        master    R     ROOT/7
1    3128  rt     50   -        master    W     sem1_task
2    3166  rt     50   -        master    R     sem2_task
3    3195  rt     50   -        master    W     sem3_task
Interestingly, although sem2_task is listed with STAT R, i.e. it is supposed to be running, it is not.
4) When I try to terminate the three processes, sem2_task hangs and I cannot kill it. Interestingly, if I start another program that does similar semaphore communication, sem2_task is finally released. Indeed, when I start this other program, the three processes (sem1_task, sem2_task, sem3_task) start running again, until they hang again.
5) I appended the little test program I used -- it is called test_xeno_sem.c
I compile with:
gcc -o xtest -I/usr/xenomai/include -D_GNU_SOURCE -D_REENTRANT -Wall -pipe -D__XENO__ test_xeno_sem.c -L/usr/xenomai/lib -lnative -lrtdk -lxenomai -lpthread -lrt
To create three communicating processes on different cores, I execute:
terminal1> xtest 1 1
terminal2> xtest 2 1
terminal3> xtest 3 1
To create three communicating processes on ONE core, I execute:
terminal1> xtest 1 0
terminal2> xtest 2 0
terminal3> xtest 3 0
6) I haven't tested the other commits yet -- this comes next. But maybe the information above already tells you all you need to know.
Best wishes, and, as always, a thousand thanks for your kind help!
-Stefan
------------------------------------------- test_xeno_sem.c ------------------------------------------------------------------------
[-- Attachment #2: test_xeno_sem.c --]
[-- Type: application/octet-stream, Size: 2636 bytes --]
#include <stdio.h>
#include <sys/mman.h>
#include <native/timer.h>
#include <native/task.h>
#include <native/mutex.h>
#include <native/heap.h>
#include <native/sem.h>
#include <pthread.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <rtdk.h>

int
main(int argc, char **argv)
{
  int rc;
  RT_SEM sem1;
  RT_SEM sem2;
  RT_SEM sem3;
  RT_SEM *sem_give, *sem_wait;
  char sem1_name[] = "sem1";
  char sem2_name[] = "sem2";
  char sem3_name[] = "sem3";
  long del;
  int multicore = 0;

  if (argc < 2) {
    printf("too few arguments\n");
    return 0;
  }

  // optional second argument: 1 = spread the tasks over CPUs 1..3, 0 = all on CPU 0
  if (argc > 2) {
    sscanf(argv[2], "%d", &multicore);
  }

  // lock all of the pages currently and pages that become
  // mapped into the address space of the process
  mlockall(MCL_CURRENT | MCL_FUTURE);

  // establish semaphores: bind to an existing one, or create it if the bind fails
  rc = rt_sem_bind(&sem1, sem1_name, TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem1, sem1_name, 0, S_FIFO);
    if (rc) {
      printf("Error: rt_sem_create returned %d\n", rc);
      return 0;
    }
  }

  rc = rt_sem_bind(&sem2, sem2_name, TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem2, sem2_name, 0, S_FIFO);
    if (rc) {
      printf("Error: rt_sem_create returned %d\n", rc);
      return 0;
    }
  }

  rc = rt_sem_bind(&sem3, sem3_name, TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem3, sem3_name, 0, S_FIFO);
    if (rc) {
      printf("Error: rt_sem_create returned %d\n", rc);
      return 0;
    }
  }

  // create a simple back-and-forth communication
  if (argv[1][0] == '1') {
    printf("wait for 1, give 2\n");
    sem_wait = &sem1;
    sem_give = &sem2;
    del = 1000000;
    rc = rt_task_shadow(NULL, "sem1_task", 50, T_FPU | T_CPU(1 * multicore));
    if (rc != 0) {
      printf("rc = %d\n", rc);
      return 0;
    }
  } else if (argv[1][0] == '2') {
    printf("wait for 2, give 3\n");
    sem_wait = &sem2;
    sem_give = &sem3;
    del = 2000000;
    rc = rt_task_shadow(NULL, "sem2_task", 50, T_FPU | T_CPU(2 * multicore));
    if (rc != 0) {
      printf("rc = %d\n", rc);
      return 0;
    }
  } else {
    printf("wait for 3, give 1\n");
    sem_wait = &sem3;
    sem_give = &sem1;
    del = 3000000;
    rc = rt_task_shadow(NULL, "sem3_task", 50, T_FPU | T_CPU(3 * multicore));
    if (rc != 0) {
      printf("rc = %d\n", rc);
      return 0;
    }
  }

  // hand one token to the next task, then wait/sleep/give forever
  rt_sem_v(sem_give);

  while (1) {
    rc = rt_sem_p(sem_wait, TM_INFINITE);
    if (rc != 0) {
      printf("rc = %d\n", rc);
      return 0;
    }
    rt_task_sleep((RTIME) del);
    rt_sem_v(sem_give);
  }

  // not reached: the loop above only exits through the return on error
  rt_sem_delete(&sem1);
  rt_sem_delete(&sem2);
  rt_sem_delete(&sem3);

  return 0;
}
[-- Attachment #3: Type: text/plain, Size: 2194 bytes --]
--------------------------------------------------------------------------------------------------------------------------------------------
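The hand-off pattern of the test program (wait on your own semaphore, then give the next one, with the last task giving the first) can be sketched without Xenomai. The version below deliberately swaps in POSIX threads and unnamed POSIX semaphores inside one process instead of three Xenomai processes, omits the rt_task_sleep() delays and the CPU pinning, and runs a fixed number of rounds so that it terminates. It only illustrates the ring protocol and is not expected to reproduce the hang; ring_node, ring_worker() and run_ring() are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define RING_SIZE 3

struct ring_node {
    sem_t *wait_on;  /* semaphore this thread blocks on */
    sem_t *give_to;  /* semaphore this thread posts to */
    int rounds;      /* number of wait/give cycles to run */
    int completed;   /* cycles actually completed */
};

/* sem_wait() that retries when interrupted by a signal. */
static int wait_retry(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc != 0 && errno == EINTR);
    return rc;
}

static void *ring_worker(void *arg)
{
    struct ring_node *node = arg;

    /* Like the test program, give one token before entering the loop. */
    sem_post(node->give_to);
    for (int i = 0; i < node->rounds; i++) {
        if (wait_retry(node->wait_on) != 0)
            break;
        node->completed++;
        sem_post(node->give_to);
    }
    return NULL;
}

/* Run the three-node ring; returns the total number of completed cycles,
 * or -1 if a thread could not be created. */
static int run_ring(int rounds)
{
    sem_t sems[RING_SIZE];
    struct ring_node nodes[RING_SIZE];
    pthread_t threads[RING_SIZE];
    int started = 0, total = 0;

    for (int i = 0; i < RING_SIZE; i++) {
        sem_init(&sems[i], 0, 0);  /* process-private, initial count 0 */
        nodes[i].wait_on = &sems[i];
        nodes[i].give_to = &sems[(i + 1) % RING_SIZE];
        nodes[i].rounds = rounds;
        nodes[i].completed = 0;
    }
    for (int i = 0; i < RING_SIZE; i++) {
        if (pthread_create(&threads[i], NULL, ring_worker, &nodes[i]) != 0)
            break;
        started++;
    }
    /* If the ring is incomplete, cancel the running threads so that the
     * joins below cannot block forever. */
    if (started < RING_SIZE)
        for (int i = 0; i < started; i++)
            pthread_cancel(threads[i]);
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += nodes[i].completed;
    }
    for (int i = 0; i < RING_SIZE; i++)
        sem_destroy(&sems[i]);
    return started == RING_SIZE ? total : -1;
}
```

Because every node gives once before its first wait, three tokens circulate and each node can always finish its rounds; on Linux the sketch builds with gcc -pthread.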
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
[not found] ` <1294242791.1828.6.camel@domain.hid>
@ 2011-01-06 22:05 ` Stefan Schaal
2011-01-07 12:58 ` Gilles Chanteperdrix
0 siblings, 1 reply; 5+ messages in thread
From: Stefan Schaal @ 2011-01-06 22:05 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
Hi Philippe,
thanks a lot for the hint. I configured my kernel from scratch, and got rid of the Linux compile problems. I could thus verify that the commit you mentioned below DOES NOT have the problem I described, i.e., semaphores used by multiple processes running on different cores DID NOT hang anymore.
Then I thought I would try to bisect the problem with git, so I pulled the latest version of the 2.5 repository. Interestingly, with the very latest commits, my problem has gone away. I confirmed this by switching back to Alexis' analogy branch, which I need for my development. This branch is not quite as up-to-date as the 2.5 branch, and the hanging problem still exists there. I merged the analogy branch with the latest 2.5 branch, and now nothing hangs anymore.
I guess I will stop investigating at this point, unless the problem reappears.
Thanks so much for your help!
Best wishes,
-Stefan
On Jan 5, 2011, at 7:53, Philippe Gerum wrote:
> On Wed, 2011-01-05 at 07:41 -0800, Stefan Schaal wrote:
>> Hi Philippe,
>>
>> sorry, I must have mis-communicated. This was, of course, a xenomai commit that I tried, and the errors I sent you resulted when recompiling the linux kernel with this xenomai version.
>>
>
> Those errors are not related to Xenomai, they happen on basic linux
> code. Make sure to work from a fresh build tree, using a proper
> toolchain. It looks like something is severely broken in your build env.
>
>> -Stefan
>>
>>
>> On Jan 5, 2011, at 6:07, Philippe Gerum wrote:
>>
>>> On Sat, 2010-12-25 at 11:02 -0800, Stefan Schaal wrote:
>>>> 6a020f5
>>>
>>> I don't see how these messages could be related to Xenomai. I was
>>> mentioning a Xenomai commit, not a linux one. You should reset to this
>>> commit:
>>>
>>> commit 6a020f5a89955a42f1e03621ae6c63a587e9c75c
>>> Author: Philippe Gerum <rpm@xenomai.org>
>>> Date: Sat Aug 28 13:04:45 2010 +0200
>>>
>>> nucleus, posix: use fast APC scheduling call
>>>
>>> --
>>> Philippe.
>>>
>>>
>>
>
> --
> Philippe.
>
>
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
2011-01-06 22:05 ` Stefan Schaal
@ 2011-01-07 12:58 ` Gilles Chanteperdrix
0 siblings, 0 replies; 5+ messages in thread
From: Gilles Chanteperdrix @ 2011-01-07 12:58 UTC (permalink / raw)
To: Stefan Schaal; +Cc: xenomai
Stefan Schaal wrote:
> Hi Philippe,
>
> thanks a lot for the hint. I configured my kernel from scratch, and
> got rid of the linux compile problems. I could thus verify that the
> commit you mentioned below DOES NOT have the problem I described,
> i.e., semaphores used by multiple processes which are running on
> different cores DID NOT hang anymore.
>
> Then, I thought I would try to bisect the problem with git, and I pulled
> the latest version of the 2.5 repository. Interestingly, with the
> very latest commits, my problem has gone away. I confirmed this by
> switching back to Alexis' analogy branch, which I need for my
> development. This branch is not quite as up-to-date as the 2.5
> branch, and the hanging problem still exists. I merged the analogy
> branch with the latest 2.5 branch, and now nothing hangs anymore.
>
> I guess I will stop investigating at this point, unless the problem
> reappears.
2.5.6 should be out soon, which should allow you to avoid doing this.
But in the meantime, you can probably merge the two branches, they
should be fairly orthogonal.
--
Gilles.