* [Xenomai-core] hanging in Xenomai 2.5.5
@ 2010-10-16  5:43 Stefan Schaal
  2010-10-16  8:48 ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Schaal @ 2010-10-16  5:43 UTC
To: xenomai

Hi everybody,

here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.

We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the Xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive.

Up to version 2.5.4, this worked fine.

With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus a reboot is required.

The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to Xenomai release 2.5.5 and higher.

No "dmesg" print-outs when this error occurs.

We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.

Best wishes,

-Stefan
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
  2010-10-16  5:43 [Xenomai-core] hanging in Xenomai 2.5.5 Stefan Schaal
@ 2010-10-16  8:48 ` Philippe Gerum
  2010-12-25 17:56   ` Stefan Schaal
       [not found]   ` <A059C858-912A-4207-A834-411C50184622@domain.hid>
  0 siblings, 2 replies; 5+ messages in thread
From: Philippe Gerum @ 2010-10-16  8:48 UTC
To: Stefan Schaal; +Cc: xenomai

On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
> Hi everybody,
>
> here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
>
> We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the Xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive.
>
> Up to version 2.5.4, this worked fine.
>
> With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus a reboot is required.
>
> The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to Xenomai release 2.5.5 and higher.
>
> No "dmesg" print-outs when this error occurs.
>
> We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.
>

$ cat /proc/xenomai/stat
$ cat /proc/xenomai/sched

when the threads hang would help.

Additionally, please clone the -stable repo from there:
git://git.xenomai.org/xenomai-2.5.git

then branch+build and test from these commits:

- 6a020f5 first; if the bug does not show up anymore, check the next one
- 5e7cfa5; if the bug is still there, try disabling
CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.

> Best wishes,
>
> -Stefan
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core

--
Philippe.
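A note on reading the requested snapshots: one way to check whether a thread is really stuck is to take /proc/xenomai/stat twice, a second or so apart, and compare the CSW (context switch) counter of each thread. The sketch below is only an illustration in plain C and is not part of Xenomai: the names xn_stat_row, parse_stat_row and looks_stalled are made up here, and the column order (CPU PID MSW CSW PF STAT %CPU NAME) is assumed from the stat output posted in the follow-up message.

```c
#include <stdio.h>

/* One row of /proc/xenomai/stat, assuming the column order
 * CPU PID MSW CSW PF STAT %CPU NAME (an assumption, see above). */
struct xn_stat_row {
    int cpu;
    int pid;
    unsigned long msw;   /* mode switches */
    unsigned long csw;   /* context switches */
    unsigned long pf;    /* page faults */
    unsigned long stat;  /* status bits, printed in hex */
    double pct_cpu;
    char name[64];
};

/* Parse one text line into *row. Returns 1 on success, 0 if the line
 * does not look like a data row (for example the header line). */
int parse_stat_row(const char *line, struct xn_stat_row *row)
{
    int n = sscanf(line, "%d %d %lu %lu %lu %lx %lf %63[^\n]",
                   &row->cpu, &row->pid, &row->msw, &row->csw,
                   &row->pf, &row->stat, &row->pct_cpu, row->name);
    return n == 8;
}

/* Heuristic: a real thread (PID != 0) whose context-switch counter did
 * not move between two snapshots and which shows 0% CPU made no progress. */
int looks_stalled(const struct xn_stat_row *before,
                  const struct xn_stat_row *after)
{
    return before->pid != 0 &&
           before->pid == after->pid &&
           before->csw == after->csw &&
           after->pct_cpu == 0.0;
}
```

Feeding each line of two snapshots through parse_stat_row() and calling looks_stalled() on rows with the same PID would flag threads that made no progress. It is only a heuristic, since a thread that is legitimately blocked for a long time looks the same.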
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
  2010-10-16  8:48 ` Philippe Gerum
@ 2010-12-25 17:56   ` Stefan Schaal
       [not found]   ` <A059C858-912A-4207-A834-411C50184622@domain.hid>
  1 sibling, 0 replies; 5+ messages in thread
From: Stefan Schaal @ 2010-12-25 17:56 UTC
To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 4600 bytes --]

Hi Philippe,

thanks so much for your reply -- it took me a moment to get back to this problem. Here are some first observations:

1) The problem only occurs when I distribute the communicating processes over multiple cores -- in Xenomai 2.5.4, this has never been a problem.

2) The /proc/xenomai/stat output looks like:

CPU  PID    MSW  CSW       PF  STAT      %CPU   NAME
  0  0      0    2349229   0   00500080  100.0  ROOT/0
  1  0      0    20328410  0   00500080  100.0  ROOT/1
  2  0      0    1040321   0   00500080  100.0  ROOT/2
  3  0      0    445786    0   00500080  100.0  ROOT/3
  4  0      0    71162     0   00500080  100.0  ROOT/4
  5  0      0    0         0   00500080  100.0  ROOT/5
  6  0      0    0         0   00500080  100.0  ROOT/6
  7  0      0    0         0   00500080  100.0  ROOT/7
  1  3128   0    91261     0   00300182    0.0  sem1_task
  2  3166   0    90470     0   00300188    0.0  sem2_task
  3  3195   0    45237     0   00300182    0.0  sem3_task
  0  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  1  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  2  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  3  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  4  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  5  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  6  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  7  0      0    0         0   00000000    0.0  IRQ56: Analogy device
  1  0      0    39326230  0   00000000    0.0  IRQ521: [timer]
  2  0      0    1641532   0   00000000    0.0  IRQ521: [timer]
  3  0      0    1258571   0   00000000    0.0  IRQ521: [timer]
  4  0      0    722843    0   00000000    0.0  IRQ521: [timer]
  5  0      0    780591    0   00000000    0.0  IRQ521: [timer]
  6  0      0    764817    0   00000000    0.0  IRQ521: [timer]
  7  0      0    385421    0   00000000    0.0  IRQ521: [timer]

The three communicating processes are sem1_task, sem2_task, sem3_task -- they are currently hanging with 0% CPU.

3) The /proc/xenomai/sched output looks like:

CPU  PID    CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
  0  0      idle    -1  -        master    R     ROOT/0
  1  0      idle    -1  -        master    R     ROOT/1
  2  0      idle    -1  -        master    R     ROOT/2
  3  0      idle    -1  -        master    R     ROOT/3
  4  0      idle    -1  -        master    R     ROOT/4
  5  0      idle    -1  -        master    R     ROOT/5
  6  0      idle    -1  -        master    R     ROOT/6
  7  0      idle    -1  -        master    R     ROOT/7
  1  3128   rt      50  -        master    W     sem1_task
  2  3166   rt      50  -        master    R     sem2_task
  3  3195   rt      50  -        master    W     sem3_task

Interestingly, although sem2_task is supposed to be running (state R), it does not run.

4) When I try to terminate the three processes, sem2_task hangs and I cannot kill it. Interestingly, if I start another program that does a similar semaphore communication, sem2_task is finally released. Indeed, when I start this other program, the three processes (sem1_task, sem2_task, sem3_task) start running again, until they hang again.

5) I appended the little test program I used -- it is called test_xeno_sem.c

I compile with:

    gcc -o xtest -I/usr/xenomai/include -D_GNU_SOURCE -D_REENTRANT -Wall -pipe -D__XENO__ -lnative -L/usr/xenomai/lib -lxenomai -lpthread -lrt -lrtdk -lnative test_xeno_sem.c

To create three communicating processes on different cores, I execute:

    terminal1> xtest 1 1
    terminal2> xtest 2 1
    terminal3> xtest 3 1

To create three communicating processes on ONE core, I execute:

    terminal1> xtest 1 0
    terminal2> xtest 2 0
    terminal3> xtest 3 0

6) I haven't tested the other commits yet -- this comes next. But maybe the information above already tells you all you need to know.

Best wishes, and, as always, a thousand thanks for your kind help!

-Stefan

------------------------------------------- test_xeno_sem.c ------------------------------------------------------------------------

[-- Attachment #2: test_xeno_sem.c --]
[-- Type: application/octet-stream, Size: 2636 bytes --]

#include <stdio.h>
#include <sys/mman.h>
#include <native/timer.h>
#include <native/task.h>
#include <native/mutex.h>
#include <native/heap.h>
#include <native/sem.h>
#include <pthread.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <rtdk.h>

int
main(int argc, char**argv)
{
  int     rc;
  RT_SEM  sem1;
  RT_SEM  sem2;
  RT_SEM  sem3;
  RT_SEM *sem_give,*sem_wait;
  char    sem1_name[]="sem1";
  char    sem2_name[]="sem2";
  char    sem3_name[]="sem3";
  long    del;
  int     multicore = 0;

  if (argc < 2) {
    printf("too few arguments\n");
    return 0;
  }

  if (argc > 2) {
    sscanf(argv[2],"%d",&multicore);
  }

  // lock all of the pages currently and pages that become
  // mapped into the address space of the process
  mlockall(MCL_CURRENT | MCL_FUTURE);

  // establish semaphores
  rc = rt_sem_bind(&sem1,sem1_name,TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem1,sem1_name,0,S_FIFO);
    if (rc) {
      printf("Error: rt_sem_create returned %d\n",rc);
      return 0;
    }
  }

  rc = rt_sem_bind(&sem2,sem2_name,TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem2,sem2_name,0,S_FIFO);
    if (rc) {
      printf("Error: rt_sem_create returned %d\n",rc);
      return 0;
    }
  }

  rc = rt_sem_bind(&sem3,sem3_name,TM_NONBLOCK);
  if (rc) {
    rc = rt_sem_create(&sem3,sem3_name,0,S_FIFO);
    if (rc) {
      printf("Error: rt_sem_create returned %d\n",rc);
      return 0;
    }
  }

  // create a simple back-and-forth communication
  if (argv[1][0] == '1') {

    printf("wait for 1, give 2\n");
    sem_wait = &sem1;
    sem_give = &sem2;
    del = 1000000;

    rc=rt_task_shadow(NULL,"sem1_task",50,T_FPU|T_CPU(1*multicore));
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }

  } else if (argv[1][0] == '2') {

    printf("wait for 2, give 3\n");
    sem_wait = &sem2;
    sem_give = &sem3;
    del = 2000000;

    rc=rt_task_shadow(NULL,"sem2_task",50,T_FPU|T_CPU(2*multicore));
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }

  } else {

    printf("wait for 3, give 1\n");
    sem_wait = &sem3;
    sem_give = &sem1;
    del = 3000000;

    rc=rt_task_shadow(NULL,"sem3_task",50,T_FPU|T_CPU(3*multicore));
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }

  }

  rt_sem_v(sem_give);

  while (1) {

    rc = rt_sem_p(sem_wait,TM_INFINITE);
    if (rc != 0) {
      printf("rc = %d\n",rc);
      return 0;
    }

    rt_task_sleep((RTIME) del);

    rt_sem_v(sem_give);

  }

  rt_sem_delete(&sem1);
  rt_sem_delete(&sem2);

  return 0;
}

[-- Attachment #3: Type: text/plain, Size: 2194 bytes --]

--------------------------------------------------------------------------------------------------------------------------------------------

On Oct 16, 2010, at 1:48, Philippe Gerum wrote:

> On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
>> Hi everybody,
>>
>> here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
>>
>> We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the Xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive.
>>
>> Up to version 2.5.4, this worked fine.
>>
>> With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus a reboot is required.
>>
>> The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to Xenomai release 2.5.5 and higher.
>>
>> No "dmesg" print-outs when this error occurs.
>>
>> We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.
>>
>
> $ cat /proc/xenomai/stat
> $ cat /proc/xenomai/sched
>
> when the threads hang would help.
>
> Additionally, please clone the -stable repo from there:
> git://git.xenomai.org/xenomai-2.5.git
>
> then branch+build and test from these commits:
>
> - 6a020f5 first; if the bug does not show up anymore, check the next one
> - 5e7cfa5; if the bug is still there, try disabling
> CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.
>
>> Best wishes,
>>
>> -Stefan
>
> --
> Philippe.
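The wiring of the three test processes can be summarized in a few lines of plain C. This is a sketch derived from the three branches of test_xeno_sem.c, not code from the attachment: struct ring_role and ring_role_for() are hypothetical names, no Xenomai call is made, and the unit of the delay depends on how the Xenomai timer is configured.

```c
#include <stdio.h>

/* Role of one process in the semaphore ring (hypothetical helper type). */
struct ring_role {
    int  wait_sem;  /* number of the semaphore the process blocks on (1..3) */
    int  give_sem;  /* number of the semaphore it signals afterwards */
    long delay;     /* value the test passes to rt_task_sleep() */
    int  cpu;       /* argument the test passes to T_CPU() */
};

/* Mirror the if / else-if / else chain of the test program: '1' and '2'
 * select processes 1 and 2, anything else behaves as process 3. */
struct ring_role ring_role_for(char id, int multicore)
{
    struct ring_role r;
    int n = (id == '1') ? 1 : (id == '2') ? 2 : 3;

    r.wait_sem = n;
    r.give_sem = n % 3 + 1;      /* 1 -> 2, 2 -> 3, 3 -> 1 */
    r.delay    = n * 1000000L;   /* 1000000, 2000000, 3000000 */
    r.cpu      = n * multicore;  /* multicore == 0 puts every task on CPU 0 */
    return r;
}
```

With multicore set to 1 this gives CPUs 1, 2 and 3, which matches the CPU column reported for sem1_task, sem2_task and sem3_task in the stat output; with 0, all three tasks land on CPU 0, the configuration that did not hang.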
[parent not found: <A059C858-912A-4207-A834-411C50184622@domain.hid>]
[parent not found: <1294236456.1828.3.camel@domain.hid>]
[parent not found: <B531163E-B46F-465A-8970-13BC59A61496@domain.hid>]
[parent not found: <1294242791.1828.6.camel@domain.hid>]
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
       [not found]         ` <1294242791.1828.6.camel@domain.hid>
@ 2011-01-06 22:05           ` Stefan Schaal
  2011-01-07 12:58             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Schaal @ 2011-01-06 22:05 UTC
To: Philippe Gerum; +Cc: xenomai

Hi Philippe,

thanks a lot for the hint. I configured my kernel from scratch, and got rid of the linux compile problems. I could thus verify that the commit you mentioned below DOES NOT have the problem I described, i.e., semaphores used by multiple processes which are running on different cores DID NOT hang anymore.

Then I thought I would try to bisect the problem with git, and I pulled the latest version of the 2.5 repository. Interestingly, with the very latest commits, my problem has gone away. I confirmed this by switching back to Alexis' analogy branch, which I need for my development. This branch is not quite as up-to-date as the 2.5 branch, and the hanging problem still exists. I merged the analogy branch with the latest 2.5 branch, and now nothing hangs anymore.

I guess I will stop investigating at this point, unless the problem reappears.

Thanks so much for your help!

Best wishes,

-Stefan

On Jan 5, 2011, at 7:53, Philippe Gerum wrote:

> On Wed, 2011-01-05 at 07:41 -0800, Stefan Schaal wrote:
>> Hi Philippe,
>>
>> sorry, I must have mis-communicated. This was, of course, a Xenomai commit that I tried, and the errors I sent you resulted when recompiling the linux kernel with this Xenomai version.
>>
>
> Those errors are not related to Xenomai, they happen on basic linux
> code. Make sure to work from a fresh build tree, using a proper
> toolchain. It looks like something is severely broken in your build env.
>
>> -Stefan
>>
>>
>> On Jan 5, 2011, at 6:07, Philippe Gerum wrote:
>>
>>> On Sat, 2010-12-25 at 11:02 -0800, Stefan Schaal wrote:
>>>> 6a020f5
>>>
>>> I don't see how these messages could be related to Xenomai. I was
>>> mentioning a Xenomai commit, not a linux one. You should reset to this
>>> commit:
>>>
>>> commit 6a020f5a89955a42f1e03621ae6c63a587e9c75c
>>> Author: Philippe Gerum <rpm@xenomai.org>
>>> Date:   Sat Aug 28 13:04:45 2010 +0200
>>>
>>>     nucleus, posix: use fast APC scheduling call
>>>
>>> --
>>> Philippe.
>>>
>>>
>>
>
> --
> Philippe.
* Re: [Xenomai-core] hanging in Xenomai 2.5.5
  2011-01-06 22:05           ` Stefan Schaal
@ 2011-01-07 12:58             ` Gilles Chanteperdrix
  0 siblings, 0 replies; 5+ messages in thread
From: Gilles Chanteperdrix @ 2011-01-07 12:58 UTC
To: Stefan Schaal; +Cc: xenomai

Stefan Schaal wrote:
> Hi Philippe,
>
> thanks a lot for the hint. I configured my kernel from scratch, and
> got rid of the linux compile problems. I could thus verify that the
> commit you mentioned below DOES NOT have the problem I described,
> i.e., semaphores used by multiple processes which are running on
> different cores DID NOT hang anymore.
>
> Then I thought I would try to bisect the problem with git, and I pulled
> the latest version of the 2.5 repository. Interestingly, with the
> very latest commits, my problem has gone away. I confirmed this by
> switching back to Alexis' analogy branch, which I need for my
> development. This branch is not quite as up-to-date as the 2.5
> branch, and the hanging problem still exists. I merged the analogy
> branch with the latest 2.5 branch, and now nothing hangs anymore.
>
> I guess I will stop investigating at this point, unless the problem
> reappears.

2.5.6 should be out soon, which should allow you to avoid doing this.
But in the meantime, you can probably merge the two branches, they
should be fairly orthogonal.

--
Gilles.