From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <512FBFAE.1080005@xenomai.org>
Date: Thu, 28 Feb 2013 21:35:58 +0100
From: Gilles Chanteperdrix
MIME-Version: 1.0
References: <512FB9B5.9040709@xenomai.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
List-Id: Discussions about the Xenomai project
To: Ronny Meeus
Cc: xenomai@xenomai.org

On 02/28/2013 09:30 PM, Ronny Meeus wrote:
> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
> wrote:
>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>
>>> Hello
>>>
>>> we are using the PSOS interface of Xenomai forge, running completely
>>> in user-space using the mercury code.
>>> We deploy our application on different processors, one product is
>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>> Cavium (8 core device).
>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>> so relevant.
>>>
>>> Our Xenomai application is running on one of the cores (affinity is
>>> set), while the other cores are running other code.
>>>
>>> On both architectures we recently start to see issues that one thread
>>> is consuming 100% of the core on which the application is pinned.
>>> The thread that monopolizes the core is the thread internally used to
>>> manage the timers, running at the highest priority.
>>> The trigger for running into this behavior is currently unclear.
>>> If we only start a part of the application (platform management only),
>>> the issue is not observed.
>>> We see this on both an old version of Xenomai and a very recent one
>>> (pulled from the git repo yesterday).
>>>
>>> I will continue to debug this issue in the coming days and try isolate
>>> the code that is triggering it, but I can use hints from the
>>> community.
>>> Debugging is complex since once the load starts, the debugger is not
>>> reacting anymore.
>>> If I put breakpoints in the functions that are called when the timer
>>> expires (both oneshot and periodic), the process starts to clone
>>> itself and I endup with tens of them.
>>>
>>> Has anybody seen an issue like this before or does somebody has some
>>> hints on how to debug this problem?
>>
>>
>> First enable the watchdog. It will send a signal to the application when
>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>> tracer trace when the bug happens. You will probably have to increase
>> the watchdog polling frequency, in order to have a meaningful trace.
>>
>> --
>> Gilles.
>
> Gilles,
>
> We are running completely in user-space (mercury) .

cobalt also runs in user-space.

> I thought that the watchdog and I-pipe tracer are only relevant when
> using the cobalt code.
> In case my assumption is wrong, please correct me and let me know how
> to enable it.

Yes, if you are using plain Linux, there are even more tools to debug
the problem:
- you can enable RT throttling to keep the buggy thread from locking up
the machine
- you can enable the kernel's lockup detection, which covers just your
case (CONFIG_LOCKUP_DETECTOR)
- if you are on x86 you can use the NMI watchdog
- you can use FTRACE instead of the I-pipe tracer
- or you can decide to compile the kernel with CONFIG_IPIPE and
CONFIG_IPIPE_TRACE to use the I-pipe tracer without Xenomai.
- maybe xenomai-forge's "slackspot" tool works for mercury?

-- 
Gilles.
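
For the RT throttling item in the list above, here is a minimal sketch of
how one might check the current settings. It assumes the kernel exposes
the usual scheduler sysctls, /proc/sys/kernel/sched_rt_period_us and
/proc/sys/kernel/sched_rt_runtime_us. The helper names non_rt_share and
read_rt_throttling are made up for this illustration and are not part of
Xenomai or of any kernel tool.

```python
from pathlib import Path
from typing import Tuple

# Usual procfs location of the RT throttling knobs (assumption: the
# running kernel provides sched_rt_period_us and sched_rt_runtime_us).
PROC_SCHED = Path("/proc/sys/kernel")


def non_rt_share(period_us: int, runtime_us: int) -> float:
    """Return the fraction of each period left to non-real-time tasks.

    A runtime of -1 means throttling is disabled, so a runaway
    SCHED_FIFO/SCHED_RR thread may keep its core for the whole period.
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        return 0.0
    return max(0.0, 1.0 - runtime_us / period_us)


def read_rt_throttling(base: Path = PROC_SCHED) -> Tuple[int, int]:
    """Read (period_us, runtime_us) from the sched_rt_* files under base."""
    period = int((base / "sched_rt_period_us").read_text())
    runtime = int((base / "sched_rt_runtime_us").read_text())
    return period, runtime
```

Calling non_rt_share(*read_rt_throttling()) on the target tells you whether
anything is reserved for non-RT tasks. A result of 0.0 means the timer
thread can starve everything else on its core, including the debugger.
Writing a runtime smaller than the period to sched_rt_runtime_us (as
root) enables throttling.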