From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lukasz Majewski
Subject: Re: [linux-2.6.26.8-rt14] RT Page Fault.
Date: Mon, 23 Feb 2009 00:01:04 +0100
Message-ID: <49A1D930.4010600@gmail.com>
References: <49A1BA49.7060308@gmail.com> <3efb10970902221329h89b18cbl60c8d3365009e2d9@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-rt-users@vger.kernel.org
To: Remy Bohmer
Return-path:
Received: from mail-bw0-f161.google.com ([209.85.218.161]:37371 "EHLO mail-bw0-f161.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753601AbZBVXBI (ORCPT ); Sun, 22 Feb 2009 18:01:08 -0500
Received: by bwz5 with SMTP id 5so4253985bwz.13 for ; Sun, 22 Feb 2009 15:01:06 -0800 (PST)
In-Reply-To: <3efb10970902221329h89b18cbl60c8d3365009e2d9@mail.gmail.com>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

Hello,

Thank you for this extremely fast response. :-)

I haven't tried running my application from a RAM disk yet (I suppose you are suggesting building an initrd or initramfs). It's inconvenient for development, since the size of such an init RAM disk is limited. I will try it anyway.

Now some explanation about uClibc. I know that this C library lacks many features (for instance NPTL support for ARM). However, for the ARM architecture there is a nice utility, buildroot, for building toolchains and root filesystems. I'm not using x86 as the target architecture, so getting a cross-compiling toolchain to build without errors is a success in itself :-).

I've tried the ptxdist tool from pengutronix.de, but I could not build a proper toolchain without errors for my Atmel ARM9 device (ARMv5TEJ architecture, to be precise). Or, to put it another way: I gave up after spending some time on it and moved back to the working uClibc. I've built some toolchains, but I encountered errors during the builds, and although the resulting arm-linux-gcc worked, I wasn't sure whether glibc was broken in some way.

In my first post I didn't mention that I was checking the rusage struct for minor and major page faults. My program checks it twice: first before calling mlockall(), at the beginning of the program, and again after munlockall(), when the program ends, either because the delay occurred or on normal termination. In both cases I got the same number of minor and major page faults, so it looks like no page fault occurs.

I've checked, and it turns out that uClibc 0.9.30 does not support PI (priority inheritance) or futexes. This may cause my bug, since I'm heavily using read/write on /dev/ files (read/write to 4 different files). There's a lot of room for priority inversion in my code :-/. I wrote this code in the blind faith that priority inversion is avoided "in some way". I assumed that access to a device (a resource, in this case) has some mechanism to inherit the priority.

It looks like the only solution for me is to build a toolchain with glibc 2.6 or later and gcc 4.x, with full support for priority inheritance, futexes and the "clock_" set of functions. As I mentioned, that's a bit tricky for ARM (especially with the proper ABI and software floating-point support (-msoft-float)). But it looks like the only feasible solution.

Thanks for the advice.

Regards,
Lukasz

Remy Bohmer wrote:
> Hello Lukasz,
>
>
>> I'm also using NFS to mount root file system from my host x86 ubuntu PC.
>>
>
> Have you tried already running from a ram-disk?
>
>
>> When I start my application it runs for some time and ends as expected. It
>> seems that everything is OK. Static schedule is not violated. Unfortunately,
>> after running this application for couple of times (6 to 10) I can see that
>> static schedule is violated (delayed in execution) for about 2-4 seconds.
>> Application is running for 1-2 seconds as expected and then crashes (I mean
>> exits with static schedule delay of 2-4 seconds).
>> It looks like page fault,
>>
>
> You can trace the number of page-faults during run, by means of
> getrusage(), see rt-wiki.
> 2-4 seconds sounds quite long for page fault handling to me (unless
> you are using page/swap files)
>
>
>> but in my main() I've added mlockall() as written in the examples from rt.wiki.
>> Moreover I've pre-faulted the stack as written in "square_wave example". Before my
>> application exits I'm calling munlockall(). When I log via ssh to my
>> embedded system and start top, I cannot see that I've got some memory leaks
>> or zombie processes during run of my RT application.
>>
>> May it be possible that by some chance some global variable is not locked in
>> the memory? What is the "scope" of mlockall? Is it only valid in one .o
>>
>
> mlockall() is somewhat tricky. It locks all allocated data pages (and
> future pages, if specified) into RAM, but IIRC code segments are not
> forced to be loaded into RAM, but only code segments that are loaded
> once, will be locked. So, in theory, there could be pages still on the
> NFS share that are not loaded when the problem arises. So, this could
> be the problem you see, but it would not be the first suspect I would
> look for.
>
>
>> I'd appreciate any hints/comments what can cause this bug.
>>
>
> I read you use uClibc, the last time I looked at it (quite some time
> ago), it lacked support for priority inheritance mutexes... Aren't you
> running in a mutex priority inversion?
>
> Or priority inversion related to other interrupt threads? You run at
> prio 71, if you leave the network, or block device
> softirqs/irq-threads on 50, you could have a priority inversion on
> this level as well. This would be my prime suspect...
>
>
>> I was trying to
>> use strace and gdb to fix this problem, but these tools are too slow and they
>> cause violation of my cyclic static schedule.
>>
>
> No ETM trace available? Really nice to have in such cases...
>
>
> Kind Regards,
>
> Remy
>
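
For reference, the getrusage() page-fault check discussed in this thread could look roughly like the sketch below. It is only an illustration, not the code from the application: the names fault_count, read_faults(), measure_faults() and do_cyclic_work() are made up for the example, and do_cyclic_work() is just a placeholder for the real cyclic schedule. The first snapshot is taken after mlockall() so that faults caused by the locking itself are not counted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Page-fault counters of the calling process (hypothetical helper type). */
struct fault_count {
    long minor;  /* ru_minflt: faults served without any I/O */
    long major;  /* ru_majflt: faults that required I/O */
};

/* Read the current counters; returns 0 on success, -1 on error. */
int read_faults(struct fault_count *out)
{
    struct rusage ru;

    assert(out != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->minor = ru.ru_minflt;
    out->major = ru.ru_majflt;
    return 0;
}

/* Placeholder for the real-time work: it only touches a static buffer. */
void do_cyclic_work(void)
{
    static char buf[64 * 1024];

    memset(buf, 0xA5, sizeof buf);
}

/*
 * Lock memory, run work(), and store the number of page faults that
 * happened while work() was running in *delta.
 * Returns 0 on success, -1 if the counters could not be read.
 */
int measure_faults(void (*work)(void), struct fault_count *delta)
{
    struct fault_count before, after;
    int locked = (mlockall(MCL_CURRENT | MCL_FUTURE) == 0);

    /* mlockall() can fail without privileges or with a low RLIMIT_MEMLOCK;
     * in that case the measurement still runs, just without locking. */
    if (!locked)
        perror("mlockall");

    if (read_faults(&before) != 0)
        return -1;
    work();
    if (read_faults(&after) != 0)
        return -1;

    if (locked)
        munlockall();

    delta->minor = after.minor - before.minor;
    delta->major = after.major - before.major;
    return 0;
}
```

Calling measure_faults(do_cyclic_work, &d) from main() and printing d.minor and d.major after each run shows whether a delayed run coincides with extra faults. If both deltas stay at zero in the delayed runs as well, page faults are an unlikely cause and priority inversion remains the main suspect.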