From mboxrd@z Thu Jan 1 00:00:00 1970 References: <5AC116C9.6040809@freyder.net> <15a83f1d-7785-11c9-5f64-c09cb035ff10@xenomai.org> <5AC24433.30107@freyder.net> <6cacabd8-af4c-27e0-1b62-34ce73d7b933@xenomai.org> <5AC25632.90608@freyder.net> <7f51755d-8a54-c1a8-9ee4-a3014e03b3df@xenomai.org> <5ACA9F4C.8090902@freyder.net> <9638da9b-dcf7-8fac-8ef7-3b54d9f10316@xenomai.org> From: Steve Freyder Message-ID: <5ACF7EE1.7020404@freyder.net> Date: Thu, 12 Apr 2018 10:44:33 -0500 MIME-Version: 1.0 In-Reply-To: <9638da9b-dcf7-8fac-8ef7-3b54d9f10316@xenomai.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue? List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Philippe Gerum , "xenomai@xenomai.org" On 4/12/2018 5:23 AM, Philippe Gerum wrote: > On 04/12/2018 11:31 AM, Philippe Gerum wrote: >> On 04/09/2018 01:01 AM, Steve Freyder wrote: >>> On 4/2/2018 11:51 AM, Philippe Gerum wrote: >>>> On 04/02/2018 06:11 PM, Steve Freyder wrote: >>>>> On 4/2/2018 10:20 AM, Philippe Gerum wrote: >>>>>> On 04/02/2018 04:54 PM, Steve Freyder wrote: >>>>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote: >>>>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote: >>>>>>>>> Greetings again. >>>>>>>>> >>>>>>>>> As I understand it, for each rt_queue there's supposed to be a >>>>>>>>> "status file" located in the fuse filesystem underneath the >>>>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with >>>>>>>>> the file name being the queue name. This used to contain very >>>>>>>>> useful info about queue status, message counts, etc. I don't know >>>>>>>>> when it broke or whether it's something I'm doing wrong but I'm >>>>>>>>> now getting a "memory exhausted" message on the console when I >>>>>>>>> attempt to do a "cat" on the status file. >>>>>>>>> >>>>>>>>> Here's a small C program that just creates a queue, and then does >>>>>>>>> a pause to hold the accessor count non-zero. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> The resulting output (logged in via the system console): >>>>>>>>> >>>>>>>>> # sh qtest.sh >>>>>>>>> + sleep 1 >>>>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo >>>>>>>>> + find /run/xenomai >>>>>>>>> /run/xenomai >>>>>>>>> /run/xenomai/root >>>>>>>>> /run/xenomai/root/mysession >>>>>>>>> /run/xenomai/root/mysession/821 >>>>>>>>> /run/xenomai/root/mysession/821/alchemy >>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks >>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821] >>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues >>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo >>>>>>>>> /run/xenomai/root/mysession/system >>>>>>>>> /run/xenomai/root/mysession/system/threads >>>>>>>>> /run/xenomai/root/mysession/system/heaps >>>>>>>>> /run/xenomai/root/mysession/system/version >>>>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo' >>>>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo >>>>>>>>> memory exhausted >>>>>>>>> >>>>>>>>> At this point, it hangs, although SIGINT usually terminates it. >>>>>>>>> >>>>>>>>> I've seen some cases where SIGINT won't terminate it, and a >>>>>>>>> reboot is >>>>>>>>> required to clean things up. I see this message appears to be >>>>>>>>> logged >>>>>>>>> in the obstack error handler. I don't think I'm running out of >>>>>>>>> memory, >>>>>>>>> which makes me think "heap corruption". Not much of an analysis! >>>>>>>>> I did >>>>>>>>> try varying queue sizes and max message counts - no change. >>>>>>>>> >>>>>>>> I can't reproduce this. I would suspect a rampant memory corruption >>>>>>>> too, >>>>>>>> although running the test code over valgrind (mercury build) did not >>>>>>>> reveal any issue. >>>>>>>> >>>>>>>> - which Xenomai version are you using? >>>>>>>> - cobalt / mercury ? >>>>>>>> - do you enable the shared heap when configuring ? (--enable-pshared) >>>>>>>> >>>>>>> I'm using Cobalt. uname -a reports: >>>>>>> >>>>>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri >>>>>>> Mar >>>>>>> 9 11:07:52 CST 2018 armv7l GNU/Linux >>>>>>> >>>>>>> Here is the config dump: >>>>>>> >>>>>>> CONFIG_XENO_PSHARED=1 >>>>>> Any chance you could have some leftover files in /dev/shm from aborted >>>>>> runs, which would steal RAM? >>>>>> >>>>> I've been rebooting before each test run, but I'll keep that in mind for >>>>> future testing. >>>>> >>>>> Sounds like I need to try rolling back to an older build, I have a 3.0.5 >>>>> and a 3.0.3 build handy. >>>> The standalone test should work with the shared heap disabled, could you >>>> check it against a build configure with --disable-pshared? Thanks, >>>> >>> Philippe, >>> >>> Sorry for the delay - our vendor had been doing all of our kernel and SDK >>> builds so I had to do a lot of learning to get this all going. >>> >>> With the --disable-pshared in effect: >>> >>> /.g3l # ./qc --dump-config | grep SHARED >>> based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200) >>> CONFIG_XENO_PSHARED is OFF >>> >>> /.g3l # ./qc foo & >>> /.g3l # find /run/xenomai/ >>> /run/xenomai/ >>> /run/xenomai/root >>> /run/xenomai/root/opus >>> /run/xenomai/root/opus/3477 >>> /run/xenomai/root/opus/3477/alchemy >>> /run/xenomai/root/opus/3477/alchemy/tasks >>> /run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477 >>> /run/xenomai/root/opus/3477/alchemy/queues >>> /run/xenomai/root/opus/3477/alchemy/queues/foo >>> /run/xenomai/root/opus/system >>> /run/xenomai/root/opus/system/threads >>> /run/xenomai/root/opus/system/heaps >>> /run/xenomai/root/opus/system/version >>> root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo >>> [TYPE] [TOTALMEM] [USEDMEM] [QLIMIT] [MCOUNT] >>> FIFO 5344 3248 10 0 >>> >>> Perfect! >>> >>> What's the next step? >>> >> I can reproduce this issue. I'm on it. >> > The patch below should solve the problem for the registry, however this > may have uncovered a bug in the "tlsf" allocator (once again), which > should not have failed allocating memory. Two separate issues then. > > diff --git a/include/copperplate/registry-obstack.h > b/include/copperplate/registry-obstack.h > index fe192faf7..48e453bc3 100644 > --- a/include/copperplate/registry-obstack.h > +++ b/include/copperplate/registry-obstack.h > @@ -29,11 +29,12 @@ struct threadobj; > struct syncobj; > > /* > - * Assume we may want fast allocation of private memory from real-time > - * mode when growing the obstack. > + * Obstacks are grown from handlers called by the fusefs server > + * thread, which has no real-time requirement: malloc/free is fine for > + * memory management. > */ > -#define obstack_chunk_alloc pvmalloc > -#define obstack_chunk_free pvfree > +#define obstack_chunk_alloc malloc > +#define obstack_chunk_free free > > struct threadobj; > > Thanks Philippe, I shall add this to my build ASAP. If I understand correctly, this is switching the entire registry-obstack-related dynamic storage allocation mechanism from the "pv" routines (TLSF allocator?) paradigm to the standard malloc/free paradigm. I ask because my next issue report was going to be about a SEGV that I have been seeing occasionally in registry_add_file() after having called pvstrdup() and having gotten a NULL return back. The caller there apparently does not expect a NULL return, so when you said "should not have failed allocating memory" that brought my attention back to the SEGV issue. This appears to be related to what I will call "heavy registry activity" when I am initializing - creating lots of RT tasks, queues, mutexes, etc, causing hot activity in registry_add_file I would expect. I'm thinking creation of a "registry exerciser" program may be in order...