From mboxrd@z Thu Jan 1 00:00:00 1970 References: <20220117165341.387782-1-florian.bezdeka@siemens.com> <871r12hksf.fsf@xenomai.org> <6a63cc2c027ba27c07852f91d9e2067760068e94.camel@siemens.com> From: Philippe Gerum Subject: Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor Date: Sat, 22 Jan 2022 19:16:33 +0100 In-reply-to: <6a63cc2c027ba27c07852f91d9e2067760068e94.camel@siemens.com> Message-ID: <87lez7d2bs.fsf@xenomai.org> MIME-Version: 1.0 Content-Type: text/plain List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Bezdeka, Florian" Cc: "xenomai@xenomai.org" , "jan.kiszka@siemens.com" "Bezdeka, Florian" writes: > On Thu, 2022-01-20 at 09:12 +0100, Philippe Gerum wrote: >> Florian Bezdeka writes: >> >> > We have a test within the smokey testsuite which actually does >> > something like >> > >> > cobalt_monitor_init() >> > cobalt_monitor_wait() >> > cobalt_monitor_enter() >> > -> timeout >> > cobalt_monitor_exit() >> > cobalt_monitor_destroy() >> > >> > If the posix_mutex tests were run right after this scenario the mutex >> > tests failed. The wrong mutex state was caused by the monitor in the >> > previous test not being cleaned up properly. >> > >> > Signed-off-by: Florian Bezdeka >> > --- >> > >> > I'm not 100% sure about this fix. Maybe we should move it into >> > cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way >> > inside the y2038 tests. Everything is possible... >> > >> > @Philippe: It would be very nice if you could have a look as well. >> > Thanks! >> > >> >> Looks ok, thank you for spotting this. I think the issue your patch >> fixes caused a PI boost to linger since deleting the gate lock does not >> drop ownership - only xnsynch_release() does so - leaving the owner with >> a boosted priority unduly (*). This is likely to wreck any subsequent >> test depending on the priority scheme. >> >> You could indeed move the release call to the reclamation routine. > > Checked that again and my conclusion is that the current location is > correct and it should not be moved into cobalt_monitor_reclaim. > > Reasoning: cobalt_monitor_reclaim is called by cobalt_process_detach() > as well, but there is no owner check like in CoBaLt_monitor_destroy. > > Do you agree? > I guess you want me to look at the code at this point. Ok, let's try this. On task exit, PI is cleared when either release_all_ownerships() or xnsynch_release() are called, whichever comes first. For a thread which belongs to the Cobalt personality, the exit sequence may include in that order: 1. __handle_taskexit_event cobalt_remove_process cobalt_process_detach __reclaim_resource cobalt_monitor_reclaim xnsynch_destroy => flush waiters, clear PI 2. __xnthread_cleanup release_all_ownerships => monitor already destroyed, no lock held So, despite the fact that an exiting thread drops all ownerships on locks only _after_ the resources it owns have been reclaimed, this should be ok to keep the release out of the reclamation code, because all waiters were flushed prior to that point, plus the lock owner exits anyway (unlike in the plain monitory_destroy() case). So yes, you are right and I was likely off base when mentioning the move to the reclamation code. You may want to double-check that taking a termination signal while holding a monitor does indeed what I just described though. shameless plug: the v3 way of doing resource management is overly complicated and somewhat fragile, would benefit from a serious streamlining effort. v4 does much, much better by simply relying on the common VFS for this part. -- Philippe.