From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4DE5202F.70309@domain.hid> Date: Tue, 31 May 2011 19:06:55 +0200 From: Jan Kiszka MIME-Version: 1.0 References: <4DE4D2F1.20305@domain.hid> <1306859397.2153.52.camel@domain.hid> <4DE51973.70104@domain.hid> <1306861115.2153.67.camel@domain.hid> In-Reply-To: <1306861115.2153.67.camel@domain.hid> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] Fragile lock usage tracking for auto-relax List-Id: Xenomai life and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Philippe Gerum Cc: Xenomai core On 2011-05-31 18:58, Philippe Gerum wrote: > On Tue, 2011-05-31 at 18:38 +0200, Jan Kiszka wrote: >> On 2011-05-31 18:29, Philippe Gerum wrote: >>> On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote: >>>> Hi Philippe, >>>> >>>> enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the >>>> in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in >>>> xnsynch_release triggers for RT threads: >>>> >>>> XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); >>>> >>>> RT threads do not balance their lock and unlock syscalls, so their >>>> counter goes wild quite quickly. >>>> >>>> But just limiting the bug check to XNOTHER threads is neither a >>>> solution. How to deal with the counter on scheduling policy changes? >>>> >>>> So my suggestion is to convert the auto-relax feature into a service, >>>> user space can request based on a counter that user space maintains >>>> independently. I.e. we should create another shared word that user space >>>> increments and decrements on lock acquisitions/releases on its own. The >>>> nucleus just tests it when deciding about the relax on return to user space. >>>> >>>> But before hacking into that direction, I'd like to hear if it makes >>>> sense to you. >>> >>> At first glance, this does not seem to address the root issue. The >>> bottom line is that we should not have any thread release an owned lock >>> it does not hold, kthread or not. >>> >>> In that respect, xnsynch_release() looks fishy because it may be called >>> over a context which is _not_ the lock owner, but the thread who is >>> deleting the lock owner, so assuming lastowner == current_thread when >>> releasing is wrong. >>> >>> At the very least, the following patch would prevent >>> xnsynch_release_all_ownerships() to break badly. The same way, the >>> fastlock stuff does not track the owner properly in the synchro object. >>> We should fix those issues before going further, they may be related to >>> the bug described. >>> >>> Totally, genuinely, 100% untested. >>> >>> diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c >>> index 3a53527..0785533 100644 >>> --- a/ksrc/nucleus/synch.c >>> +++ b/ksrc/nucleus/synch.c >>> @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout, >>> XN_NO_HANDLE, threadh); >>> >>> if (likely(fastlock == XN_NO_HANDLE)) { >>> + xnsynch_set_owner(synch, thread); >>> xnthread_inc_rescnt(thread); >>> xnthread_clear_info(thread, >>> XNRMID | XNTIMEO | XNBREAK); >>> @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch) >>> >>> XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER)); >>> >>> - lastowner = xnpod_current_thread(); >>> + lastowner = synch->owner ?: xnpod_current_thread(); >>> xnthread_dec_rescnt(lastowner); >>> XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); >>> lastownerh = xnthread_handle(lastowner); >>> >> >> That's maybe another problem, need to check. >> >> Back to the original issue: with fastlock, kernel space has absolutely >> no clue about how many locks user space may hold - unless someone is >> contending for all those locks. IOW, you can't reliably track resource >> ownership at kernel level without user space help out. The current way >> it helps (enforced syscalls of XNOTHER threads) is insufficient. > > The thing is: we don't care about knowing how many locks some > non-current thread owns. What the nucleus wants to know is whether the > _current user-space_ thread owns a lock, which is enough for the > autorelax management. This restricted scope makes the logic fine. Nope, this does not work for threads that undergo policy changes (see reply to Gilles). > > The existing resource counter is by no mean a resource tracking tool > that could be used from whatever context to query the number of locks an > arbitrary thread holds, it has not been intended that way at all. It > only answers the simple question: "do I hold any lock, as an XNOTHER > thread". Unfortunately, that was not encoded into the overeager bug check. Plus, as state before, the counter can't be kept consistent when the thread was an RT thread before. > >> >> Alternatively to plain counting of ownership in user space, we could >> adopt mainline's robust mutex mechanism (a user space maintained list) >> that solves the release-all-ownerships issue. But I haven't looked into >> details yet. >> > > Would be nice, but still overkill for the purpose of autorelax > management. Right. But the existing code still needs a redesign, maybe not that big, but involved ABI changes. So... Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux