From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <47C0B12D.5010200@domain.hid> Date: Sun, 24 Feb 2008 00:50:05 +0100 From: Philippe Gerum MIME-Version: 1.0 References: <47C020A9.3050704@domain.hid> <47C02113.7070005@domain.hid> <18368.23256.85461.492726@domain.hid> <47C06052.2020406@domain.hid> <18368.26112.776987.660255@domain.hid> <47C06CAD.6060200@domain.hid> In-Reply-To: <47C06CAD.6060200@domain.hid> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put Reply-To: philippe.gerum@domain.hid List-Id: "Xenomai life and development \(bug reports, patches, discussions\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: Xenomai-core@domain.hid Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >> > Gilles Chanteperdrix wrote: >> > > Jan Kiszka wrote: >> > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. >> > > > Why do you have this safety check for the owner also in production code? >> > > >> > > Because only one broken xnlock_put could entail a chain reaction of >> > > broken xnlock sections with code on multiple CPU violating critical >> > > sections. With the test, we prevent the chain reaction. But I agree this >> > > check should not be silent. >> > >> > When there is a bug, then there is bug and we are hosed. That's why we >> > have debug checks for finding such cases in advance. Here I was talking >> > about such a debug check in a hot path on a _production_ system, and >> > that check even had no fault recovery. That appeared pointless to me. >> > >> > Just to avoid misunderstandings: This version is not different from the >> > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced >> > to cover specifically lock debugging. >> > >> > Do you have an idea for some cheap fault recovery for broken locking >> > that we should put in instead? >> >> The fault recovery is to leave the xnlock untouched, in order to avoid >> the chain reaction I was talking about. > > But you may later deadlock on it when trying to reacquire this > unbalanced lock. It can help against double releases, granted. Still, > such cases should be eliminated _ahead_ of release via review, so that > one can finally go for the fast unchecked version in the matured system. > >> Anyway, I think even production >> code should contain some sanity checks, and this one looks cheap. > > I'm fine with a simple check as well. But there should be an _option_ > for such checks, even more if they are supposed to be inlined. > Uninlining reduces this pressure, but it still adds text size to the hot > path. I don't think this is strictly equivalent. The cause of deadlock is no rocket science to diagnose even if it may be painful to fix, so our main problem is critical section breakage. And this particular kind of bugs is extremely sensitive to the instrumentation overhead, to the point where enabling a debug option would turn it into a damned Heisenbug. Unless you want to create a unique debug option to control each and every tiny check, you will likely bind it to a more general debug option, turning on a significant amount of unrelated debug code, which could in turn affect the bug itself. This is where I agree with Gilles, I would rather pay peanuts cycles when acquiring a lock on SMP machines to run this check, than chase wild gooses for days. Some key issues like obvious critical section breakage deserve systematic detection, including in the field. Due to their very nature, maybe you won't be able to trigger some of them elsewhere anyway, because in some complex application cases (the ones we more often have with SMP configs), you may just fake part of the operating environment to debug the application (network feed simulation and so on). But, and there I tend to agree with you, uninlining the containing code is definitely something I would combine with such approach, since I-cache is a precious thing to save. Anyway, measurements will tell. -- Philippe.