From mboxrd@z Thu Jan 1 00:00:00 1970 From: Philippe Gerum In-Reply-To: <4CD5FA26.4090504@domain.hid> References: <4CC82C8D.3080808@domain.hid> <4CD1BA29.9000303@domain.hid> <1288816871.1842.84.camel@domain.hid> <4CD1DC1B.8060407@domain.hid> <4CD1DE12.5010309@domain.hid> <4CD1E890.5010702@domain.hid> <4CD1EC2F.4040603@domain.hid> <4CD1ED16.8030103@domain.hid> <4CD1EDA8.10007@domain.hid> <4CD1F33C.5070208@domain.hid> <4CD1F3F5.5080505@domain.hid> <4CD1F4FE.9020908@domain.hid> <4CD1F69B.9070100@domain.hid> <4CD1F906.1070703@domain.hid> <4CD1FABD.1080301@domain.hid> <4CD2612C.2070507@domain.hid> <4CD279F7.7070502@domain.hid> <4CD27C46.8010302@domain.hid> <4CD27DC2.7060607@domain.hid> <4CD2A96B.3080001@domain.hid> <4CD2B2A7.9010900@domain.hid> <4CD2C50F.1090604@domain.hid> <4CD32E76.3080004@domain.hid> <4CD33F0C.1050403@domain.hid> <4CD340AA.60002@domain.hid> <4CD34355.5020304@domain.hid> <4CD35DC7.1000507@domain.hid> <4CD3DAC5.6000400@domain.hid> <4CD4A0EF.1@domain.hid> <4CD5B9FC.6050602@domain.hid> <4CD5BC82.6060106@domain.hid> <1289083796.1842.239.camel@domain.hid> <4CD5FA26.4090504@domain.hid> Content-Type: text/plain; charset="UTF-8" Date: Sun, 07 Nov 2010 10:46:20 +0100 Message-ID: <1289123180.1842.265.camel@domain.hid> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] Potential problem with rt_eepro100 List-Id: Xenomai life and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: "xenomai@xenomai.org" , Anders On Sun, 2010-11-07 at 02:00 +0100, Jan Kiszka wrote: > Am 06.11.2010 23:49, Philippe Gerum wrote: > > On Sat, 2010-11-06 at 21:37 +0100, Gilles Chanteperdrix wrote: > >> Anders Blomdell wrote: > >>> Gilles Chanteperdrix wrote: > >>>> Anders Blomdell wrote: > >>>>> Gilles Chanteperdrix wrote: > >>>>>> Jan Kiszka wrote: > >>>>>>> Am 05.11.2010 00:24, Gilles Chanteperdrix wrote: > >>>>>>>> Jan Kiszka wrote: > >>>>>>>>> Am 04.11.2010 23:06, Gilles Chanteperdrix wrote: > >>>>>>>>>> Jan Kiszka wrote: > >>>>>>>>>>>>> At first sight, here you are more breaking things than cleaning them. > >>>>>>>>>>>> Still, it has the SMP record for my test program, still runs with ftrace > >>>>>>>>>>>> on (after 2 hours, where it previously failed after maximum 23 minutes). > >>>>>>>>>>> My version was indeed still buggy, I'm reworking it ATM. > >>>>>>>>>>> > >>>>>>>>>>>> If I get the gist of Jan's changes, they are (using the IPI to transfer > >>>>>>>>>>>> one bit of information: your cpu needs to reschedule): > >>>>>>>>>>>> > >>>>>>>>>>>> xnsched_set_resched: > >>>>>>>>>>>> - setbits((__sched__)->status, XNRESCHED); > >>>>>>>>>>>> > >>>>>>>>>>>> xnpod_schedule_handler: > >>>>>>>>>>>> + xnsched_set_resched(sched); > >>>>>>>>>>>> > >>>>>>>>>>>> If you (we?) decide to keep the debug checks, under what circumstances > >>>>>>>>>>>> would the current check trigger (in laymans language, that I'll be able > >>>>>>>>>>>> to understand)? > >>>>>>>>>>> That's actually what /me is wondering as well. I do not see yet how you > >>>>>>>>>>> can reliably detect a missed reschedule reliably (that was the purpose > >>>>>>>>>>> of the debug check) given the racy nature between signaling resched and > >>>>>>>>>>> processing the resched hints. > >>>>>>>>>> The purpose of the debugging change is to detect a change of the > >>>>>>>>>> scheduler state which was not followed by setting the XNRESCHED bit. > >>>>>>>>> But that is nucleus business, nothing skins can screw up (as long as > >>>>>>>>> they do not misuse APIs). > >>>>>>>> Yes, but it happens that we modify the nucleus from time to time. > >>>>>>>> > >>>>>>>>>> Getting it to work is relatively simple: we add a "scheduler change set > >>>>>>>>>> remotely" bit to the sched structure which is NOT in the status bit, set > >>>>>>>>>> this bit when changing a remote sched (under nklock). In the debug check > >>>>>>>>>> code, if the scheduler state changed, and the XNRESCHED bit is not set, > >>>>>>>>>> only consider this a but if this new bit is not set. All this is > >>>>>>>>>> compiled out if the debug is not enabled. > >>>>>>>>> I still see no benefit in this check. Where to you want to place the bit > >>>>>>>>> set? Aren't that just the same locations where > >>>>>>>>> xnsched_set_[self_]resched already is today? > >>>>>>>> Well no, that would be another bit in the sched structure which would > >>>>>>>> allow us to manipulate the status bits from the local cpu. That > >>>>>>>> supplementary bit would only be changed from a distant CPU, and serve to > >>>>>>>> detect the race which causes the false positive. The resched bits are > >>>>>>>> set on the local cpu to get xnpod_schedule to trigger a rescheduling on > >>>>>>>> the distance cpu. That bit would be set on the remote cpu's sched. Only > >>>>>>>> when debugging is enabled. > >>>>>>>> > >>>>>>>>> But maybe you can provide some motivating bug scenarios, real ones of > >>>>>>>>> the past or realistic ones of the future. > >>>>>>>> Of course. The bug is anything which changes the scheduler state but > >>>>>>>> does not set the XNRESCHED bit. This happened when we started the SMP > >>>>>>>> port. New scheduling policies would be good candidates for a revival of > >>>>>>>> this bug. > >>>>>>>> > >>>>>>> You don't gain any worthwhile check if you cannot make the > >>>>>>> instrumentation required for a stable detection simpler than the proper > >>>>>>> problem solution itself. And this is what I'm still skeptical of. > >>>>>> The solution is simple, but finding the problem without the > >>>>>> instrumentation is way harder than with the instrumentation, so the > >>>>>> instrumentation is worth something. > >>>>>> > >>>>>> Reproducing the false positive is surprisingly easy with a simple > >>>>>> dual-cpu semaphore ping-pong test. So, here is the (tested) patch, > >>>>>> using a ridiculous long variable name to illustrate what I was > >>>>>> thinking about: > >>>>>> > >>>>>> diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h > >>>>>> index 8888cf4..454b8e8 100644 > >>>>>> --- a/include/nucleus/sched.h > >>>>>> +++ b/include/nucleus/sched.h > >>>>>> @@ -108,6 +108,9 @@ typedef struct xnsched { > >>>>>> struct xnthread *gktarget; > >>>>>> #endif > >>>>>> > >>>>>> +#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS > >>>>>> + int debug_resched_from_remote; > >>>>>> +#endif > >>>>>> } xnsched_t; > >>>>>> > >>>>>> union xnsched_policy_param; > >>>>>> @@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched) > >>>>>> xnsched_t *current_sched = xnpod_current_sched(); \ > >>>>>> __setbits(current_sched->status, XNRESCHED); \ > >>>>>> if (current_sched != (__sched__)) { \ > >>>>>> + if (XENO_DEBUG(NUCLEUS)) \ > >>>>>> + __sched__->debug_resched_from_remote = 1; \ > >>>>>> xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \ > >>>>>> } \ > >>>>>> } while (0) > >>>>>> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c > >>>>>> index 4cb707a..50b0f49 100644 > >>>>>> --- a/ksrc/nucleus/pod.c > >>>>>> +++ b/ksrc/nucleus/pod.c > >>>>>> @@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched) > >>>>>> xnarch_cpus_clear(sched->resched); > >>>>>> } > >>>>>> #endif > >>>>>> + if (XENO_DEBUG(NUCLEUS) && sched->debug_resched_from_remote) { > >>>>>> + sched->debug_resched_from_remote = 0; > >>>>>> + resched = 1; > >>>>>> + } > >>>>>> clrbits(sched->status, XNRESCHED); > >>>>>> return resched; > >>>>>> } > >>>>>> > >>>>>> > >>>>>> I am still uncertain. > >>>>> Will only work if all is done under nklock, otherwise two almost > >>>>> simultaneous xnsched_resched_p from different cpus, might lead to one of > >>>>> the ipi wakeups sees the 0 written due to handling the first ipi interrupt. > >>>> This is a patch artifact, the function modified are xnsched_set_resched > >>>> and xnpod_test_resched, and both are run with the nklock locked. > >>>> > >>> > >>> Isn't this a possible scenario? > >>> > >>> CPU A CPU B CPU C > >>> take nklock > >>> remote = 1 > >>> send ipi #1 > >>> release nklock > >>> take nklock handle ipi > >>> remote = 1 ack ipi #1 > >>> send ipi #2 > >>> release nklock > >>> take nklock > >>> if remote (==1) > >>> remote = 0 > >>> reseched = 1 > >>> relese nklock > >>> handle ipi > >>> ack ipi #2 > >>> take nklock > >>> if remote (==0) > >>> OOPS! > >> > >> No problem here, since handling the first IPI has taken into account the > >> two scheduler state changes. So, no OOPS. The second IPI is spurious. > >> > >> Anyway, after some thoughts, I think we are going to try and make the > >> current situation work instead of going back to the old way. > >> > >> You can find the patch which attempts to do so here: > >> http://sisyphus.hd.free.fr/~gilles/sched_status.txt > > > > Ack. At last, this addresses the real issues without asking for > > regression funkiness: fix the lack of barrier before testing XNSCHED in > > Check the kernel, we actually need it on both sides. No, really? > Wherever the final > barriers will be, we should leave a comment behind why they are there. > Could be picked up from kernel/smp.c. > > > the xnpod_schedule pre-test, and stop sched->status trashing due to > > XNINIRQ/XNHTICK/XNRPICK ops done un-synced on nklock. > > > > In short, this patch looks like moving the local-only flags where they > > belong, i.e. anywhere you want but *outside* of the status with remotely > > accessed bits. XNRPICK seems to be handled differently, but it makes > > sense to group it with other RPI data as you did, so fine with me. > > I just hope we finally converge over a solution. Looks like all > possibilities have been explored now. A few more comments on this one: > > It probably makes sense to group the status bits accordingly (both their > values and definitions) and briefly document on which status field they > are supposed to be applied. > > I do not understand the split logic - or some bits are simply not yet > migrated. XNHDEFER, XNSWLOCK, XNKCOUT are all local-only as well, no? The split logic is to keep the scheduler bits in the scheduler status, and move other per-CPU bits elsewhere. So in fact, we only need XNHDEFER to migrate as well. > Then better put them in the _local_ status field, that's more consistent > (and would help if we once wanted to optimize their cache line usage). > > The naming is unfortunate: status vs. lstatus. This is asking for > confusion and typos. They must be better distinguishable, e.g. > local_status. Or we need accessors that have debug checks built in, > catching wrong bits for their target fields. > > Good catch of the RPI breakage, Gilles! > > Jan > -- Philippe.