From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <43A5B648.5010104@domain.hid>
Date: Sun, 18 Dec 2005 20:19:36 +0100
From: Philippe Gerum
MIME-Version: 1.0
Subject: Re: [Xenomai-core] [bug] don't try this at home...
References: <438DD4E2.9080208@domain.hid> <438DE166.5090303@domain.hid>
 <438DE551.7080708@domain.hid> <43A32305.8030004@domain.hid>
 <43A329E6.3080505@domain.hid> <43A34F6C.6020904@domain.hid>
 <43A56A44.6020308@domain.hid> <43A5AACF.6050505@domain.hid>
 <43A5B188.3080003@domain.hid>
In-Reply-To: <43A5B188.3080003@domain.hid>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: "Xenomai life and development \(bug reports, patches, discussions\)"
To: Jan Kiszka
Cc: xenomai@xenomai.org

Jan Kiszka wrote:
> Philippe Gerum wrote:
>
>>Jan Kiszka wrote:
>>
>>>Philippe Gerum wrote:
>>>
>>>>>>...
>>>>>>Fixed. The cause was related to the thread migration routine to
>>>>>>primary mode (xnshadow_harden), which would spuriously call the Linux
>>>>>>rescheduling procedure from the primary domain under certain
>>>>>>circumstances. This bug only triggers on preemptible kernels. This
>>>>>>also fixes the spinlock recursion issue which is sometimes triggered
>>>>>>when the spinlock debug option is active.
>>>>>>
>>>>>
>>>>>Gasp. I've found a severe regression with this fix, so more work is
>>>>>needed. More later.
>>>>>
>>>>
>>>>End of alert. Should be ok now.
>>>>
>>>
>>>No crashes so far, looks good. But the final test, a box which always
>>>went to hell very quickly, is still waiting in my office - more on
>>>Monday.
>>>
>>>Anyway, there seem to be some latency issues pending. I discovered this
>>>again with my migration test. Please give it a try on a mid- (800 MHz
>>>Athlon in my case) to low-end box. On that Athlon I got peaks of over
>>>100 us in the userspace latency test right on starting migration. The
>>>Athlon does not support the NMI watchdog, but on my 1.4 GHz Notebook
>>>there were alarms (>30 us) hitting in the native registry during
>>>rt_task_create. I have no clue yet if anything is broken there.
>>
>>I suspect that rt_registry_enter() is inherently a long operation when
>>considered as a non-preemptible sum of reasonably short ones. Since it
>>is always called with interrupts enabled, we should split the work in
>>there, releasing interrupts in the middle. The tricky thing is that we
>>must ensure that the new registration slot is not exposed in a
>>half-baked state during the preemptible section.
>
> Yea, I guess there are a few more of such complex call chains inside the
> core lock, at least when looking at the native skin. For a regression
> test suite, we should define load scenarios of low-prio realtime tasks
> doing some init/cleanup and communication while e.g. the latency test is
> running. This should give a clearer picture of what numbers you can
> expect in normal application scenarios.
>
>>>We need
>>>that back-tracer soon - did I mention this before? ;)
>>
>>Well, we have backtrace support for detecting latency peaks, but it's
>>dependent on NMI availability. The thing is that not every platform
>>provides programmable NMI support. A possible option would be to
>>overload the existing LTT tracepoints in order to keep an execution
>>backtrace, so that we would not have to rely on any hw support.
>>
>
> The advantage of Fu's mcount-based tracer will be that it can also
> capture functions you do not expect, e.g. accidentally called kernel
> services. His patch, likely against Adeos, will enable kernel-wide
> function tracing which you can use to instrument IRQ-off paths or (in a
> second step or so) other things you are interested in. And it will
> maintain a FULL calling history, something that NMI can't do.
>
> NMI will still be useful for hard lock-ups, LTT for a more global view
> of what's happening, but the mcount-instrumentation should give deep
> insights on the core's and skin's critical timing behaviours.
>

No problem. I've just suggested building a bicycle to go to the shop
around the corner, but if you tell me that a spaceship to visit Venus is
at hand, I'll wait for it: shopping can wait.

-- 
Philippe.