From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <478F63AC.3050903@domain.hid>
Date: Thu, 17 Jan 2008 15:18:20 +0100
From: Jan Kiszka
MIME-Version: 1.0
References: <2ff1a98a0801020231k19be7d89k1a6f04b7d497cc34@domain.hid>
 <478F30FB.8060501@domain.hid>
 <2ff1a98a0801170247t4378e733l24d470a31d208f95@domain.hid>
 <478F4239.30808@domain.hid>
 <2ff1a98a0801170559r48816868jb8451c52e2a7cdfc@domain.hid>
 <478F6321.4030602@domain.hid>
In-Reply-To: <478F6321.4030602@domain.hid>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Xenomai-core] High latencies on ARM.
List-Id: "Xenomai life and development \(bug reports, patches, discussions\)"
To: Gilles Chanteperdrix
Cc: xenomai-core

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> On Jan 17, 2008 12:55 PM, Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> On Jan 17, 2008 11:42 AM, Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> Hi,
>>>>>>
>>>>>> after some (unsuccessful) time trying to instrument the code in a way
>>>>>> that does not change the latency results completely, I found the
>>>>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
>>>>>> So, here comes an update on this issue. The culprit is the user-space
>>>>>> context switch, which flushes the processor cache with the nklock
>>>>>> locked, irqs off.
>>>>>>
>>>>>> There are two things we could do:
>>>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
>>>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
>>>>>> but obviously not scheduling latency (latency -t 1). If we go that
>>>>>> way, there are several problems we should solve:
>>>>>>
>>>>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
>>>>>> this we can use the XNLOCK bit, set on whatever is
>>>>>> xnpod_current_thread() when the cache flush occurs
>>>>>>
>>>>>> since the interrupt handler may modify the rescheduling bits, we need
>>>>>> to test these bits in xnpod_schedule() epilogue and restart
>>>>>> xnpod_schedule() if need be
>>>>>>
>>>>>> we do not want xnpod_delete_thread() to delete one of the two threads
>>>>>> involved in the context switch, for this the only solution I found is
>>>>>> to add a bit to the thread mask meaning that the thread is currently
>>>>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
>>>>>> to delete whatever thread was marked for deletion
>>>>>>
>>>>>> in case of migration with xnpod_migrate_thread, we do not want
>>>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
>>>>>> before the context switch on the source CPU is finished, for this we
>>>>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
>>>>>> the condition in xnpod_schedule() epilogue and set the rescheduling
>>>>>> bits so that xnpod_schedule is restarted and send the IPI to the
>>>>>> target CPU.
>>>>>>
>>>>>> - avoid using user-space real-time tasks when running latency
>>>>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
>>>>>> 2 case. This means that we should change the timerbench driver. There
>>>>>> are at least two ways of doing this:
>>>>>> use an rt_pipe
>>>>>> modify the timerbench driver to implement only the nrt ioctl, using
>>>>>> vanilla linux services such as wait_event and wake_up.
>>>>> [As you reminded me of this unanswered question:]
>>>>> One may consider adding further modes _besides_ current kernel tests
>>>>> that do not rely on RTDM & native userland support (e.g. when
>>>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
>>>>> scenarios as well that must not be killed by such a change.
>>>> I think the current test scenario for latency -t 1 and latency -t 2
>>>> are a bit misleading: they measure kernel-space latencies in presence
>>>> of user-space real-time tasks. When one runs latency -t 1 or latency
>>>> -t 2, one would expect that there are only kernel-space real-time
>>>> tasks.
>>> Whether they are misleading depends on your perspective. In fact, they are
>>> measuring in-kernel scenarios over the standard Xenomai setup, which
>>> includes userland RT task activity these days. Those scenarios are mainly
>>> targeting driver use cases, not pure kernel-space applications.
>>>
>>> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
>>> would benefit from an additional set of test cases.
>> Ok, I will not touch timerbench then, and implement another kernel module.
>>
>
> [Without considering all details]
> To achieve this independence of user-space RT threads, it should suffice
> to implement a kernel-based frontend for timerbench. This frontend would
> then either dump to syslog or open some pipe to tell userland about the
> benchmark results. What do you think?
>

(This applies only if, by "implement another kernel module", you meant
"reimplementing timerbench". Just write a kernel-hosted RTDM user of
timerbench.)
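
For reference, the first option quoted above (flushing the cache with the nklock released, then re-testing the rescheduling and XNZOMBIE bits in the xnpod_schedule() epilogue) can be modelled roughly in plain user-space C. This is only a toy sketch under stated assumptions, not Xenomai code: every name below (T_LOCK, T_SWITCHING, T_ZOMBIE, struct sched, struct irq_event, schedule(), handle_irq(), delete_thread()) is hypothetical, the lock is a plain flag, and the interrupt is simulated by a function call inside the flush window. Migration and the IPI are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits, loosely mirroring the bits named in the thread. */
enum {
    T_LOCK      = 1 << 0, /* plays the role of XNLOCK */
    T_SWITCHING = 1 << 1, /* the proposed "currently switching" bit */
    T_ZOMBIE    = 1 << 2, /* plays the role of XNZOMBIE */
};

struct thread {
    unsigned state;
    bool deleted;
};

struct sched {
    bool nklock_held;      /* toy stand-in for the nklock */
    bool resched;          /* toy stand-in for the rescheduling bits */
    struct thread *curr;   /* thread currently running */
    struct thread *next;   /* thread the scheduler should switch to */
    int switches;          /* number of context switches performed */
};

/* What a simulated interrupt handler does during the flush window. */
struct irq_event {
    struct thread *wake;   /* thread to make runnable, or NULL */
    struct thread *kill;   /* thread to delete, or NULL */
};

static void schedule(struct sched *s, const struct irq_event *ev);

/* Deletion is deferred while the victim is part of a context switch. */
static void delete_thread(struct thread *t)
{
    if (t->state & T_SWITCHING)
        t->state |= T_ZOMBIE;
    else
        t->deleted = true;
}

static void handle_irq(struct sched *s, const struct irq_event *ev)
{
    if (ev->kill)
        delete_thread(ev->kill);
    if (ev->wake) {
        s->next = ev->wake;
        s->resched = true;
    }
    schedule(s, NULL); /* returns at once: current thread holds T_LOCK */
}

static void reap_if_zombie(struct thread *t)
{
    if (t->state & T_ZOMBIE)
        t->deleted = true;
}

static void schedule(struct sched *s, const struct irq_event *ev)
{
    if (s->curr->state & T_LOCK)
        return; /* no reentry from an interrupt during the flush */

    do {
        assert(!s->nklock_held);
        s->nklock_held = true;
        s->resched = false;
        struct thread *prev = s->curr, *next = s->next;
        if (next == NULL || next == prev) {
            s->nklock_held = false;
            break;
        }
        prev->state |= T_LOCK | T_SWITCHING;
        next->state |= T_SWITCHING;

        s->nklock_held = false;    /* flush window: lock dropped, irqs on */
        if (ev) {
            handle_irq(s, ev);     /* simulated interrupt, fired once */
            ev = NULL;
        }
        s->nklock_held = true;

        s->curr = next;
        s->switches++;

        /* Epilogue: clear the marks, reap deferred deletions. */
        prev->state &= ~(unsigned)(T_LOCK | T_SWITCHING);
        next->state &= ~(unsigned)T_SWITCHING;
        reap_if_zombie(prev);
        reap_if_zombie(next);
        s->nklock_held = false;
    } while (s->resched);          /* restart if an irq asked for it */
}
```

In this model, an interrupt that arrives during the switch from thread A to thread B, wakes thread C and deletes A, leaves the scheduler running C after two switches, with A reaped only in the epilogue of the first switch.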