From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <18325.5311.379131.83389@domain.hid>
Date: Mon, 21 Jan 2008 22:55:11 +0100
In-Reply-To: <478F6499.8060700@domain.hid>
References: <2ff1a98a0801020231k19be7d89k1a6f04b7d497cc34@domain.hid>
	<478F30FB.8060501@domain.hid>
	<2ff1a98a0801170247t4378e733l24d470a31d208f95@domain.hid>
	<478F4239.30808@domain.hid>
	<2ff1a98a0801170559r48816868jb8451c52e2a7cdfc@domain.hid>
	<478F6321.4030602@domain.hid>
	<2ff1a98a0801170620rd984e60i228fc99442114bb4@domain.hid>
	<478F6499.8060700@domain.hid>
From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Subject: Re: [Xenomai-core] High latencies on ARM.
List-Id: "Xenomai life and development \(bug reports, patches,
	discussions\)" <xenomai.xenomai.org>
List-Unsubscribe: <https://mail.gna.org/listinfo/xenomai-core>,
	<mailto:xenomai-core-request@domain.hid>
List-Archive: </public/xenomai-core>
List-Post: <mailto:xenomai@xenomai.org>
List-Help: <mailto:xenomai-core-request@domain.hid>
List-Subscribe: <https://mail.gna.org/listinfo/xenomai-core>,
	<mailto:xenomai-core-request@domain.hid>
To: Jan Kiszka <jan.kiszka@domain.hid>
Cc: xenomai-core <xenomai@xenomai.org>

Jan Kiszka wrote:
 > Gilles Chanteperdrix wrote:
 > > On Jan 17, 2008 3:16 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
 > >> Gilles Chanteperdrix wrote:
 > >>> On Jan 17, 2008 12:55 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
 > >>>> Gilles Chanteperdrix wrote:
 > >>>>> On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
 > >>>>>> Gilles Chanteperdrix wrote:
 > >>>>>>> Hi,
 > >>>>>>>
 > >>>>>>> after some (unsuccessful) time trying to instrument the code in a way
 > >>>>>>> that does not change the latency results completely, I found the
 > >>>>>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
 > >>>>>>> So, here comes an update on this issue. The culprit is the user-space
 > >>>>>>> context switch, which flushes the processor cache with the nklock
 > >>>>>>> locked, irqs off.
 > >>>>>>>
 > >>>>>>> There are two things we could do:
 > >>>>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
 > >>>>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
 > >>>>>>> but obviously not scheduling latency (latency -t 1). If we go that
 > >>>>>>> way, there are several problems we should solve:
 > >>>>>>>
 > >>>>>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
 > >>>>>>> this we can use the XNLOCK bit, set on whatever is
 > >>>>>>> xnpod_current_thread() when the cache flush occurs
 > >>>>>>>
 > >>>>>>> since the interrupt handler may modify the rescheduling bits, we need
 > >>>>>>> to test these bits in xnpod_schedule() epilogue and restart
 > >>>>>>> xnpod_schedule() if need be
 > >>>>>>>
 > >>>>>>> we do not want xnpod_delete_thread() to delete one of the two threads
 > >>>>>>> involved in the context switch, for this the only solution I found is
 > >>>>>>> to add a bit to the thread mask meaning that the thread is currently
 > >>>>>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
 > >>>>>>> to delete whatever thread was marked for deletion
 > >>>>>>>
 > >>>>>>> in case of migration with xnpod_migrate_thread, we do not want
 > >>>>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
 > >>>>>>> before the context switch on the source CPU is finished, for this we
 > >>>>>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
 > >>>>>>> the condition in xnpod_schedule() epilogue and set the rescheduling
 > >>>>>>> bits so that xnpod_schedule is restarted and send the IPI to the
 > >>>>>>> target CPU.
 > >>>>>>>
 > >>>>>>> - avoid using user-space real-time tasks when running latency
 > >>>>>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
 > >>>>>>> 2 case. This means that we should change the timerbench driver. There
 > >>>>>>> are at least two ways of doing this:
 > >>>>>>> use an rt_pipe
 > >>>>>>>  modify the timerbench driver to implement only the nrt ioctl, using
 > >>>>>>> vanilla linux services such as wait_event and wake_up.
 > >>>>>> [As you reminded me of this unanswered question:]
 > >>>>>> One may consider adding further modes _besides_ current kernel tests
 > >>>>>> that do not rely on RTDM & native userland support (e.g. when
 > >>>>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
 > >>>>>> scenarios as well that must not be killed by such a change.
 > >>>>> I think the current test scenario for latency -t 1 and latency -t 2
 > >>>>> are a bit misleading: they measure kernel-space latencies in presence
 > >>>>> of user-space real-time tasks. When one runs latency -t 1 or latency
 > >>>>> -t 2, one would expect that there are only kernel-space real-time
 > >>>>> tasks.
 > >>>> Whether they are misleading depends on your perspective. In fact, they are
 > >>>> measuring in-kernel scenarios over the standard Xenomai setup, which
 > >>>> includes userland RT task activity these days. Those scenarios are mainly
 > >>>> targeting driver use cases, not pure kernel-space applications.
 > >>>>
 > >>>> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
 > >>>> would benefit from an additional set of test cases.
 > >>> Ok, I will not touch timerbench then, and implement another kernel module.
 > >>>
 > >> [Without considering all details]
 > >> To achieve this independence from user-space RT threads, it should suffice
 > >> to implement a kernel-based frontend for timerbench. This frontend would
 > >> then either dump to syslog or open some pipe to tell userland about the
 > >> benchmark results. What do you think?
 > > 
 > > My intent was to implement a protocol similar to the one of
 > > timerbench, but using an rt-pipe, and continue to use the latency
 > > test, adding new options such as -t 3 and -t 4. But there may be
 > > problems with this approach: if we are compiling without
 > > CONFIG_XENO_OPT_PERVASIVE, latency will not run at all. So, it is
 > > probably simpler to implement a klatency that just reads from the
 > > rt-pipe.
 > 
 > But that klatency could perfectly reuse what timerbench already
 > provides, without code changes to the latter, in theory.

In theory yes, but in practice the timerbench non-real-time ioctls use
some Linux services, so they cannot be called from the context of a
kernel-space task listening on a real-time pipe.
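
To make the constraint concrete, here is a toy user-space model in C.
It is only a sketch, not Xenomai code: every name in it (exec_ctx,
nrt_ioctl_model, klatency_frontend_model) is made up for illustration,
and the -EPERM return is an assumption of the model. The real ioctls
contain no such check; they simply must not be called from that context.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical execution contexts; the names are illustrative only. */
enum exec_ctx {
    CTX_LINUX,    /* regular Linux context: Linux services may be used */
    CTX_RT_KTASK  /* kernel-space real-time task: Linux services are off limits */
};

/*
 * Toy stand-in for a non-real-time ioctl handler. The real handlers call
 * Linux services; the model reduces that to a check on the calling context.
 */
int nrt_ioctl_model(enum exec_ctx ctx)
{
    assert(ctx == CTX_LINUX || ctx == CTX_RT_KTASK);

    if (ctx != CTX_LINUX)
        return -EPERM; /* model only: the caller may not use Linux services */

    return 0;          /* the Linux services would be safe to call here */
}

/*
 * A klatency-like frontend, modelled as a real-time kernel task that tries
 * to reuse the non-real-time ioctl directly.
 */
int klatency_frontend_model(void)
{
    return nrt_ioctl_model(CTX_RT_KTASK);
}
```

In this model the frontend always gets the error back, which is the
reason a kernel-space frontend cannot reuse those ioctls unchanged.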

-- 


					    Gilles Chanteperdrix.