Message-ID: <4796643D.7010208@domain.hid>
Date: Tue, 22 Jan 2008 22:46:37 +0100
From: Jan Kiszka
To: Gilles Chanteperdrix
Cc: xenomai-core
Subject: Re: [Xenomai-core] High latencies on ARM.

Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
> > Hi,
> >
> > After some (unsuccessful) time trying to instrument the code in a way
> > that does not change the latency results completely, I found the
> > reason for the high latency with latency -t 1 and latency -t 2 on ARM.
> > So, here comes an update on this issue. The culprit is the user-space
> > context switch, which flushes the processor cache with the nklock
> > locked, irqs off.
> >
> > There are two things we could do:
> > - arrange for the ARM cache flush to happen with the nklock unlocked
> > and irqs enabled. This will improve interrupt latency (latency -t 2)
> > but obviously not scheduling latency (latency -t 1).
> > If we go that way, there are several problems we should solve:
> >
> > we do not want interrupt handlers to reenter xnpod_schedule(); for
> > this we can use the XNLOCK bit, set on whatever is
> > xnpod_current_thread() when the cache flush occurs
> >
> > since the interrupt handler may modify the rescheduling bits, we need
> > to test these bits in the xnpod_schedule() epilogue and restart
> > xnpod_schedule() if need be
> >
> > we do not want xnpod_delete_thread() to delete one of the two threads
> > involved in the context switch; for this, the only solution I found is
> > to add a bit to the thread mask meaning that the thread is currently
> > switching, and to (re)test the XNZOMBIE bit in the xnpod_schedule()
> > epilogue to delete whatever thread was marked for deletion
> >
> > in case of migration with xnpod_migrate_thread(), we do not want
> > xnpod_schedule() on the target CPU to switch to the migrated thread
> > before the context switch on the source CPU is finished; for this we
> > can avoid setting the resched bit in xnpod_migrate_thread(), detect
> > the condition in the xnpod_schedule() epilogue and set the rescheduling
> > bits so that xnpod_schedule() is restarted and sends the IPI to the
> > target CPU.
>
> Please find attached a patch implementing these ideas. This adds some
> clutter, which I would be happy to reduce. Better ideas are welcome.
>

I tried to cross-read the patch (diff -p would have been nice) but
failed: it needs to be applied to some tree first. Does the patch
already improve the ARM latencies?

>
> >
> > - avoid using user-space real-time tasks when running the latency
> > kernel-space benches, i.e. at least in the latency -t 1 and latency -t
> > 2 cases. This means that we should change the timerbench driver. There
> > are at least two ways of doing this:
> >   * use an rt_pipe;
> >   * modify the timerbench driver to implement only the nrt ioctl, using
> >     vanilla Linux services such as wait_event and wake_up.
> >
> > What do you think?
>
> So, what do you think is the best way to change the timerbench driver?
> * use an rt_pipe? Pros: allows running latency -t 1 and latency -t 2 even
> if Xenomai is compiled with CONFIG_XENO_OPT_PERVASIVE off; cons: makes
> the timerbench driver non-portable to other implementations of RTDM,
> e.g. RTDM over RTAI or the version of RTDM which runs over vanilla Linux.
> * modify the timerbench driver to implement only NRT ioctls? Pros:
> better driver portability; cons: latency would still need
> CONFIG_XENO_OPT_PERVASIVE to run latency -t 1 and latency -t 2.

I'm still voting for my third approach:

-> Write latency as a kernel application (klatency) against the
   timerbench device
-> Call the NRT IOCTLs of timerbench during module init/cleanup
-> Use module parameters for customization
-> Set up a low-priority kernel-based RT task to issue the RT IOCTLs
-> Format the results nicely (similar to the userland latency tool) in
   that RT task and stuff them into some rtpipe
-> Use "cat /dev/rtpipeX" to display the results

Jan