* [Xenomai] Heads up: I-pipe patch status on ARM
@ 2012-07-25 12:48 Gilles Chanteperdrix
From: Gilles Chanteperdrix @ 2012-07-25 12:48 UTC
  To: xenomai@xenomai.org

Hi,

Short version: good news, we are able to compile a Linux 3.4 "all-in-one
kernel" with Xenomai support.

Long version:
As you may know, the direction followed by the Linux kernel on ARM is to
allow, as much as possible, compiling a single kernel that runs on many
ARM SOCs. This basically means the systematic elimination of
SOC-specific pieces of code which were chosen at compilation time.
Historically, Xenomai SOC support code fell into that category, with
everything SOC-specific being called __ipipe_mach_something and being
selected at compilation time. This was not a bad choice at the time it
was made: Linux kernel code could not be compiled for several SOCs at a
time anyway.

Changing this started with the I-pipe patch for Linux 2.6.33, with the
implementation of the "kuser" tsc emulation, whose side effect was to
make the tsc emulation registered at run-time. The tsc emulation code is
provided in the "vector page", at the same place as other helpers
provided by the Linux kernel (the ARM user-space atomic_cmpxchg is
implemented this way for instance). The aim of putting the helper there
is to avoid having to compile SOC-specific code in user-space. This
has been exploited by Xenomai 2.6.0, which removed the --enable-arm-mach
option and uses the "kuser" helper by default (but can fall back to
the old way in order to remain compatible with old patches). This
allows, for instance, a Debian "xenomai-runtime" package which runs on
any SOC, whatever the implementation of its tsc. Incidentally, tests
have also shown that implementing the tsc emulation code that way
actually reduced the average tsc latency, despite the fact that reading
the tsc is a function call and no longer a piece of inlined C code;
that is probably because the helpers are implemented in assembly. The
gain is such that the latency of the (software) tsc is lower on a
Cortex A9 based processor than that of the (hardware) tsc on an Intel
Atom 230.
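
As a rough illustration of what "registered at run-time" means here
(this is not the actual I-pipe or kuser helper code, and every name in
it is hypothetical), the SOC support code can hand a counter-reading
function to generic code at boot, instead of the generic code picking
one with #ifdef at build time. The sketch also assumes the hardware
counter is a 32-bit free-running counter that the emulation extends to
64 bits, which is an assumption and not something stated above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how one SOC exposes its free-running 32-bit counter. */
typedef uint32_t (*counter_read_fn)(void);

static counter_read_fn registered_counter; /* set at boot, not at build time */
static uint32_t last_low;                  /* last raw value seen */
static uint64_t high_part;                 /* accumulated wrap-arounds */

/* Called once by whichever SOC support code matches the running hardware. */
int tsc_register(counter_read_fn fn)
{
    if (fn == NULL || registered_counter != NULL)
        return -1;              /* invalid, or a counter is already set */
    registered_counter = fn;
    last_low = fn();
    high_part = 0;
    return 0;
}

/* Emulated 64-bit tsc: extends the 32-bit counter across wrap-arounds.
 * Not thread-safe; it must be called at least once per counter period. */
uint64_t tsc_read(void)
{
    uint32_t now;

    if (registered_counter == NULL)
        return 0;               /* nothing registered yet */
    now = registered_counter();
    if (now < last_low)         /* the hardware counter wrapped */
        high_part += (uint64_t)1 << 32;
    last_low = now;
    return high_part | now;
}

/* Stand-in for real hardware, so the sketch can be exercised. */
static uint32_t fake_hw_value;
uint32_t fake_soc_counter(void)
{
    return fake_hw_value;
}
```

Generic code only ever calls tsc_read(); which counter sits behind it is
decided by the single tsc_register() call made on the running machine,
so the same binary works whatever the SOC.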

The I-pipe core patch for Linux 3.2 provided another occasion to move in
the same direction:
- the I-pipe timer factorization effort allowed the hardware timer
support code to be registered at run-time and no longer hard-coded at
compilation time. Again, this was not the original aim: the aim was to
have hardware timer support uniform across all architectures to reduce
the Xenomai arch-specific cruft, and a secondary aim on ARM was to allow
sharing more code with the clockevent infrastructure, by simply reusing
the clockevent timer call-backs when the hardware timer is shared
between Xenomai and Linux. It turns out that we also reused the
clockevent call-backs on other architectures, and in fact, it also
simplified compiling a Linux kernel on x86, because we no longer need a
kernel compiled differently to use the local APIC timer or the PIT.
- we also got the PIC muting functions on ARM registered at
run-time, this time on purpose.
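
To make the run-time timer registration more concrete, here is a
minimal sketch. It is not the I-pipe timer API: the structure, the
function names and the "rating" selection rule are all assumptions made
for illustration. Each timer driver registers a descriptor carrying its
call-backs (which could be the clockevent ones), and the core picks one
on the running machine:

```c
#include <stddef.h>

/* Hypothetical timer descriptor; the call-back could be a clockevent one. */
struct demo_timer {
    const char *name;
    unsigned rating;                      /* higher is preferred (assumption) */
    int (*set_next)(unsigned long ticks); /* program the next timer shot */
    struct demo_timer *next;              /* link in the registration list */
};

static struct demo_timer *timer_list;

/* Called by each timer driver present in the kernel image. */
void demo_timer_register(struct demo_timer *t)
{
    t->next = timer_list;
    timer_list = t;
}

/* Called once by the core: pick the best timer among those registered. */
struct demo_timer *demo_timer_select(void)
{
    struct demo_timer *best = NULL;

    for (struct demo_timer *t = timer_list; t != NULL; t = t->next)
        if (best == NULL || t->rating > best->rating)
            best = t;
    return best;
}

/* Stand-ins for two hardware timers, so the sketch can be exercised. */
static unsigned long last_programmed;
int fake_set_next(unsigned long ticks)
{
    last_programmed = ticks;
    return 0;
}
struct demo_timer fake_pit = { .name = "pit", .rating = 100,
                               .set_next = fake_set_next };
struct demo_timer fake_lapic = { .name = "lapic", .rating = 200,
                                 .set_next = fake_set_next };
```

With both fake timers registered, demo_timer_select() returns the one
with the higher rating; a kernel containing both drivers needs no
build-time choice between them.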

While writing documentation on how to port the I-pipe core patch to a
new ARM board [1], I realized that we still had some "__ipipe_mach" code
in the I-pipe core patch: the IPI handling code on SMP systems.

So, I took the opportunity of working on the I-pipe patch for Linux 3.4
to remedy this situation. The idea is simple: in the Linux kernel on ARM
the IPIs live in a different numbering space from the IRQs, but the
interrupt pipeline wants to mix them with irqs, to have a common
pipeline for both types of interrupts. Some __ipipe_mach macros made
that work. For instance, on Cortex A9, the IPIs are actually in the same
numbering space as the IRQs at the interrupt controller level: they
simply are irqs 2 to 16. So, in the I-pipe patch before Linux 3.4,
we simply used the hardware IPI numbers as irq numbers. But this has two
disadvantages:
- starting with Linux 3.4, hardware irq numbers are remapped to Linux
virtual irq numbers, with the advent of "irq domains", so the fact that
there is room for the IPIs is no longer guaranteed;
- more importantly, on a Cortex A9 kernel, this means that the Linux
domain irq handlers for irqs 2 to 16 were redirecting these irqs to the
Linux IPI handler, which is the wrong thing to do on a non Cortex A9
processor.
So, the cure was to systematically remap the IPIs to I-pipe virqs (and
remove the __ipipe_mach macros). This does not change anything with
regard to the path followed for handling IPIs except for the particular
number they use along the interrupt pipeline, but it avoids the two
issues mentioned above:
- however Linux irqs are remapped, they do not conflict with I-pipe
virqs;
- a Cortex A9 enabled kernel running on another SOC (a uniprocessor SOC
for instance) reserves virqs for IPIs which are simply never used; the
other SOC's irqs are handled as irqs, as they should be.
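
The remapping idea can be sketched as follows. This is only an
illustration under stated assumptions, not the I-pipe code: the constant
names, the size of the Linux irq space and the number of reserved IPIs
are all hypothetical. The point is that IPIs get pipeline numbers from a
range that sits outside the Linux irq space, so no hardware or remapped
Linux irq number can collide with them:

```c
#include <stdbool.h>

/* All names and values below are hypothetical, chosen for illustration. */
#define DEMO_NR_LINUX_IRQS 1024u              /* size of the Linux irq space */
#define DEMO_VIRQ_BASE     DEMO_NR_LINUX_IRQS /* virqs start right above it */
#define DEMO_NR_IPIS       8u                 /* IPIs reserved by the pipeline */

/* Map an IPI index to a pipeline number that no Linux irq can use. */
unsigned demo_ipi_to_virq(unsigned ipi)
{
    return DEMO_VIRQ_BASE + ipi;
}

/* True only for numbers in the reserved IPI virq range. */
bool demo_is_ipi_virq(unsigned irq)
{
    return irq >= DEMO_VIRQ_BASE && irq < DEMO_VIRQ_BASE + DEMO_NR_IPIS;
}

static unsigned demo_ipi_count, demo_irq_count;

/* Single pipeline entry point: IPIs and device irqs share one path. */
void demo_pipeline_dispatch(unsigned irq)
{
    if (demo_is_ipi_virq(irq))
        demo_ipi_count++;   /* would go to the IPI handler */
    else
        demo_irq_count++;   /* would go to the regular irq handler */
}
```

In this sketch, irqs 2 to 16 are always treated as ordinary irqs,
whatever the SOC, and on a uniprocessor machine the reserved virq range
is simply never dispatched.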

So, the net result of all this is that I have been able to run a Linux
kernel compiled with Xenomai support for both OMAP3 and OMAP4 at the
same time, on both an OMAP3 and an OMAP4, and to compile-test a Xenomai
kernel for the two imx_v4_v5_defconfig and imx_v6_v7 kernel
configurations, which cover just about every Freescale IMX processor
around (including the Cortex A9 based IMX6Q).

Whether the multi-SOC feature will really be used by Xenomai users on
ARM remains to be seen (but why not a Debian kernel or two covering all
ARM machines around?), but in any case, it is a net win for simply
compile-testing the I-pipe patch on ARM. The only remaining problem is
with AT91s, which still have to be compiled separately on Linux 3.4, but
I see the 3.5 kernel allows compiling them all at once.

The bad news is that in order to achieve the goal of a kernel runnable
on multiple SOCs, the Linux kernel has lengthened the path followed to
handle hardware interrupts (notably with CONFIG_MULTI_IRQ_HANDLER and
irq domains), and that is probably the reason why we get slightly worse
latencies with Xenomai for Linux 3.4 on ARM [2].

[1] http://www.xenomai.org/index.php/I-pipe-core:ArmPorting
[2] http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/

-- 
                                                                Gilles.

