* [Xenomai-help] Generel problem with realtime-tops(like adeos) over linux-kernel
From: Schlägl Manfred jun. @ 2008-07-17 8:12 UTC
To: xenomai
Hi!
I think we've discovered a general logical problem with real-time layers
like Adeos running on top of the Linux kernel.
The basic assumption of such a system is: Linux is not a
real-time system, so it is not able to provide real-time guarantees to its services,
so no Linux service is able to use real-time capabilities, so no
Linux service has real-time requirements.
From this it follows that we are able to use a layer like Adeos on top (deliver
interrupts to Linux later, always interrupt the Linux kernel).
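To make "deliver interrupts to Linux later" concrete, here is a much simplified userspace model of interrupt virtualization as an interrupt pipeline does it. It is a sketch under assumptions, not I-pipe code: `struct pipeline`, `pipeline_dispatch()`, `linux_stall()` and `linux_unstall()` are hypothetical names, and the real-time domain is reduced to a counter. Disabling interrupts in Linux only sets a software stall flag. An IRQ arriving meanwhile still reaches the real-time domain immediately, and is logged for Linux and replayed when Linux unstalls.

```c
#include <stdbool.h>

#define MAX_IRQS 32

/* Hypothetical, heavily simplified model of interrupt virtualization.
 * Linux's "interrupts off" is only a software stall flag; IRQs arriving
 * while Linux is stalled are logged and replayed when it unstalls. */
struct linux_domain {
    bool stalled;       /* virtual IRQ mask (what local_irq_disable() sets) */
    unsigned pending;   /* bitmap of IRQs logged while stalled */
    unsigned handled;   /* bitmap of IRQs Linux has actually handled */
};

struct pipeline {
    struct linux_domain linux_dom;
    unsigned rt_seen;   /* IRQs that reached the real-time domain */
};

static void linux_handle(struct linux_domain *d, unsigned irq)
{
    d->handled |= 1u << irq;
}

/* Hardware IRQ entry (irq < MAX_IRQS): the higher-priority real-time domain
 * sees the IRQ first and is never blocked by Linux's stall flag; Linux
 * either handles it now or gets it logged for later. */
static void pipeline_dispatch(struct pipeline *p, unsigned irq)
{
    p->rt_seen++;
    if (p->linux_dom.stalled)
        p->linux_dom.pending |= 1u << irq;
    else
        linux_handle(&p->linux_dom, irq);
}

static void linux_stall(struct pipeline *p)
{
    p->linux_dom.stalled = true;
}

/* Unstalling replays every IRQ logged in the meantime, lowest number first. */
static void linux_unstall(struct pipeline *p)
{
    struct linux_domain *d = &p->linux_dom;
    d->stalled = false;
    for (unsigned irq = 0; irq < MAX_IRQS; irq++) {
        if (d->pending & (1u << irq)) {
            d->pending &= ~(1u << irq);
            linux_handle(d, irq);
        }
    }
}
```

Because the real-time domain keeps running while Linux believes interrupts are off, code that Linux considers atomic can still be delayed. That is the gap at the heart of this report.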
But... Linux is able to provide hard real-time behaviour while interrupts are
locked, and many services (drivers) rely on this.
Abstract example:
{{{
spin_lock_irqsave(&lock, flags);
if (hardware_data_valid())
        process_hardware_data();
spin_unlock_irqrestore(&lock, flags);
}}}
This works fine without Adeos, but with Adeos there may be a relatively long
interruption between validation and processing. The hardware may overrun
meanwhile, and process_hardware_data() is then called without valid data...
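A deterministic userspace sketch of this check-then-process race follows. The descriptor layout, `dma_overrun()` and the `gap` callback standing in for the interruption are all hypothetical. The `recheck` option compares an overwrite counter after copying, in the style of a seqcount. That option is an illustration added here, not something proposed in this thread. It only detects the overrun after the fact and does not prevent it. Being single-threaded, the model also ignores the memory barriers a real implementation would need.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RX descriptor filled by a free-running DMA engine. */
struct rx_desc {
    unsigned gen;   /* bumped by the (simulated) DMA on every overwrite */
    bool valid;     /* status bit tested by hardware_data_valid() */
    int payload;
};

enum rx_result { RX_NONE, RX_OK, RX_DROPPED };

/* Simulated overrun: the DMA engine overwrites the descriptor with garbage. */
static void dma_overrun(struct rx_desc *d)
{
    d->gen++;
    d->valid = false;
    d->payload = -1;
}

/* Stand-in for whatever runs between the check and the processing step. */
typedef void (*interruption_fn)(struct rx_desc *);

/* Check-then-process as in the abstract example. On RX_OK, *out holds the
 * payload that was processed. With 'recheck', the overwrite counter is
 * compared again after copying and stale data is dropped. */
static enum rx_result rx_check_then_process(struct rx_desc *d, interruption_fn gap,
                                            bool recheck, int *out)
{
    if (!d->valid)                  /* hardware_data_valid() */
        return RX_NONE;
    unsigned seen = d->gen;         /* state observed at validation time */
    if (gap != NULL)
        gap(d);                     /* the "relatively long interruption" */
    *out = d->payload;              /* process_hardware_data() */
    if (recheck && d->gen != seen)
        return RX_DROPPED;          /* overrun detected after the fact */
    return RX_OK;
}
```

Without `recheck`, injecting `dma_overrun` as the gap makes the function report `RX_OK` while handing back the garbage payload, which is the failure described above.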
In our case we have this problem in the rx interrupt handler of our
Ethernet driver. The DMA is running permanently and generates an overrun
between the error-checking part (which would catch the overrun) and the
data-processing part of the handler.
I think there could be many such (latent) problems
in the Linux kernel, for example USB, which itself has real-time requirements,
or possibly MTD (data loss caused by wrong flash write/erase
timings), ...
So ... what do you think about that?
Best regards
Manfred Schlaegl
* Re: [Xenomai-help] Generel problem with realtime-tops(like adeos) over linux-kernel
From: Jan Kiszka @ 2008-07-17 9:14 UTC
To: "Schlägl \"Manfred jun.\""; +Cc: xenomai
Schlägl Manfred jun. wrote:
> Hi!
>
> I think we've discovered a general logical problem with real-time layers
> like Adeos running on top of the Linux kernel.
>
> The basic assumption of such a system is: Linux is not a
> real-time system, so it is not able to provide real-time guarantees to its services,
> so no Linux service is able to use real-time capabilities, so no
> Linux service has real-time requirements.
> From this it follows that we are able to use a layer like Adeos on top (deliver
> interrupts to Linux later, always interrupt the Linux kernel).
>
> But... Linux is able to provide hard real-time behaviour while interrupts are
> locked, and many services (drivers) rely on this.
>
> Abstract example:
> {{{
> spin_lock_irqsave(&lock, flags);
> if (hardware_data_valid())
>         process_hardware_data();
> spin_unlock_irqrestore(&lock, flags);
> }}}
> This works fine without Adeos, but with Adeos there may be a relatively long
> interruption between validation and processing. The hardware may overrun
> meanwhile, and process_hardware_data() is then called without valid data...
For those rare cases (compared to the mass of locks that don't need
this), you can still convert the lock into a hard one again. See e.g.
how IRQ controllers (which are shared between Linux and other Adeos
domains) are handled by the I-pipe patch, watching out for
ipipe_spinlock_t specifically.
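The trade-off behind turning a lock back into a hard one can be shown with a toy timing model. The function `model_lock()` and its parameters are hypothetical and the time units arbitrary. With a virtualized lock, a real-time IRQ that hits the section stretches it by the handler's duration. With a hard lock, the section stays intact, but the real-time handler has to wait until the section ends.

```c
#include <stdbool.h>

/* Hypothetical timing model (arbitrary units) of one critical section of
 * length 'section', hit by a real-time IRQ arriving 'arrival' units after
 * the section started, whose handler runs for 'rt_handler' units. */
struct lock_timing {
    unsigned section_elapsed;  /* wall time from entering to leaving the section */
    unsigned rt_latency;       /* delay before the real-time handler starts */
};

static struct lock_timing model_lock(unsigned section, unsigned arrival,
                                     unsigned rt_handler, bool hard_lock)
{
    struct lock_timing t;
    if (arrival >= section) {          /* IRQ arrives once the section is over */
        t.section_elapsed = section;
        t.rt_latency = 0;
    } else if (hard_lock) {            /* hw masking: RT handler waits */
        t.section_elapsed = section;
        t.rt_latency = section - arrival;
    } else {                           /* virtual masking: RT preempts the section */
        t.section_elapsed = section + rt_handler;
        t.rt_latency = 0;
    }
    return t;
}
```

For example, a 10-unit section hit after 4 units by a 50-unit real-time handler takes 60 units with a virtualized lock. With a hard lock it takes 10 units, and the handler starts 6 units late.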
>
> In our case we have this problem in the rx interrupt handler of our
> Ethernet driver. The DMA is running permanently and generates an overrun
> between the error-checking part (which would catch the overrun) and the
> data-processing part of the handler.
>
> I think there could be many such (latent) problems
> in the Linux kernel, for example USB, which itself has real-time requirements,
> or possibly MTD (data loss caused by wrong flash write/erase
> timings), ...
>
> So ... what do you think about that?
Most hardware does not have such hard requirements. IIRC, USB does not.
Also note that long hard IRQ-off phases are highly disliked in mainline
Linux as well, and kernel developers will happily accept sound patches
avoiding such phases. This also simplifies the path toward PREEMPT_RT
(which effectively does the same to these locks that Adeos does).
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
* Re: [Xenomai-help] Generel problem with realtime-tops(like adeos) over linux-kernel
From: Philippe Gerum @ 2008-07-18 8:02 UTC
To: "Schlägl \"Manfred jun.\""; +Cc: xenomai
Schlägl Manfred jun. wrote:
> Hi!
>
> I think we've discovered a general logical problem with real-time layers
> like Adeos running on top of the Linux kernel.
>
> The basic assumption of such a system is: Linux is not a
> real-time system, so it is not able to provide real-time guarantees to its services,
> so no Linux service is able to use real-time capabilities, so no
> Linux service has real-time requirements.
> From this it follows that we are able to use a layer like Adeos on top (deliver
> interrupts to Linux later, always interrupt the Linux kernel).
>
> But... Linux is able to provide hard real-time behaviour while interrupts are
> locked, and many services (drivers) rely on this.
>
> Abstract example:
> {{{
> spin_lock_irqsave(&lock, flags);
> if (hardware_data_valid())
>         process_hardware_data();
> spin_unlock_irqrestore(&lock, flags);
> }}}
Well, real-time is not about allowing this handler to perform un-preempted, but
rather to guarantee that the highest priority code will always get the CPU at
any point in time. If that handler is the most time-critical work to do on your
box, that's fine. But if it's not, this code is wrong, because it basically
wrecks real-time behaviour.
Actually, your basic assumption is flawed on a regular SMP kernel: what if the
lock is currently held by a task running on another CPU, that ends up being
preempted by an IRQ, or any higher priority task? Unless running an RT-capable
system, common spinlock loops are entered with hw interrupts off, therefore, you
end up locking interrupts for an undefined amount of time on your local CPU,
before being able to enter your critical section. So much for predictability,
both for entering that critical section, but above all for any other code that
would want to get the local CPU attention asap while your code is waiting for
the lock.
Masking interrupts may solve the preemption issue on a uniprocessor box, but
this does not guarantee that any other time-critical part of the system
requiring immediate attention, because of its higher priority, will get the CPU
on time. Therefore, I don't see how this construct could be used to enforce
real-time, precisely because it completely ignores priorities.
> This works fine without Adeos,
In fact, it does not work in a reliable manner without actual RTOS support.
The point is not about Adeos, which is only an enabler for real-time support,
which in turn brings proper preemption and priority management.
> but with Adeos there may be a relatively long
> interruption between validation and processing. The hardware may overrun
> meanwhile, and process_hardware_data() is then called without valid data...
>
> In our case we have this problem in the rx interrupt handler of our
> Ethernet driver. The DMA is running permanently and generates an overrun
> between the error-checking part (which would catch the overrun) and the
> data-processing part of the handler.
>
> I think there could be many such (latent) problems
> in the Linux kernel, for example USB, which itself has real-time requirements,
> or possibly MTD (data loss caused by wrong flash write/erase
> timings), ...
The whole idea underlying dual RT/non-RT systems is that RT processes
1) share the available CPU horsepower between them all according to arbitrary
priorities, 2) should be part of a software system that leaves some cycles to
the non-RT processes whenever possible.
The example you described says basically: 1) any time-critical driver may lock
interrupts out in order to complete its duty un-preempted, 2) all time-critical
drivers have to perform their duty without stepping on each other's toes with the
rather limited help of a single giant traffic light (i.e. hw interrupt masking,
and no priority scheme).
When designing such a system, you would have to think about the potential damage
high priority tasks may cause to low priority tasks, because of preemption, or
absence thereof. But again, you would have to do that with all kinds of RT
frameworks. This is something which could only be sorted out at the design
level, which means that you would have to review the locking scheme for all
time-critical sections you care about in any case.
Once identified, those sections can be fixed individually, including with Adeos.
If they happen to be too complex for using a different kind of (hardened,
Adeos-aware) lock, then maybe they do not qualify for being atomic in the first
place.
Practically, if you don't want to put your MTD flash at risk on a dual RT/GPOS
design, use VxWorks, but in that case, do not run the MTD task along with other
tasks that may preempt it for a dangerously long time. Back to square one, I'm
afraid: you have to know what your RT constraints are.
>
> So ... what do you think about that?
>
Native preemption turns spinlocks into rt-mutexes, which allows the code flow to
be diverted from the critical section for an undefined amount of time when the
CPU has to turn its attention to a higher priority task. So your spinlock
actually gives you no guarantee beyond proper serialization of the section in
question. Therefore, the issue you raised is not a co-kernel problem, it simply
expresses a general question about any RT design: which activity has highest
priority and lowest response time required? I don't think you are going to solve
that problem by relying only on hw interrupt masking.
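A rough userspace analogue of what native preemption does to a spinlock is a POSIX mutex using the priority-inheritance protocol. This is an analogy, not kernel code, and the callback names in the sketch are hypothetical. The mutex serializes the check-then-process section, and a higher-priority thread blocking on it boosts the holder. Nothing, however, prevents a higher-priority thread that does not need the lock from preempting the holder between the check and the processing.

```c
#define _XOPEN_SOURCE 700   /* expose pthread_mutexattr_setprotocol() */
#include <pthread.h>

/* Initialize a mutex with the POSIX priority-inheritance protocol, a
 * userspace analogue of an rt-mutex. Returns 0 or an errno value. */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err)
        return err;
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}

/* The abstract example's critical section, serialized by the PI mutex.
 * 'data_valid' and 'process_data' are hypothetical callbacks. The mutex
 * only serializes: a higher-priority thread that does not take this lock
 * may still preempt the holder between the check and the processing. */
static int locked_check_then_process(pthread_mutex_t *m,
                                     int (*data_valid)(void),
                                     void (*process_data)(void))
{
    int err = pthread_mutex_lock(m);
    if (err)
        return err;
    if (data_valid())
        process_data();
    return pthread_mutex_unlock(m);
}
```

On older glibc versions, build with `-pthread`.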
To sum up, if a system has multiple time-critical duties to perform, well, I
see no other option than enumerating them, and building a sane priority design
accordingly. Whether we rely on multi-domain execution the Adeos way or on native
preemption makes no difference here. At the end of the day, both of them will
bring predictability to my RT code, whilst a non-RT kernel will certainly
cause me headaches.
--
Philippe.
* Re: [Xenomai-help] Generel problem with realtime-tops(like adeos) over linux-kernel
From: Philippe Gerum @ 2008-07-19 7:27 UTC
To: "Schlägl \"Manfred jun.\""; +Cc: xenomai
Philippe Gerum wrote:
> Schlägl Manfred jun. wrote:
>> But... Linux is able to provide hard real-time behaviour while interrupts are
>> locked, and many services (drivers) rely on this.
>>
>> Abstract example:
>> {{{
>> spin_lock_irqsave(&lock, flags);
>> if (hardware_data_valid())
>>         process_hardware_data();
>> spin_unlock_irqrestore(&lock, flags);
>> }}}
>
> Well, real-time is not about allowing this handler to perform un-preempted, but
> rather to guarantee that the highest priority code will always get the CPU at
> any point in time. If that handler is the most time-critical work to do on your
> box, that's fine. But if it's not, this code is wrong, because it basically
> wrecks real-time behaviour.
>
> Actually, your basic assumption is flawed on a regular SMP kernel: what if the
> lock is currently held by a task running on another CPU, that ends up being
> preempted by an IRQ,
e.g. due to a mixed spin_lock (CPU #0) vs spin_lock_irq* (CPU #1) construct,
protecting two different accesses to the same critical resource. We could make
sure that such a construct never involves time-critical code, but that would still
require identifying the potentially problematic code and fixing it accordingly.
IOW, we would have to go through the very same audit process as for a
virtualized IRQ system like Adeos.
In the former case (vanilla non-RT kernel), we would have to always use the
spin_lock_irq*() form to serialize accesses; in the latter case (Adeos-enabled
kernel), we would have to convert regular locks to raw Adeos locks
(ipipe_spinlock_t).
Another scenario which comes to mind involves 3 CPUs and two locks on a vanilla
kernel, i.e.:
  CPU #0                 CPU #1                        CPU #2
  spin_lock(&lockB);     spin_lock_irqsave(&lockA);
                         spin_lock(&lockB);            spin_lock_irqsave(&lockA);
In that case, and although the locking sequence seems fine, the driver code on
CPU #2 that attempts to grab lockA will wait until lockB is released on CPU #0,
which can induce a significant delay while CPU #2 spins with hw interrupts off.
Again, that kind of nested construct could be banned, but how would you know it
is never used without actually auditing the code?
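The transitive wait in this scenario can be followed mechanically. The hypothetical sketch below walks a waits-for chain from a spinning CPU to the CPU whose progress actually gates it. The names `struct lock_state` and `ultimate_blocker()` are illustrative only. Fed with the three-CPU layout above (lockA as lock 0, lockB as lock 1), it shows that CPU #2, spinning with hw interrupts off, is ultimately gated by CPU #0, which never touches lockA.

```c
#define MAX_LOCKS 8
#define MAX_CPUS  8
#define NONE      (-1)

/* Hypothetical snapshot of who holds and who spins on which lock. */
struct lock_state {
    int holder[MAX_LOCKS];   /* holder[lock]  = CPU owning the lock, or NONE */
    int waits_on[MAX_CPUS];  /* waits_on[cpu] = lock the CPU spins on, or NONE */
};

/* Walk the waits-for chain starting at 'cpu' and return the first CPU in the
 * chain that is not itself spinning on a held lock, i.e. the CPU whose
 * progress ultimately gates 'cpu'. Returns 'cpu' itself when it is not
 * blocked. 'max_hops' bounds the walk in case the snapshot contains a
 * deadlock cycle. */
static int ultimate_blocker(const struct lock_state *s, int cpu, int max_hops)
{
    while (max_hops-- > 0) {
        int lock = s->waits_on[cpu];
        if (lock == NONE || s->holder[lock] == NONE)
            break;                  /* not spinning, or the lock is free */
        cpu = s->holder[lock];      /* progress now depends on the owner */
    }
    return cpu;
}
```

With CPU #0 holding lockB, CPU #1 holding lockA and spinning on lockB, and CPU #2 spinning on lockA, `ultimate_blocker(&s, 2, MAX_CPUS)` follows 2 → 1 → 0 and returns 0.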
> or any higher priority task?
i.e. on a native preemption kernel, because spinlocks are turned into rt-mutexes
that allow rescheduling in what used to be truly atomic sections on vanilla kernels.
What I mean by those examples is that your analysis is right when it comes to
the potential issue introduced by interrupt virtualization, but other problems
leading to the same consequence can appear with regular or native preemption
kernels as well. Therefore, the only reasonable answer is to address each
time-critical aspect specifically (e.g. MTD and such), and from that point, one
can fix them - such as introducing raw locks in Adeos-enabled kernels or
re-ordering task priorities with native preemption ones - regardless of the
underlying real-time infrastructure.
I'm not saying that such task is an easy one, all I'm saying is that there is no
Linux kernel that is virtuous by essence when it comes to predictable behaviour,
so we have to help them a bit by knowing about our real-time constraints in any
case.
--
Philippe.