* [Adeos-main] RE: Interrupt Latency Question
@ 2005-04-14 14:55 Fillod Stephane
2005-04-14 15:47 ` Philippe Gerum
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Fillod Stephane @ 2005-04-14 14:55 UTC (permalink / raw)
To: Wolfgang Grandegger, rtai; +Cc: adeos-main
Wolfgang Grandegger wrote:
>It's also my experience, that the large latencies are
>due to TLB misses and cache refills, especially the
>latter one. What helps is L2 cache or fast memory.
>For example, on an MPC 5200 I get significately better
>latencies with DDR-RAM than with SDRAM (which is ca.
>20% slower).
I keep hearing that people have a feeling their latency is caused
by TLB misses/cache refills, but I have never seen proof.
Is there any literature on the subject? Has nobody in the RTAI
community been curious enough to explain and fix this interesting problem?
If not, what about showing (or not) that the large latencies are due
to TLB misses/cache refills with a tool like Flushy?
Using Flushy would be like using low-end hardware. It's far easier to
make performance improvements on low-end hardware than on high-end: it
works as a magnifying glass. It reminds me of a comment on the Gnome
mailing list, where an end-user wished that developers had high-end
compile machines but slow hardware to test with.
>>Have a look at http://rtai.dk/cgi-bin/gratiswiki.pl?Latency_Killer
>>To get real bad cases, try the Flushy module.
>>You can try also to disable caches for better predictability, but it
>>really hurts :*)
>
>I will try it on an embedded PowerPC platform a.s.a.p.
On second thought, there would be a better design for Flushy. Instead of
an infinite loop in a separate module (process), we should call
the TLB flush/cache invalidate right before entering the RT world
from ADEOS. That way, we should get "predictable" worst-case latencies
wrt TLB/cache conditions.
Where is the best place in ADEOS to do that?
The earlier, the better. Tapping in at the exception level, right before
saving registers, would be best, but we need a couple of registers to
call the TLB/cache flush.
Any ideas?
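A rough user-space approximation of the "cold D-cache" condition can be
sketched as follows. This is not the Flushy module (its source is not
shown in this thread), and it does nothing controlled to the TLB or the
I-cache; a real in-kernel version would need the privileged flush
instructions of the target CPU. The line size, the cache size and the
names evict_dcache() and alloc_evict_buffer() are assumptions made for
illustration only.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed cache geometry; adjust for the target CPU. */
#define ASSUMED_LINE_SIZE   32u             /* bytes per cache line */
#define ASSUMED_CACHE_SIZE  (16u * 1024u)   /* D-cache size in bytes */

/*
 * Touch one byte in every cache line of a buffer. If the buffer is
 * larger than the D-cache, this pushes out most previously cached data.
 * Returns the number of lines touched.
 */
static size_t evict_dcache(volatile unsigned char *buf, size_t len)
{
    size_t touched = 0;

    for (size_t off = 0; off < len; off += ASSUMED_LINE_SIZE) {
        buf[off]++;     /* read-modify-write forces a line fill */
        touched++;
    }
    return touched;
}

/* Hypothetical helper: allocate a zeroed buffer twice the cache size. */
static unsigned char *alloc_evict_buffer(size_t *len)
{
    *len = 2u * ASSUMED_CACHE_SIZE;
    return calloc(*len, 1);
}
```

Calling evict_dcache() on such a buffer right before the code under
test only approximates a cold cache, since replacement policy and
associativity decide what is really evicted.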
I've Cc:'d the adeos-main list to reach some more gurus.
>>Note: if it turns out this latency is due to cache misses, then
>>solutions exist.
>
>Can you be more precise here.
With reproducible latencies, we can then use OProfile (where available)
to spot slow areas. We have to sort out whether TLB misses, I-cache
misses or D-cache misses are the biggest culprit. Make your guess :-)
Modern processors have cache control instructions, like prefetch for
read, zero cache line, writeback flush, etc. With nice cpp macros, we can
use them (where available) ahead of time in the previously spotted places,
to make the memory access latency predictable.
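As an illustration of the cpp-macro idea, here is a minimal sketch built
on GCC's __builtin_prefetch(), which the compiler lowers to the target's
prefetch instruction where one exists and to nothing otherwise. The
rt_prefetch*() names and the 32-byte line size are assumptions, not an
existing Adeos or RTAI API; the "zero cache line" and "writeback flush"
operations have no portable builtin and are left out.

```c
#include <stddef.h>

/* Assumed line size; the real value depends on the CPU. */
#define RT_CACHE_LINE 32

#if defined(__GNUC__)
/* __builtin_prefetch(addr, rw, locality): rw 0 = read, 1 = write. */
#define rt_prefetch(p)   __builtin_prefetch((p), 0, 3)
#define rt_prefetchw(p)  __builtin_prefetch((p), 1, 3)
#else
/* No known prefetch primitive: compile to nothing. */
#define rt_prefetch(p)   ((void)(p))
#define rt_prefetchw(p)  ((void)(p))
#endif

/*
 * Prefetch every cache line of an object before time-critical code
 * uses it. Returns the number of prefetches issued.
 */
static inline size_t rt_prefetch_range(const void *addr, size_t len)
{
    const char *p = addr;
    size_t lines = 0;

    for (size_t off = 0; off < len; off += RT_CACHE_LINE) {
        rt_prefetch(p + off);
        lines++;
    }
    return lines;
}
```

A prefetch is only a hint: the CPU may drop it, so this narrows the
latency distribution rather than bounding it.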
Do you think that will do it? Does anybody have experience to share?
Thanks
--
Stephane
* Re: [Adeos-main] RE: Interrupt Latency Question
2005-04-14 14:55 [Adeos-main] RE: Interrupt Latency Question Fillod Stephane
@ 2005-04-14 15:47 ` Philippe Gerum
2005-04-14 20:20 ` Michael Neuhauser
2005-04-14 17:15 ` [Adeos-main] " Paolo Mantegazza
2005-04-17 9:32 ` Wolfgang Grandegger
2 siblings, 1 reply; 10+ messages in thread
From: Philippe Gerum @ 2005-04-14 15:47 UTC (permalink / raw)
To: Fillod Stephane; +Cc: Wolfgang Grandegger, rtai, adeos-main
Fillod Stephane wrote:
> Wolfgang Grandegger wrote:
>
>>It's also my experience, that the large latencies are
>>due to TLB misses and cache refills, especially the
>>latter one. What helps is L2 cache or fast memory.
>>For example, on an MPC 5200 I get significately better
>>latencies with DDR-RAM than with SDRAM (which is ca.
>>20% slower).
>
>
> I keep on hearing people are having feeling that their latency
> can be caused by TLB misses/cache refills, but never seen proof.
> Is there some literature about that subject? Nobody in the RTAI
> community had curiosity to explain and fix this interesting problem?
>
AFAIC, the curiosity is there, and better understanding the caching
behaviour of the nucleus is planned before fusion turns 1.0; after all,
the core can run inside a regular Linux process so we could even use
cachegrind for this. The same goes for Adeos, except that cachegrind is
obviously out of reach, so the usual tough way is currently followed,
when time allows.
For instance, this explains why CONFIG_ADEOS_NOTHREADS came into
play in recent Adeos releases, but with limited success: the cost
of switching domain stacks on low-end machines (Pentium 90 MHz-based
slug, Geode/x86 266 and IceCube/ppc) was apparently too small to be
worth the effort of coding up this mode. On mid-range to high-end boxen,
the perceived benefits so far are nil, except perhaps that you don't
have to fiddle with non-Linux allocated stacks inside your interrupt
handlers (e.g. the "current" determination hack for x86). Maybe others
have had better results trying a similar approach on other archs
(Michael, with ARM?), I don't know. OTOH, the cache issues that could be
triggered by the layout of the Adeos domain descriptor (adomain_t) still
bother me, and have not been checked in depth so far AFAIK.
> If not, what about showing (or not) that the large latencies are due
> to TLB misses/cache refills with a tool like Flushy?
>
> Using Flushy would be like using low-end hardware. It's far easier to
> make performance improvements on low-end hardware than high-end. It
> works as a magnifying glass. It reminds me a comment on Gnome mailing
> list, where an end-user wished that developers had high-end compile
> machine, but slow hardware to test with.
>
More precisely, we need fast compile machines, low-end testing platforms
and fat brains. Guess which one I'm personally missing right now... :o>
>
>>>Have a look at http://rtai.dk/cgi-bin/gratiswiki.pl?Latency_Killer
>>>To get real bad cases, try the Flushy module.
>>>You can try also to disable caches for better predictability, but it
>>>really hurts :*)
>>
>>I will try it on an embedded PowerPC platform a.s.a.p.
>
>
> After thought, there would be a better design for Flushy. Instead of
> an infinite loop in a separate module(process), we should instead call
> the TLB flush/cache invalidate right before entering the RT world
> from ADEOS. Therefore, we should get "predictable" worst case latencies
> wrt TLB/cache conditions.
>
> Where is the best place in ADEOS to do that?
I'd say arch/ppc/kernel/adeos.c:__adeos_sync_stage(), this is the
interrupt log syncer. You will find this pattern:
	if (adp == adp_root) {
		/* dispatching ISR to Linux */
	} else {
		/* dispatching ISR to non-root domains. This is where you likely
		   want to play with the cache, before calling the handler. */
	}
> The earlier, the better. Tapping at the exception level would be the
> best, right before saving registers, but we need couple registers to
> call the TLB/cache flush.
> Any idea?
>
To interpose even before the pipelining stuff comes into play, you could
hook __adeos_grab_irq(), still in arch/ppc/kernel/adeos.c. It's called
right after the address translation has been switched on by the exception
transfer block, so it's quite early already.
> I've Cc:'d the adeos-main list to reach some more gurus.
>
>
>>>Note: if it turns out this latency is due to cache misses, then
>>>solutions exist.
>>
>>Can you be more precise here.
>
>
> With reproducible latencies, we can then use OProfile (where available)
> to spot slow areas. We have to sort out whether TLB misses, I-cache
> misses or D-cache misses is the bigger culprit. Make your guess :-)
> Modern processors have cache control instructions, like prefetch for
> read, zero cache line, writeback flush, etc. With nice cpp macros, we can
> use them (where available) ahead of time in the previously spotted places,
> to render the memory access latency predictable.
>
> Do you think that will do it? Anybody has experience to share?
>
>
> Thanks
--
Philippe.
* [Adeos-main] Re: Interrupt Latency Question
2005-04-14 14:55 [Adeos-main] RE: Interrupt Latency Question Fillod Stephane
2005-04-14 15:47 ` Philippe Gerum
@ 2005-04-14 17:15 ` Paolo Mantegazza
2005-04-19 18:35 ` Max Krasnyansky
2005-04-17 9:32 ` Wolfgang Grandegger
2 siblings, 1 reply; 10+ messages in thread
From: Paolo Mantegazza @ 2005-04-14 17:15 UTC (permalink / raw)
To: Fillod Stephane; +Cc: Wolfgang Grandegger, rtai, adeos-main
Fillod Stephane wrote:
> With reproducible latencies, we can then use OProfile (where available)
> to spot slow areas. We have to sort out whether TLB misses, I-cache
> misses or D-cache misses is the bigger culprit. Make your guess :-)
> Modern processors have cache control instructions, like prefetch for
> read, zero cache line, writeback flush, etc. With nice cpp macros, we can
> use them (where available) ahead of time in the previously spotted places,
> to render the memory access latency predictable.
>
> Do you think that will do it? Anybody has experience to share?
>
Either a general-purpose CPU is good enough as it is, or use a DSP;
otherwise it is too much work with nothing guaranteed.
TLB is just one facet; what about pipeline speculation, bus arbitration
and so on? Recall that you have to let Linux work as well.
If you have a multi-CPU machine and can reserve CPUs for real time only,
then the picture will change a lot. No Linux activity on them, just your
real-time programs and IRQ handlers, likely pinned to and fully cached on
those CPUs.
This is the solution you'll see natively in Linux soon. With true
low-cost multiple CPUs on a single chip massively available within a
short time at the kids' games and mama's word processors store, it will
change the whole picture.
Paolo.
* Re: [Adeos-main] RE: Interrupt Latency Question
2005-04-14 15:47 ` Philippe Gerum
@ 2005-04-14 20:20 ` Michael Neuhauser
0 siblings, 0 replies; 10+ messages in thread
From: Michael Neuhauser @ 2005-04-14 20:20 UTC (permalink / raw)
To: Philippe Gerum; +Cc: Fillod Stephane, Wolfgang Grandegger, rtai, adeos-main
On Thu, 2005-04-14 at 17:47, Philippe Gerum wrote:
> Fillod Stephane wrote:
>
> > I keep on hearing people are having feeling that their latency
> > can be caused by TLB misses/cache refills, but never seen proof.
> > Is there some literature about that subject? Nobody in the RTAI
> > community had curiosity to explain and fix this interesting problem?
>
> AFAIC, the curiosity is there, and better understanding the caching
> behaviour of the nucleus is planned before fusion turns 1.0; after all,
> the core can run inside a regular Linux process so we could even use
> cachegrind for this. The same goes for Adeos, except that cachegrind is
> obviously out of reach, so the usual tough way is currently followed,
> when time allows.
>
> For instance, this explains why the CONFIG_ADEOS_NOTHREADS came into
> play in recent Adeos releases, but with limited success, since the cost
> of switching domain stacks on low-end machines (Pentium 90Mhz-based
> slug, Geode/x86 266 and IceCube/ppc) was apparently not worth the effort
> of coding up this mode. On mid-range to high-end boxen, the perceived
> benefits so far are nil, except perhaps that you don't have to fiddle
> with non-Linux allocated stacks inside your interrupt handlers (e.g.
> "current" determination hack for x86). Maybe other have had better
> results trying a similar approach on other archs (Michael, with ARM?), I
Non-threaded Adeos helps a little on ARM, but the gain is nothing
compared to the penalty created by the way the caches work on ARM: as
virtual addresses are used to access the cache, it is necessary to flush
it completely *every* time a different process is switched in. This can
be demonstrated by running a simple test program like the following in
parallel to a real-time Adeos domain:
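#include <sched.h>
#include <unistd.h>

int main(void)
{
	fork();		/* two processes that keep yielding to each other */
	while (1)
		sched_yield();
}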
Worst-case latencies are reached really quickly with this setup :-)
Things are even worse if the dcache is configured for write-back:
interrupts have to be disabled during the write-back (switch_mm() call
in schedule()) and that adds 70 us to the worst-case latency on a 166
MHz ARM9 CPU (depends also on the RAM speed of course). You can get rid
of this by using write-through caching, but that decreases the
average-case performance.
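To put the 70 us figure in perspective, it can be converted to CPU
cycles. The helper below is only a worked-arithmetic sketch; the name
us_to_cycles() is made up for illustration.

```c
#include <stdint.h>

/*
 * Convert a duration in microseconds to CPU cycles for a clock given
 * in MHz. 1 MHz is one cycle per microsecond, so this is a plain
 * multiplication.
 */
static uint64_t us_to_cycles(uint64_t usec, uint64_t cpu_mhz)
{
    return usec * cpu_mhz;
}
```

us_to_cycles(70, 166) gives 11620, i.e. interrupts stay masked for more
than eleven thousand cycles during the write-back.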
The only solution (I have found) to the cold-cache-after-process-switch
problem would be to use MMU-less uClinux (see
http://www.linuxdevices.com/articles/AT2598317046.html)
or a scheme like FASS (see
http://www.disy.cse.unsw.edu.au/Software/FASS/) but both have their
disadvantages.
Mike
--
Dr. Michael Neuhauser phone: +43 1 789 08 49 - 30
Firmix Software GmbH fax: +43 1 789 08 49 - 55
Vienna/Austria/Europe email: mike@domain.hid
Embedded Linux Development and Services http://www.firmix.at/
* [Adeos-main] Re: Interrupt Latency Question
2005-04-14 14:55 [Adeos-main] RE: Interrupt Latency Question Fillod Stephane
2005-04-14 15:47 ` Philippe Gerum
2005-04-14 17:15 ` [Adeos-main] " Paolo Mantegazza
@ 2005-04-17 9:32 ` Wolfgang Grandegger
2005-05-03 7:12 ` Der Herr Hofrat
2 siblings, 1 reply; 10+ messages in thread
From: Wolfgang Grandegger @ 2005-04-17 9:32 UTC (permalink / raw)
To: Fillod Stephane; +Cc: rtai, adeos-main
Hello,
I just ran Flushy on a low-end PowerPC system:
XPC855xxZPnnD4 at 80 MHz: 4 kB I-Cache 4 kB D-Cache FEC present
Unfortunately the results are not as clear as expected. I see the
latency going up a bit, but other activities (telnet, ping -f) increase
it as well. Furthermore, the latency results differ from run to run.
Well, I think it's a complex and arch-dependent interplay of
various parameters, e.g. on the system above, the caches are quite small
and therefore the influence of cache refills is low. When I have more
time I might repeat the tests on other PowerPC archs as well.
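One way to make the run-to-run variation visible is to record the
minimum, maximum and sum of the wake-up lateness of a periodic loop and
compare those numbers between runs. The sketch below is not the RTAI
latency test; it is a plain user-space loop built on POSIX
clock_nanosleep(), and struct lat_stats and measure_wakeup_latency()
are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for clock_nanosleep() */
#endif
#include <stdint.h>
#include <time.h>

/* Lateness statistics in nanoseconds (hypothetical helper type). */
struct lat_stats {
    int64_t min_ns;
    int64_t max_ns;
    int64_t sum_ns;
    int     samples;
};

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Sleep until an absolute deadline once per period and record how late
 * each wake-up is. Returns 0 on success, -1 on a clock or sleep error.
 */
static int measure_wakeup_latency(int loops, long period_ns,
                                  struct lat_stats *st)
{
    struct timespec next, now;

    st->min_ns = INT64_MAX;
    st->max_ns = INT64_MIN;
    st->sum_ns = 0;
    st->samples = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        /* Advance the absolute deadline by one period. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late < st->min_ns)
            st->min_ns = late;
        if (late > st->max_ns)
            st->max_ns = late;
        st->sum_ns += late;
        st->samples++;
    }
    return 0;
}
```

Running it several times with and without a background load (Flushy,
telnet, a network benchmark) shows whether a given load moves the
maximum beyond the ordinary run-to-run spread.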
There are a few things you can do to reduce the influence of TLB misses,
e.g. pinning TLB entries, and there are corresponding kernel options on
some PowerPC archs. With a small patch you can then also load kernel
modules into kmalloc space instead of vmalloc space to profit from the
pinning. Unfortunately the latency improvement depends on your
application and PowerPC arch and requires tedious tuning, which is not
appropriate in general.
Apart from that, you can do little to reduce the latency degradation due
to cache refills and TLB misses (at least not in a portable way). Linux
simply requires it.
Wolfgang.
* Re: [Adeos-main] Re: Interrupt Latency Question
2005-04-14 17:15 ` [Adeos-main] " Paolo Mantegazza
@ 2005-04-19 18:35 ` Max Krasnyansky
2005-04-20 8:06 ` Paolo Mantegazza
0 siblings, 1 reply; 10+ messages in thread
From: Max Krasnyansky @ 2005-04-19 18:35 UTC (permalink / raw)
To: Paolo Mantegazza; +Cc: Fillod Stephane, Wolfgang Grandegger, rtai, adeos-main
Hi Paolo,
> If you have a multicpus machine and can reserve CPUs to real time only,
> than the picture will change a lot. No Linux activity on them, just your
> real time programs and irq handlers, likely stuck and fully cached to
> those CPUs.
>
> This is the solution you'll see native in Linux soon. With true lowcost
> multicpus on a single chip massively available within a short time at
> the kids' game and mama's word processors store it will change the whole
> picture.
Actually this is kind of available right now with a vanilla 2.6 kernel.
I'm talking about CPU reservation. Here is an example.
Let's say we have a dual-CPU box and we want to dedicate CPU 1 to
our application:
- Boot the kernel with the following options:
	isolcpus=1 acpi_irq_nobalance noirqbalance
  This excludes CPU 1 from the scheduler balancing logic and disables
  ACPI and SW IRQ balancing.
  Make sure that you don't run a user-space IRQ balancer.
- Redirect all interrupts to CPU 0 (mask 1 = CPU 0):
	for f in /proc/irq/*/smp_affinity; do
		echo 1 > "$f"
	done
- Your app can now migrate to CPU 1:
	cpu_set_t mask;
	CPU_ZERO(&mask);
	CPU_SET(1, &mask);	/* needs _GNU_SOURCE and <sched.h> */
	sched_setaffinity(0, sizeof(mask), &mask);
That's it. CPU 1 is yours. There will be almost zero activity on it, besides
your task of course.
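The migration step can be wrapped in small helpers, sketched here with
the glibc cpu_set_t macros. The names first_allowed_cpu() and
pin_to_cpu() are made up for illustration, and the code assumes Linux
with glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for the CPU_*() macros and sched_setaffinity() */
#endif
#include <sched.h>

/* Return the lowest-numbered CPU this process may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU. Returns 0 on success. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

On the dual-CPU box of the example, the application would call
pin_to_cpu(1) early in main() and check the return value, since the
call fails if CPU 1 is not available to the process.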
Max
* Re: [Adeos-main] Re: Interrupt Latency Question
2005-04-19 18:35 ` Max Krasnyansky
@ 2005-04-20 8:06 ` Paolo Mantegazza
2005-04-20 18:10 ` Max Krasnyansky
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Mantegazza @ 2005-04-20 8:06 UTC (permalink / raw)
To: Max Krasnyansky; +Cc: Fillod Stephane, Wolfgang Grandegger, rtai, adeos-main
Max Krasnyansky wrote:
> Hi Paolo,
>
>> If you have a multicpus machine and can reserve CPUs to real time only,
>> than the picture will change a lot. No Linux activity on them, just your
>> real time programs and irq handlers, likely stuck and fully cached to
>> those CPUs.
>>
>> This is the solution you'll see native in Linux soon. With true
>> lowcost multicpus on a single chip massively available within a short
>> time at the kids' game and mama's word processors store it will change
>> the whole picture.
>
>
> Actually this is kind of available right now with vanilla 2.6 kernel.
> I'm talking about CPU reservation. Here is an example.
> Let's say we have dual CPU box and we want to dedicate CPU 1 to
> our application:
> - Configure the kernel with following boot options:
> isolcpus=1 acpi_irq_nobalance noirqbalance
>
> This excludes CPU 1 from the scheduler balancing logic. And disables
> ACPI and SW irq balancing.
> Make sure that you don't run user-space IRQ balancer.
>
> - Redirect all interrupts to CPU 0
> for i in /proc/irq/*; do
> echo 1 > $i/smp_affinity;
> done
>
> - Your app call now migrate to CPU 1
> int cpu = 1;
> uint32_t mask = (1 << cpu);
> sched_setaffinity(0, sizeof(mask), (cpu_set_t *) &mask);
>
> That's it. CPU 1 is yours. There will be almost zero activity on it,
> besides
> your task of course.
>
Well, such a scheme was available in RTAI even before it became generally
available in Linux (forcing interrupts to a CPU dates back to 2.2.x).
By real-time CPU reservation I mean something that establishes the CPU
as such directly at boot, so that it is the real-time applications that
specifically have to use it; the rest is excluded from the very beginning.
Paolo.
* Re: [Adeos-main] Re: Interrupt Latency Question
2005-04-20 8:06 ` Paolo Mantegazza
@ 2005-04-20 18:10 ` Max Krasnyansky
0 siblings, 0 replies; 10+ messages in thread
From: Max Krasnyansky @ 2005-04-20 18:10 UTC (permalink / raw)
To: Paolo Mantegazza; +Cc: Fillod Stephane, Wolfgang Grandegger, rtai, adeos-main
Paolo Mantegazza wrote:
> For real time CPU reservation I mean something that estabilshes it as
> such directly at boot and are real time applications that have
> specifically to use it, the rest is excluded from the very beginning
That's exactly what the 'isolcpus' option does. Tasks that want to use an
isolated CPU have to explicitly call sched_setaffinity() to migrate.
Max
* [Adeos-main] Re: Interrupt Latency Question
2005-04-17 9:32 ` Wolfgang Grandegger
@ 2005-05-03 7:12 ` Der Herr Hofrat
2005-05-03 16:11 ` Wolfgang Grandegger
0 siblings, 1 reply; 10+ messages in thread
From: Der Herr Hofrat @ 2005-05-03 7:12 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: Fillod Stephane, rtai, adeos-main
> Hello,
>
> I just run flushy on a low-end PowerPC system:
>
> XPC855xxZPnnD4 at 80 MHz: 4 kB I-Cache 4 kB D-Cache FEC present
>
> Unfortunately the results are not that clear than expected. I see the
> latency going up a bit but other activities do increase it as well
> (telnet, ping -f). Furthermore, from run to run the latency results are
I did some measurements on a number of boxes, and ping -f is a bad test:
especially on low-end systems it results in the kernel more or less running
the same code in an infinite loop, which gives "good" values. If you want
to see the influence of the network layer, use NetPIPE and see the jitter jump ;)
> different. Well, I think it's a complex and arch-dependent interplay of
> various parameters, e.g. on the system above, the caches are quite small
> and therefore the influence of cache refills is low. When I have more
> time I might repeat the tests on other PowerPC archs as well.
>
>
> Apart from that, you can do little to reduce the latency degradation due
> to cache refills and TLB misses (at least not in a portable way). Linux
> simply requires it.
>
Has anybody ever used gcov feedback for ppc? (Run the load with the kernel
compiled with -fprofile-arcs, recompile with -fbranch-probabilities, rerun
the load and test the jitter.) The PPC branch prediction should be almost
ideal for this, and that would be a fairly portable way of doing it.
hofrat
* [Adeos-main] Re: Interrupt Latency Question
2005-05-03 7:12 ` Der Herr Hofrat
@ 2005-05-03 16:11 ` Wolfgang Grandegger
0 siblings, 0 replies; 10+ messages in thread
From: Wolfgang Grandegger @ 2005-05-03 16:11 UTC (permalink / raw)
To: Der Herr Hofrat; +Cc: Fillod Stephane, rtai, adeos-main
On 05/03/2005 09:12 AM Der Herr Hofrat wrote:
>> Hello,
>>
>> I just run flushy on a low-end PowerPC system:
>>
>> XPC855xxZPnnD4 at 80 MHz: 4 kB I-Cache 4 kB D-Cache FEC present
>>
>> Unfortunately the results are not that clear than expected. I see the
>> latency going up a bit but other activities do increase it as well
>> (telnet, ping -f). Furthermore, from run to run the latency results are
>
> did some measurements on a number of boxes and ping -f is a bad test as
> especially on low end systems it results in the kernel more or less running
> the same code in an infinite loop - resulting in "good" values. If you want
> to see the network layer influence use NetPIPE and see the jitter jump ;)
>
>> different. Well, I think it's a complex and arch-dependent interplay of
>> various parameters, e.g. on the system above, the caches are quite small
>> and therefore the influence of cache refills is low. When I have more
>> time I might repeat the tests on other PowerPC archs as well.
>>
>>
>> Apart from that, you can do little to reduce the latency degradation due
>> to cache refills and TLB misses (at least not in a portable way). Linux
>> simply requires it.
>>
> has anybody ever used gcov feedbacks for ppc ? (run load with kernel compiled
> with -fprofile-arcs recompile -fbranch-probabilities rerun load and test
> jitter) . The PPC branch prediction should be almost ideal for this and that
> would be a fairly portable way of doing it.
I never tried that but I doubt that it will reduce cache refills and TLB
misses. I will have a closer look when time permits.
Wolfgang.