linux-rt-users.vger.kernel.org archive mirror
* cyclictest better values with system load than without (OMAP3530 target)
@ 2013-11-26  9:21 Stefan Roese
  2013-11-26 14:21 ` Dmitry Lysenko
  2013-11-26 16:12 ` Clark Williams
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Roese @ 2013-11-26  9:21 UTC (permalink / raw)
  To: linux-rt-users

Hi!

I'm running cyclictest on an OMAP3530 target board and am a bit
astonished by the results, especially that the latency values
are better on a system with system load (hackbench) than on one
without system load. Here are the values I get:

With system load (hackbench):
-----------------------------
# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1853) P:80 I:1000 C:  10000 Min:     36 Act:  156 Avg:  154 Max:     244

Idle system:
------------
# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 2332) P:80 I:1000 C:  10000 Min:     81 Act:  530 Avg:  484 Max:     602


Some details about my test/system setup:
- Linux v3.8.13
- preempt-rt patch 3.8.13-rt14
- HW: TI OMAP3530 CM_T35 board
- Latest cyclictest from rt-tests git repository


I might have misconfigured the system, so here are some extracts from
my .config:

...
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
...
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
...
CONFIG_PREEMPT_RT_FULL=y
...

With CONFIG_NO_HZ disabled I get slightly better results:

With system load (hackbench):
-----------------------------
# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1840) P:80 I:1000 C:  10000 Min:     30 Act:  153 Avg:  154 Max:     238

Idle system:
-------------
# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1371) P:80 I:1000 C:  10000 Min:     40 Act:  465 Avg:  435 Max:     502


Any ideas/explanations are really appreciated.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-26  9:21 cyclictest better values with system load than without (OMAP3530 target) Stefan Roese
@ 2013-11-26 14:21 ` Dmitry Lysenko
  2013-11-26 19:14   ` Stefan Roese
  2013-11-26 16:12 ` Clark Williams
  1 sibling, 1 reply; 12+ messages in thread
From: Dmitry Lysenko @ 2013-11-26 14:21 UTC (permalink / raw)
  To: Stefan Roese; +Cc: linux-rt-users

Hi!

On a Marvell Kirkwood at 1.2 GHz with a 3.2.51-rt72 kernel I have slightly
better results:

--- with hackbench:

root@debian:~/rt-tests# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
T: 0 (10268) P:80 I:1000 C:  10000 Min:     16 Act:   29 Avg:   29 Max:      46

--- w/o hackbench:

root@debian:~/rt-tests# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
T: 0 (12686) P:80 I:1000 C:  10000 Min:      5 Act:    6 Avg:    6 Max:      22

-- .config:
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

# CONFIG_CPU_IDLE is not set

Try to play with:

# CONFIG_PM_RUNTIME is not set
# CONFIG_ARM_CPU_SUSPEND is not set
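
When comparing kernel configurations like the ones above, a small shell
helper can report the state of selected options. This is only a sketch: the
function name kconfig_state is made up for illustration, and it assumes the
configuration is available as a plain-text file (for example the .config of
the build tree).

```shell
# kconfig_state FILE OPTION...
# Hypothetical helper: prints "OPTION: value" for every option that is set
# in FILE and "OPTION: not set" otherwise. OPTION is given without the
# CONFIG_ prefix.
kconfig_state() {
    config_file=$1
    shift
    for opt in "$@"; do
        # Only lines of the form CONFIG_<opt>=<value> count as "set".
        value=$(sed -n "s/^CONFIG_${opt}=//p" "$config_file")
        printf '%s: %s\n' "$opt" "${value:-not set}"
    done
}

# Example (not run here):
#   kconfig_state .config NO_HZ CPU_IDLE PM_RUNTIME ARM_CPU_SUSPEND
```

Matching on the trailing "=" keeps CONFIG_NO_HZ from also matching longer
option names that merely start with the same prefix.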

Best wishes,
 Dmitry.

2013/11/26 Stefan Roese <stefan.roese@gmail.com>
>
> Hi!
>
> I'm running cylictest on a OMAP3530 target board and am a bit
> astonished about the results. Especially that the latency values
> are better on a system with system load (hackbench) than on one
> without system load. Here the values I get:
>
> With system load (hackbench):
> -----------------------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1853) P:80 I:1000 C:  10000 Min:     36 Act:  156 Avg:  154 Max:
>     244
>
> Idle system:
> ------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 2332) P:80 I:1000 C:  10000 Min:     81 Act:  530 Avg:  484 Max:
>     602
>
>
> Some details to my test/system setup:
> - Linux v3.8.13
> - preempt-rt patch 3.8.13-rt14
> - HW: TI OMAP3530 CM_T35 board
> - Latest cyclictest from rt-tests git repository
>
>
> I might have misconfigured the system. So here some extracts from
> my .config:
>
> ...
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ=y
> CONFIG_HIGH_RES_TIMERS=y
> ...
> # CONFIG_CPU_FREQ is not set
> # CONFIG_CPU_IDLE is not set
> # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
> ...
> CONFIG_PREEMPT_RT_FULL=y
> ...
>
> With CONFIG_NO_HZ disabled I get slightly better results:
>
> With system load (hackbench):
> -----------------------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1840) P:80 I:1000 C:  10000 Min:     30 Act:  153 Avg:  154 Max:
>     238
>
> Idle system:
> -------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1371) P:80 I:1000 C:  10000 Min:     40 Act:  465 Avg:  435 Max:
>     502
>
>
> Any ideas/explanations are really appreciated.
>
> Thanks,
> Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-26  9:21 cyclictest better values with system load than without (OMAP3530 target) Stefan Roese
  2013-11-26 14:21 ` Dmitry Lysenko
@ 2013-11-26 16:12 ` Clark Williams
  2013-11-29 12:56   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 12+ messages in thread
From: Clark Williams @ 2013-11-26 16:12 UTC (permalink / raw)
  To: Stefan Roese; +Cc: linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 2654 bytes --]

In my experience (on x86_64 mainly), that behavior (worse times when
not under load) is due to the overhead of coming out of power-save/idle
states. When you've got a big load on the system and all the cores are
active, then the power-save logic and/or the idle logic doesn't kick in
and devices aren't being powered down.

Do you know if your OMAP has power-save logic available? Alternatively
do you know how expensive the idle mechanism is? Have you tried booting
with idle=poll then measuring without a load?

Clark


On Tue, 26 Nov 2013 10:21:34 +0100
Stefan Roese <stefan.roese@gmail.com> wrote:

> Hi!
> 
> I'm running cylictest on a OMAP3530 target board and am a bit
> astonished about the results. Especially that the latency values
> are better on a system with system load (hackbench) than on one
> without system load. Here the values I get:
> 
> With system load (hackbench):
> -----------------------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1853) P:80 I:1000 C:  10000 Min:     36 Act:  156 Avg:  154 Max:
>     244
> 
> Idle system:
> ------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 2332) P:80 I:1000 C:  10000 Min:     81 Act:  530 Avg:  484 Max:
>     602
> 
> 
> Some details to my test/system setup:
> - Linux v3.8.13
> - preempt-rt patch 3.8.13-rt14
> - HW: TI OMAP3530 CM_T35 board
> - Latest cyclictest from rt-tests git repository
> 
> 
> I might have misconfigured the system. So here some extracts from
> my .config:
> 
> ...
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ=y
> CONFIG_HIGH_RES_TIMERS=y
> ...
> # CONFIG_CPU_FREQ is not set
> # CONFIG_CPU_IDLE is not set
> # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
> ...
> CONFIG_PREEMPT_RT_FULL=y
> ...
> 
> With CONFIG_NO_HZ disabled I get slightly better results:
> 
> With system load (hackbench):
> -----------------------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1840) P:80 I:1000 C:  10000 Min:     30 Act:  153 Avg:  154 Max:
>     238
> 
> Idle system:
> -------------
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1371) P:80 I:1000 C:  10000 Min:     40 Act:  465 Avg:  435 Max:
>     502
> 
> 
> Any ideas/explanations are really appreciated.
> 
> Thanks,
> Stefan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-26 14:21 ` Dmitry Lysenko
@ 2013-11-26 19:14   ` Stefan Roese
  2013-11-26 20:14     ` Tim Sander
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Roese @ 2013-11-26 19:14 UTC (permalink / raw)
  To: Dmitry Lysenko; +Cc: linux-rt-users

Hi Dmitry!

On 26.11.2013 15:21, Dmitry Lysenko wrote:
> On Marvell Kirkwood at 1.2Ghz and 3.2.51-rt72 kernel I have slightly
> better results:
> 
> --- with hackbench:
> 
> root@debian:~/rt-tests# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> T: 0 (10268) P:80 I:1000 C:  10000 Min:     16 Act:   29 Avg:   29 Max:      46
> 
> --- w/o hackbench:
> 
> root@debian:~/rt-tests# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> T: 0 (12686) P:80 I:1000 C:  10000 Min:      5 Act:    6 Avg:    6 Max:      22
> 
> -- .config:
> CONFIG_TICK_ONESHOT=y
> # CONFIG_NO_HZ is not set
> CONFIG_HIGH_RES_TIMERS=y
> 
> # CONFIG_CPU_IDLE is not set
> 
> Try to play with:
> 
> # CONFIG_PM_RUNTIME is not set
> # CONFIG_ARM_CPU_SUSPEND is not set

Thanks, this was helpful. After some tweaks I was able to disable
CONFIG_PM_RUNTIME and ARM_CPU_SUSPEND. And after disabling NO_HZ and
enabling fewer CONFIG_DEBUG_xxx options I now have the following results:

Idle:
# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1382) P:80 I:1000 C:  10000 Min:     12 Act:  141 Avg:  127 Max:     202

Load:
# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 2777) P:80 I:1000 C:  10000 Min:     26 Act:  167 Avg:  152 Max:     229
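
To compare runs like these without reading the columns by eye, the summary
line can be reduced to its Min, Avg and Max values. The following is a
minimal sketch, assuming the quiet-mode (-q) summary format shown above with
the whole summary on one line; the function name cyclictest_stats is
hypothetical and not part of rt-tests.

```shell
# cyclictest_stats
# Hypothetical helper: reads cyclictest -q output on stdin and prints
# "Min Avg Max" (in us) for every summary line. Other lines, such as the
# "# /dev/cpu_dma_latency set to 0us" notice, are ignored.
cyclictest_stats() {
    sed -n 's/.*Min: *\([0-9][0-9]*\) .*Avg: *\([0-9][0-9]*\) *Max: *\([0-9][0-9]*\).*/\1 \2 \3/p'
}

# Example (not run here):
#   ./cyclictest -l 10000 -i 1000 -n -p 80 -q | cyclictest_stats
```

For the idle run above this yields "12 127 202", and for the loaded run
"26 152 229".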

So the test now finally has better results on an idle system than on
one with heavy system load. The numbers are still far away from your
latency values on the 1.2GHz Kirkwood. Does anybody have OMAP3
values at hand to compare?

Thanks for your input,
Stefan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-26 19:14   ` Stefan Roese
@ 2013-11-26 20:14     ` Tim Sander
  0 siblings, 0 replies; 12+ messages in thread
From: Tim Sander @ 2013-11-26 20:14 UTC (permalink / raw)
  To: Stefan Roese; +Cc: Dmitry Lysenko, linux-rt-users

Hi Stefan
> On 26.11.2013 15:21, Dmitry Lysenko wrote:
> > On Marvell Kirkwood at 1.2Ghz and 3.2.51-rt72 kernel I have slightly
> > better results:
> > 
> > --- with hackbench:
> > 
> > root@debian:~/rt-tests# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> > T: 0 (10268) P:80 I:1000 C:  10000 Min:     16 Act:   29 Avg:   29 Max:      46
> > 
> > --- w/o hackbench:
> > 
> > root@debian:~/rt-tests# ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> > T: 0 (12686) P:80 I:1000 C:  10000 Min:      5 Act:    6 Avg:    6 Max:      22
> > 
> > -- .config:
> > CONFIG_TICK_ONESHOT=y
> > # CONFIG_NO_HZ is not set
> > CONFIG_HIGH_RES_TIMERS=y
> > 
> > # CONFIG_CPU_IDLE is not set
> > 
> > Try to play with:
> > 
> > # CONFIG_PM_RUNTIME is not set
> > # CONFIG_ARM_CPU_SUSPEND is not set
> 
> Thanks, this was helpful. After some tweaks I was able to disable
> CONFIG_PM_RUNTIME and ARM_CPU_SUSPEND. And after disabling NO_HZ and
> less CONFIG_DEBUG_xxx options I now have the following results:
> 
> Idle:
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1382) P:80 I:1000 C:  10000 Min:     12 Act:  141 Avg:  127 Max:     202
> 
> Load:
> # ./cyclictest -l 10000 -i 1000 -n -p 80 -q
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 2777) P:80 I:1000 C:  10000 Min:     26 Act:  167 Avg:  152 Max:     229
> 
> So the test now finally has better results on a idle system than on
> one with heavy system load. The numbers are still far away from your
> latency values on the 1.2GHz Kirkwood. Does anybody have OMAP3
> values at hand to compare?
I have not used OMAP3, but I used Sitara a while ago. As far as I know they are
derived from the same die mask. I think the numbers were a little better than
yours, but still in the 100 range, which was too large for my use case
:-(. I haven't seen a Cortex-A8 with good real-time performance. The A9 seems
much better in this regard, in my experience. If you need better numbers, ping
me again. I will probably find the results again on the other machine, which
is currently out of reach.

One thing that really bothers me is that ARM repurposed the FIQ for some
security features which are not that useful for real time, and I think that
most of the time Digital Restrictions Management does more harm than good.
But that's a little off-topic for this mailing list.

Best regards
Tim

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-26 16:12 ` Clark Williams
@ 2013-11-29 12:56   ` Sebastian Andrzej Siewior
  2013-11-29 15:10     ` Carsten Emde
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-11-29 12:56 UTC (permalink / raw)
  To: Clark Williams; +Cc: Stefan Roese, linux-rt-users

* Clark Williams | 2013-11-26 10:12:32 [-0600]:

>In my experience (on x86_64 mainly), that behavior (worse times when
>not under load) is due to the overhead of coming out of power-save/idle
>states. When you've got a big load on the system and all the cores are
>active, then the power-save logic and/or the idle logic doesn't kick in
>and devices aren't being powered down.
This is the case here, too: the overhead of coming out of a deep power
state plus the invalidated caches.

>Do you know if your OMAP has power-save logic available? Alternatively
>do you know how expensive the idle mechanism is? Have you tried booting
>with idle=poll then measuring without a load?

idle=poll is x86 only. Disabling the complete PM stuff should solve the
issue. You could go via arch_cpu_idle() to check what is used.
The idle routine should come either via cpuidle_idle_call() or arm_pm_idle().

>Clark

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-29 12:56   ` Sebastian Andrzej Siewior
@ 2013-11-29 15:10     ` Carsten Emde
  2013-11-29 16:36       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 12+ messages in thread
From: Carsten Emde @ 2013-11-29 15:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Clark Williams, Stefan Roese; +Cc: linux-rt-users

On 11/29/2013 01:56 PM, Sebastian Andrzej Siewior wrote:
> * Clark Williams | 2013-11-26 10:12:32 [-0600]:
>
>> In my experience (on x86_64 mainly), that behavior (worse times when
>> not under load) is due to the overhead of coming out of power-save/idle
>> states. When you've got a big load on the system and all the cores are
>> active, then the power-save logic and/or the idle logic doesn't kick in
>> and devices aren't being powered down.
> This is the case here, too. The overhead comming out of a deep power
> state plus the invalidated caches.
Sorry, I feel that the discussion is somewhat out of sync with the 
original posting. Let me explain.

Among others, processors may use two completely different interfaces to 
save power:
1. Sleep states aka C states, Linux interface cpuidle
2. Clock frequency modulation aka P states, Linux interface cpufreq

1. Sleep states
Processors may come with a number of C states from light sleep to deep 
sleep to save power when idle. The longer a processor is idle, the 
deeper normally is the sleep state the processor may enter. Sleep states 
may be disabled i) on a per-processor and per-state basis in 
/sys/devices/system/cpu/cpuX/cpuidle/stateX/disable or ii) altogether 
using the somewhat mislabeled /dev/cpu_dma_latency pseudo device. As far 
as cyclictest is concerned, sleep states normally are disabled 
altogether. If this is the case, cyclictest prints the message:
# /dev/cpu_dma_latency set to 0us
The original posting contains this line. In consequence, sleep states 
cannot be responsible for any observed latency prolongation. To check 
whether sleep states are disabled, the command
   # cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
may be used repeatedly for every CPU. If sleep states are disabled 
correctly, only the counter of the first state (poll state) may 
increase, for example:

# cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
444330737734
234393550
1760323375
1234658099
183251179053

and some time later

# cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
444417947595
234393550
1760323375
1234658099
183251179053
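
Comparing two snapshots like the ones above can be automated. The following
is a minimal sketch, assuming each snapshot was saved to a file with one
counter per line in state order; the function name changed_states is
hypothetical.

```shell
# changed_states BEFORE AFTER
# Hypothetical helper: BEFORE and AFTER each hold one cpuidle "time" counter
# per line (state0 first). Prints the name of every state whose counter
# increased between the two snapshots.
changed_states() {
    paste "$1" "$2" | awk '$2 > $1 { printf "state%d\n", NR - 1 }'
}

# Example (not run here):
#   cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time > before
#   sleep 10
#   cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time > after
#   changed_states before after
```

For the two snapshots shown above, this prints only "state0", which is the
expected result when deeper sleep states are disabled.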

BTW: The cyclictest source contains a related comment:
/* Latency trick
  * if the file /dev/cpu_dma_latency exists,
  * open it and write a zero into it. This will tell
  * the power management system not to transition to
  * a high cstate (in fact, the system acts like idle=poll)
  * When the fd to /dev/cpu_dma_latency is closed, the behavior
  * goes back to the system default.
  *
  * Documentation/power/pm_qos_interface.txt
  */
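
The same trick can be tried from a shell for applications other than
cyclictest. This is a sketch under assumptions: the function name
run_with_zero_latency is made up, the request is written as a binary 32-bit
zero (as described in the PM QoS documentation referenced in the comment
above), and the device path is passed as a parameter so that the helper can
also be exercised on an ordinary file.

```shell
# run_with_zero_latency QOS_FILE COMMAND [ARG...]
# Hypothetical helper: opens QOS_FILE, writes a 32-bit zero to it and keeps
# the file descriptor open while COMMAND runs. The descriptor is closed when
# the subshell exits, which drops the latency request again.
run_with_zero_latency() {
    qos_file=$1
    shift
    (
        exec 3> "$qos_file" || exit 1
        printf '\000\000\000\000' >&3 || exit 1
        "$@"
    )
}

# Example (not run here, typically needs root):
#   run_with_zero_latency /dev/cpu_dma_latency sleep 60
```

The subshell is used so that the descriptor cannot leak into the calling
shell and keep the request alive longer than intended.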

2. Clock frequency modulation
This is an entirely different story as cyclictest has no business with it 
at all. The clock frequency of x86 processors has a more or less linear 
effect on latency, e.g. a system running at 1 GHz will show a latency 
that is twice as high as when running at 2 GHz. ARM processors, however, 
behave differently. Many ARM cores do not provide acceptable latency 
values unless running at full speed. It is, therefore, often mandatory 
to switch to the performance CPU frequency governor before starting 
cyclictest or before running a real-world user space application that 
relies on minimum latency. The /sys/devices/system/cpu/cpu0/cpufreq 
interface is available to manage P states:
Switch to maximum performance:
cd /sys/devices/system/cpu/
for i in cpu?/cpufreq/scaling_governor
do
   echo performance >$i
done
Switch to on-demand frequency modulation:
for i in cpu?/cpufreq/scaling_governor
do
   echo ondemand >$i
done
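
A variant of the loops above that also reads the setting back may be
useful, since a write to scaling_governor can be rejected. This is only a
sketch: the function name set_governor and the optional base directory
argument are assumptions made for illustration.

```shell
# set_governor GOVERNOR [BASE_DIR]
# Hypothetical helper: writes GOVERNOR to the scaling_governor file of every
# CPU below BASE_DIR (default /sys/devices/system/cpu) and verifies that the
# value was accepted. Returns non-zero on the first failure. If no cpufreq
# directory exists (e.g. CONFIG_CPU_FREQ is not set), nothing is changed.
set_governor() {
    governor=$1
    base=${2:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        echo "$governor" > "$f" || return 1
        [ "$(cat "$f")" = "$governor" ] || return 1
    done
}

# Example (not run here, needs root):
#   set_governor performance && ./cyclictest -l 10000 -i 1000 -n -p 80 -q
```

The pattern cpu[0-9]* is used instead of cpu? so that systems with more
than ten CPUs are covered as well.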

BTW: Power saving and real-time do not necessarily exclude each other. 
If a still deterministic but slightly longer latency is acceptable, 
some light sleep states and a somewhat lower clock frequency may be 
allowed, which still may result in considerable energy saving. If, 
however, the fastest possible real-time response is required, C states 
and P states must be disabled (or set to polling and maximum speed, 
respectively) and the power bill must be paid.

> So the test now finally has better results on a idle system than on
> one with heavy system load. The numbers are still far away from your
> latency values on the 1.2GHz Kirkwood. Does anybody have OMAP3
> values at hand to compare?
This is why we run the OSADL QA Farm. An AM3359 system is in rack 7, 
slot 5 -> https://www.osadl.org/?id=1590. We run 100 million cycles with 
200 µs cycle interval (which takes about 5 hours and 33 minutes) to 
obtain reliable data. In addition, the processor is not only idle but 
also executes defined load scenarios during the recording. Please do 
the same before you compare the results. To facilitate the comparison, 
the cyclictest command line is given below every plot, and any other 
relevant information (including kernel command line) is available in the 
systems' profiles.

Hope this helps,
	-Carsten.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-29 15:10     ` Carsten Emde
@ 2013-11-29 16:36       ` Gilles Chanteperdrix
  2013-11-29 16:58         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 12+ messages in thread
From: Gilles Chanteperdrix @ 2013-11-29 16:36 UTC (permalink / raw)
  To: Carsten Emde
  Cc: Sebastian Andrzej Siewior, Clark Williams, Stefan Roese,
	linux-rt-users

On 11/29/2013 04:10 PM, Carsten Emde wrote:
> BTW: Power saving and real-time do not necessarily exclude each other.
> If a - still deterministic - but a little longer latency is acceptable,
> some light sleep states and a somewhat lower clock frequency may be
> allowed which still may result in considerable energy saving. If,
> however, the fastest possible real-time response is required, C states
> and P states must be disabled (or set to polling and maximum speed,
> repsectively) and the power bill must be payed.

Well, I do not fully agree. To be sure that you can clock down the 
processor for executing a task which has sufficient time to meet its 
deadline, your system must be "time triggered": all the timer events 
must be known in advance. On a fully dynamic system, you may 
make that decision, but a new timer may then be scheduled which causes 
the system to miss its deadline, whereas it would not have missed it if 
it had run at full speed.
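
The argument can be illustrated with a small calculation. This is a sketch
with hypothetical numbers, and it assumes that execution time scales
inversely with the clock frequency, which is a simplification; the function
name fits_deadline is made up.

```shell
# fits_deadline WORK_US FULL_MHZ CUR_MHZ DEADLINE_US
# Hypothetical helper: WORK_US is the execution time at FULL_MHZ. Succeeds
# if the same work, scaled to CUR_MHZ (rounded up), still fits DEADLINE_US.
fits_deadline() {
    scaled_us=$(( ($1 * $2 + $3 - 1) / $3 ))
    [ "$scaled_us" -le "$4" ]
}

# Known work of 400 us, deadline 1000 us, clocked down from 1000 to 500 MHz:
#   fits_deadline 400 1000 500 1000    succeeds (800 us)
# A timer that was not known in advance adds 300 us of work:
#   fits_deadline 700 1000 500 1000    fails    (1400 us)
#   fits_deadline 700 1000 1000 1000   succeeds (700 us at full speed)
```

The decision to clock down was correct for the known work, but the deadline
is missed once the unplanned timer fires, although it would have been met at
full speed.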

-- 
					    Gilles.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-29 16:36       ` Gilles Chanteperdrix
@ 2013-11-29 16:58         ` Gilles Chanteperdrix
  2013-11-29 17:36           ` Carsten Emde
  0 siblings, 1 reply; 12+ messages in thread
From: Gilles Chanteperdrix @ 2013-11-29 16:58 UTC (permalink / raw)
  To: Carsten Emde
  Cc: Sebastian Andrzej Siewior, Clark Williams, Stefan Roese,
	linux-rt-users

On 11/29/2013 05:36 PM, Gilles Chanteperdrix wrote:
> On 11/29/2013 04:10 PM, Carsten Emde wrote:
>> BTW: Power saving and real-time do not necessarily exclude each other.
>> If a - still deterministic - but a little longer latency is acceptable,
>> some light sleep states and a somewhat lower clock frequency may be
>> allowed which still may result in considerable energy saving. If,
>> however, the fastest possible real-time response is required, C states
>> and P states must be disabled (or set to polling and maximum speed,
>> repsectively) and the power bill must be payed.
>
> Well, I do not fully agree. To be sure that you can clock down the
> processor for executing a task which has sufficient time to meet its
> deadline, your system must be "time triggered", all the timer events
> must be known in advance. Because on a fully dynamic system, you may
> make that decision, but a new timer may be scheduled which causes the
> system to miss its deadline whereas it would not have missed it if it
> had run at full speed.
>
And a second problem is that you must know the task WCET, which on a 
modern system:
- depends on the task frequency;
- depends on the IRQ load.
Again, only a time triggered system seems to make this possible.

-- 
					    Gilles.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-29 16:58         ` Gilles Chanteperdrix
@ 2013-11-29 17:36           ` Carsten Emde
  2013-11-29 19:34             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 12+ messages in thread
From: Carsten Emde @ 2013-11-29 17:36 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Sebastian Andrzej Siewior, Clark Williams, Stefan Roese,
	linux-rt-users

Hi Gilles,

> On 11/29/2013 05:36 PM, Gilles Chanteperdrix wrote:
>> On 11/29/2013 04:10 PM, Carsten Emde wrote:
>>> BTW: Power saving and real-time do not necessarily exclude each other.
>>> If a - still deterministic - but a little longer latency is acceptable,
>>> some light sleep states and a somewhat lower clock frequency may be
>>> allowed which still may result in considerable energy saving. If,
>>> however, the fastest possible real-time response is required, C states
>>> and P states must be disabled (or set to polling and maximum speed,
>>> respectively) and the power bill must be payed.
>> Well, I do not fully agree. To be sure that you can clock down the
>> processor for executing a task which has sufficient time to meet its
>> deadline, your system must be "time triggered", all the timer events
>> must be known in advance. Because on a fully dynamic system, you may
>> make that decision, but a new timer may be scheduled which causes the
>> system to miss its deadline whereas it would not have missed it if it
>> had run at full speed.
> And a second problem is that you must know the task WCET, which on a
> modern system:
> - depends on the task frequency;
> - depends on the IRQ load.
> Again, only a time triggered system seems to make this possible.
Hmm, I'm not sure whether I correctly got your point.

Let me try an example: A 1-GHz CPU of a given system runs at full speed 
with frequency governor set to performance and provides the required 
real-time capabilities. When a second system with the same capabilities 
was needed, the 1-GHz CPU unfortunately was out of stock, and the 
decision was made to buy the 2-GHz variant of the processor. To save 
energy, however, the clock frequency of the second system was set to 1 
GHz using the cpufreq interface. Are you arguing that the 2-GHz 
processor that is throttled down to 1 GHz has a slower response time 
than the 1-GHz processor that always runs at full speed?

	-Carsten.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-29 17:36           ` Carsten Emde
@ 2013-11-29 19:34             ` Gilles Chanteperdrix
  2013-11-29 21:10               ` Carsten Emde
  0 siblings, 1 reply; 12+ messages in thread
From: Gilles Chanteperdrix @ 2013-11-29 19:34 UTC (permalink / raw)
  To: Carsten Emde
  Cc: Sebastian Andrzej Siewior, Clark Williams, Stefan Roese,
	linux-rt-users

On 11/29/2013 06:36 PM, Carsten Emde wrote:
> Hi Gilles,
> 
>> On 11/29/2013 05:36 PM, Gilles Chanteperdrix wrote:
>>> On 11/29/2013 04:10 PM, Carsten Emde wrote:
>>>> BTW: Power saving and real-time do not necessarily exclude each other.
>>>> If a - still deterministic - but a little longer latency is acceptable,
>>>> some light sleep states and a somewhat lower clock frequency may be
>>>> allowed which still may result in considerable energy saving. If,
>>>> however, the fastest possible real-time response is required, C states
>>>> and P states must be disabled (or set to polling and maximum speed,
>>>> respectively) and the power bill must be payed.
>>> Well, I do not fully agree. To be sure that you can clock down the
>>> processor for executing a task which has sufficient time to meet its
>>> deadline, your system must be "time triggered", all the timer events
>>> must be known in advance. Because on a fully dynamic system, you may
>>> make that decision, but a new timer may be scheduled which causes the
>>> system to miss its deadline whereas it would not have missed it if it
>>> had run at full speed.
>> And a second problem is that you must know the task WCET, which on a
>> modern system:
>> - depends on the task frequency;
>> - depends on the IRQ load.
>> Again, only a time triggered system seems to make this possible.
> Hmm, I'm not sure whether I correctly got your point.
> 
> Let me try an example: A 1-GHz CPU of a given systems runs at full speed 
> with frequency governor set to performance and provides the required 
> real-time capabilities. When a second system with the same capabilities 
> was needed, the 1-GHz CPU unfortunately was out of stock, and the 
> decision was made to buy the 2-GHz variant of the processor. To save 
> energy, however, the clock frequency of the second system was set to 1 
> GHz using the cpufreq interface. Are you arguing that the 2-GHz 
> processor that is throttled down to 1 GHz has a slower response time 
> than the 1-GHz processor that always runs at full speed?

I probably misread what you were saying and thought you were talking
about dynamically changing the processor frequency when knowing that the
WCET of a task allows running it with a smaller frequency and still meet
its deadline. The thing implemented here for instance:
https://code.google.com/p/xenomaiote/

The so-called OTE algorithm (but I cannot find what this acronym stands for exactly).

-- 
                                                                Gilles.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: cyclictest better values with system load than without (OMAP3530 target)
  2013-11-29 19:34             ` Gilles Chanteperdrix
@ 2013-11-29 21:10               ` Carsten Emde
  0 siblings, 0 replies; 12+ messages in thread
From: Carsten Emde @ 2013-11-29 21:10 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Sebastian Andrzej Siewior, Clark Williams, Stefan Roese,
	linux-rt-users

On 11/29/2013 08:34 PM, Gilles Chanteperdrix wrote:
> On 11/29/2013 06:36 PM, Carsten Emde wrote:
>>> On 11/29/2013 05:36 PM, Gilles Chanteperdrix wrote:
>>>> On 11/29/2013 04:10 PM, Carsten Emde wrote:
>>>>> BTW: Power saving and real-time do not necessarily exclude each other.
>>>>> If a - still deterministic - but a little longer latency is acceptable,
>>>>> some light sleep states and a somewhat lower clock frequency may be
>>>>> allowed which still may result in considerable energy saving. If,
>>>>> however, the fastest possible real-time response is required, C states
>>>>> and P states must be disabled (or set to polling and maximum speed,
>>>>> respectively) and the power bill must be payed.
>>>> Well, I do not fully agree. To be sure that you can clock down the
>>>> processor for executing a task which has sufficient time to meet its
>>>> deadline, your system must be "time triggered", all the timer events
>>>> must be known in advance. Because on a fully dynamic system, you may
>>>> make that decision, but a new timer may be scheduled which causes the
>>>> system to miss its deadline whereas it would not have missed it if it
>>>> had run at full speed.
>>> And a second problem is that you must know the task WCET, which on a
>>> modern system:
>>> - depends on the task frequency;
>>> - depends on the IRQ load.
>>> Again, only a time triggered system seems to make this possible.
>> Hmm, I'm not sure whether I correctly got your point.
>> Let me try an example: A 1-GHz CPU of a given systems runs at full speed
>> with frequency governor set to performance and provides the required
>> real-time capabilities. When a second system with the same capabilities
>> was needed, the 1-GHz CPU unfortunately was out of stock, and the
>> decision was made to buy the 2-GHz variant of the processor. To save
>> energy, however, the clock frequency of the second system was set to 1
>> GHz using the cpufreq interface. Are you arguing that the 2-GHz
>> processor that is throttled down to 1 GHz has a slower response time
>> than the 1-GHz processor that always runs at full speed?
> I probably misread what you were saying and thought you were talking
> about dynamically changing the processor frequency when knowing that the
> WCET of a task allows running it with a smaller frequency and still meet
> its deadline. The thing implemented here for instance:
> https://code.google.com/p/xenomaiote/
Ah, no, the whole thing was about static setting. I got your point wrong 
since I didn't know the proposal to dynamically change the clock 
frequency in relation to the WCET of a task. Sounds like a cool feature 
- but maybe a little too early to appear on our PREEMPT_RT wish list ...

Thanks,
	-Carsten.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-11-29 21:20 UTC | newest]

Thread overview: 12+ messages
2013-11-26  9:21 cyclictest better values with system load than without (OMAP3530 target) Stefan Roese
2013-11-26 14:21 ` Dmitry Lysenko
2013-11-26 19:14   ` Stefan Roese
2013-11-26 20:14     ` Tim Sander
2013-11-26 16:12 ` Clark Williams
2013-11-29 12:56   ` Sebastian Andrzej Siewior
2013-11-29 15:10     ` Carsten Emde
2013-11-29 16:36       ` Gilles Chanteperdrix
2013-11-29 16:58         ` Gilles Chanteperdrix
2013-11-29 17:36           ` Carsten Emde
2013-11-29 19:34             ` Gilles Chanteperdrix
2013-11-29 21:10               ` Carsten Emde
