* [Xenomai] I find a way to reduce latency, but I don't know why.
@ 2015-03-17 13:41 书呆子
2015-03-18 7:51 ` Gilles Chanteperdrix
0 siblings, 1 reply; 5+ messages in thread
From: 书呆子 @ 2015-03-17 13:41 UTC (permalink / raw)
To: xenomai
The x86 machine has two CPUs, and I need a periodic task on CPU1.
To reduce latency, I run a busy task on CPU2.
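/* periodic task, pinned to CPU1 */
cpu1_task
{
    for (;;) {
        rt_task_wait_period(NULL);
        real_work();
        rt_sem_v(&busySEM);      /* wake the busy task */
    }
}
/* busy task, pinned to CPU2 */
cpu2_task
{
    for (;;) {
        rt_sem_p(&busySEM, TM_INFINITE);
        rt_timer_spin(spin_ns);  /* busy-wait for spin_ns nanoseconds */
    }
}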
spin_ns is chosen long enough that spin_ns + real_work_ns > period.
In other words, real-time code is always running on either CPU1 or CPU2.
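The condition on spin_ns can be worked through with a small helper. This is only an illustrative sketch in plain C: the function names and the 1 ms period used in the usage note are hypothetical and do not come from the original post or from the Xenomai API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when spin_ns + work_ns > period_ns, i.e. the spin started after
 * real_work() lasts past the start of the next period. */
bool spin_covers_period(uint64_t period_ns, uint64_t work_ns, uint64_t spin_ns)
{
    return spin_ns + work_ns > period_ns;
}

/* Smallest spin_ns that satisfies the condition.
 * Returns 0 when the work alone already exceeds the period. */
uint64_t min_spin_ns(uint64_t period_ns, uint64_t work_ns)
{
    assert(period_ns > 0);
    if (work_ns > period_ns)
        return 0;
    return period_ns - work_ns + 1;
}
```

For example, with an assumed period of 1000000 ns and real_work() taking 22000 ns, min_spin_ns(1000000, 22000) gives 978001 ns. Sizing spin_ns from the shortest observed real_work() time keeps the condition true on every cycle.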
With this setup, cpu1_task gets lower latency and drift, but why?
thanks, zhou
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Xenomai] I find a way to reduce latency, but I don't know why.
2015-03-17 13:41 [Xenomai] I find a way to reduce latency, but I don't know why 书呆子
@ 2015-03-18 7:51 ` Gilles Chanteperdrix
[not found] ` <tencent_1DF1E8A94FD8F46B59E71CC9@qq.com>
0 siblings, 1 reply; 5+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-18 7:51 UTC (permalink / raw)
To: 书呆子; +Cc: xenomai
On Tue, Mar 17, 2015 at 09:41:05PM +0800, 书呆子 wrote:
> The x86 machine has two CPUs, and I need a periodic task on CPU1.
> To reduce latency, I run a busy task on CPU2.
>
> /* periodic task, pinned to CPU1 */
> cpu1_task
> {
>     for (;;) {
>         rt_task_wait_period(NULL);
>         real_work();
>         rt_sem_v(&busySEM);      /* wake the busy task */
>     }
> }
>
> /* busy task, pinned to CPU2 */
> cpu2_task
> {
>     for (;;) {
>         rt_sem_p(&busySEM, TM_INFINITE);
>         rt_timer_spin(spin_ns);  /* busy-wait for spin_ns nanoseconds */
>     }
> }
>
> spin_ns is chosen long enough that spin_ns + real_work_ns > period.
> In other words, real-time code is always running on either CPU1 or CPU2.
>
> With this setup, cpu1_task gets lower latency and drift, but why?
Two possibilities:
- task2 is cache friendly, so if the cache is shared between the two
  CPUs, it prevents Linux from evicting Xenomai and I-pipe code from
  the cache. We know that the cache has a big influence on latency.
- task2 prevents the CPU from entering idle mode, and so potentially
  prevents any low-power optimization from happening. Low-power modes
  are generally not latency friendly.
Note, however, that measurements on an idle system do not really
matter much for finding the worst-case latency. Real measurements
should last for hours, with the machine under load. I bet that
with measurements done this way, the differences you observe with
your tests would be negligible compared to the worst-case
latency. And if there is a real difference, then the I-pipe tracer
would help you find where it comes from.
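The point about worst-case latency can be illustrated with a running min/avg/max accumulator fed for the whole duration of a long test. This is a minimal sketch in plain C, not code from Xenomai; the lat_stats names are hypothetical, and the caller is assumed to supply the expected and actual wake-up times in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Running latency statistics; all values are in nanoseconds. */
struct lat_stats {
    int64_t min;
    int64_t max;
    int64_t sum;
    uint64_t count;
};

void lat_stats_init(struct lat_stats *s)
{
    s->min = INT64_MAX;
    s->max = INT64_MIN;
    s->sum = 0;
    s->count = 0;
}

/* Record one sample: latency = actual wake-up time - expected wake-up time. */
void lat_stats_add(struct lat_stats *s, int64_t expected_ns, int64_t actual_ns)
{
    int64_t lat = actual_ns - expected_ns;

    if (lat < s->min)
        s->min = lat;
    if (lat > s->max)
        s->max = lat;   /* the worst case is what matters for real time */
    s->sum += lat;
    s->count++;
}

/* Average latency over all recorded samples (integer division). */
int64_t lat_stats_avg(const struct lat_stats *s)
{
    assert(s->count > 0);
    return s->sum / (int64_t)s->count;
}
```

In a run lasting hours under load, the max field is the figure to compare between the two setups; the minimum and average usually say little about the worst case.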
--
Gilles.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Xenomai] I find a way to reduce latency, but I don't know why.
@ 2015-03-18 13:36 Zhoupeng
0 siblings, 0 replies; 5+ messages in thread
From: Zhoupeng @ 2015-03-18 13:36 UTC (permalink / raw)
To: xenomai
> On Tue, Mar 17, 2015 at 09:41:05PM +0800, 书呆子 wrote:
> > The x86 machine has two CPUs, and I need a periodic task on CPU1.
> > To reduce latency, I run a busy task on CPU2.
> >
> > /* periodic task, pinned to CPU1 */
> > cpu1_task
> > {
> >     for (;;) {
> >         rt_task_wait_period(NULL);
> >         real_work();
> >         rt_sem_v(&busySEM);      /* wake the busy task */
> >     }
> > }
> >
> > /* busy task, pinned to CPU2 */
> > cpu2_task
> > {
> >     for (;;) {
> >         rt_sem_p(&busySEM, TM_INFINITE);
> >         rt_timer_spin(spin_ns);  /* busy-wait for spin_ns nanoseconds */
> >     }
> > }
> >
> > spin_ns is chosen long enough that spin_ns + real_work_ns > period.
> > In other words, real-time code is always running on either CPU1 or CPU2.
> >
> > With this setup, cpu1_task gets lower latency and drift, but why?
> Two possibilities:
> - task2 is cache friendly, so if the cache is shared between the two
>   CPUs, it prevents Linux from evicting Xenomai and I-pipe code from
>   the cache. We know that the cache has a big influence on latency.
I agree this is the best guess; the i5 CPUs do share the L3 cache.
I also wrapped real_work() in rt_timer_read() calls to measure its execution time:
without task2, real_work() takes 20 to 36 us; with task2, real_work()
takes 20 to 22 us to complete.
real_work() contains only floating-point arithmetic, and its timing stability is
clearly improved.
If this is a cache problem, is task2 the right way to keep code in the cache?
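The measurement described above, reading a timer before and after real_work() and keeping the extremes, can be sketched as follows. This is a hedged, portable illustration: it uses POSIX clock_gettime(CLOCK_MONOTONIC) as a stand-in for rt_timer_read(), and now_ns(), measure_work() and the body of real_work() are hypothetical placeholders, not the poster's code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds; stand-in for rt_timer_read(). */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical placeholder for a floating-point-only workload. */
double real_work(void)
{
    double acc = 0.0;

    for (int i = 1; i <= 1000; i++)
        acc += 1.0 / (double)i;
    return acc;
}

/* Run real_work() `runs` times and report the shortest and longest duration. */
void measure_work(unsigned runs, uint64_t *min_ns, uint64_t *max_ns)
{
    volatile double sink = 0.0;  /* keeps the compiler from dropping the work */

    assert(runs > 0);
    *min_ns = UINT64_MAX;
    *max_ns = 0;
    for (unsigned i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        sink = real_work();
        uint64_t dt = now_ns() - t0;

        if (dt < *min_ns)
            *min_ns = dt;
        if (dt > *max_ns)
            *max_ns = dt;
    }
    (void)sink;
}
```

Calling measure_work() once with the busy task running and once without gives the two min/max ranges to compare, in the same way as the 20 to 36 us and 20 to 22 us figures above.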
> - task2 prevents the CPU from entering idle mode, and so potentially
>   prevents any low-power optimization from happening. Low-power modes
>   are generally not latency friendly.
CONFIG_CPU_FREQ and CONFIG_CPU_IDLE are already disabled, and the SMI
workaround is also enabled.
I have noticed that C1E mode is disabled in Xenomai 3 but not in Xenomai 2;
however, C1E is triggered by mwait, so I don't think this is the problem.
> Note, however, that measurements on an idle system do not really
> matter much for finding the worst-case latency. Real measurements
> should last for hours, with the machine under load. I bet that
> with measurements done this way, the differences you observe with
> your tests would be negligible compared to the worst-case
> latency. And if there is a real difference, then the I-pipe tracer
> would help you find where it comes from.
OK, I will try I-pipe tracer.
^ permalink raw reply [flat|nested] 5+ messages in thread
Thread overview: 5+ messages
2015-03-17 13:41 [Xenomai] I find a way to reduce latency, but I don't know why 书呆子
2015-03-18 7:51 ` Gilles Chanteperdrix
[not found] ` <tencent_1DF1E8A94FD8F46B59E71CC9@qq.com>
2015-03-18 13:39 ` Gilles Chanteperdrix
2015-03-23 2:44 ` Zhoupeng
-- strict thread matches above, loose matches on Subject: below --
2015-03-18 13:36 Zhoupeng