linux-rt-users.vger.kernel.org archive mirror
* RT-thread on cpu0 affects performance of RT-thread on isolated cpu1
@ 2018-02-28 21:11 Yann le Chevoir
  2018-03-02  9:43 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Yann le Chevoir @ 2018-02-28 21:11 UTC (permalink / raw)
  To: linux-rt-users

Hello,

I am an engineering student and I am trying to prove that a 4000Hz hard
real-time application can run on an ARM board rather than on a more powerful
machine.

I work with an IMX6 dual-core and PREEMPT_RT patch-4.1.38-rt46.
I expected my 4000Hz thread to perform better if it is the only one on
core1, so I added the boot argument isolcpus=1 and bound my thread to cpu1.
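For readers who want to reproduce the setup, binding a thread to cpu1 could look roughly like the sketch below. It assumes Linux with glibc; the helper name pin_self_to_cpu is hypothetical and is not taken from the application described in this message.

```c
#define _GNU_SOURCE          /* needed for cpu_set_t and the CPU_* macros */
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success or an errno value on failure. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return errno;
    return 0;
}
```

The thread function would call pin_self_to_cpu(1) once before entering its periodic loop. isolcpus=1 only keeps the scheduler from placing ordinary tasks on cpu1 by default; an explicit affinity call like this one is still what moves the thread there.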

Note that even with isolcpus=1, these kernel threads remain on core1:

   PID    PSR    RTPRIO    CMD
   16     1      99        [migration/1]
   17     1      -         [rcuc/1]
   18     1      1         [ktimersoftd/1]
   19     1      -         [ksoftirqd/1]
   20     1      99        [posixcputmr/1]
   21     1      -         [kworker/1:0]
   22     1      -         [kworker/1:0H]

I tried several permutations of my kernel configuration and boot args
(rcu_nocbs, for example) and none affected the results I describe below.


I use a script to stress Linux. I expected that only cpu0 would be stressed,
since cpu1 is isolated, but the stress has an impact on the thread on cpu1
too. I think this is normal.


First, as I drew (in red) in “expected_behavior.png”, I expected much less
variation in the Latency and especially in the Execution time
(my thread always does the same thing).

How can we explain so much variation in these times? As I said, I tried to
deactivate all interrupts on cpu1 (RCU and the other processes listed above),
but I am not very familiar with that.



Then I was even more surprised when, while trying to debug this, I put
another thread on core0 and it improved the behavior of the thread on core1!


My application looks like:

main(){

     create a first 4000Hz thread (thread1), prio = 99, cpu = 1
     /*cpu1 is isolated*/

     create a second 4000Hz thread (thread0), prio = 98, cpu = 0
     /* Creating this thread (cpu0) improves the performance of
      * the other thread (cpu1)! */

     start both threads

     while(1){
          print_stat();
     }

}
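The "prio = 99" and "prio = 98" settings in main() above are shorthand. A minimal sketch of how such a priority could be requested through POSIX thread attributes follows. It assumes the threads use SCHED_FIFO (the message does not say which real-time policy is used), and the helper name rt_attr_init is hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: prepare *attr so that a thread created with it
 * runs under SCHED_FIFO at the given priority (assumed policy).
 * Returns 0 on success or an error number. */
int rt_attr_init(pthread_attr_t *attr, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int err;

    err = pthread_attr_init(attr);
    if (err)
        return err;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the
     * creator's policy and the two settings below are ignored. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (!err)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (!err)
        err = pthread_attr_setschedparam(attr, &sp);

    if (err)
        pthread_attr_destroy(attr);
    return err;
}
```

The attribute would then be passed as the second argument of pthread_create(). Creating a SCHED_FIFO thread normally requires root, CAP_SYS_NICE, or a suitable RLIMIT_RTPRIO; without that, pthread_create() fails with EPERM.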

thread1(){

     struct timespec start, stop, next, interval = 250us;

     /* Initialization of the periodicity */
     clock_gettime(CLOCK_REALTIME, &next);
     next += interval;

     while(1){
          /*Releases at specified rate*/
          clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
          /*Get time to check jitter and execution time*/
          clock_gettime(CLOCK_REALTIME, &start);
          do_job();
          /*Get time to check execution time*/
          clock_gettime(CLOCK_REALTIME, &stop);
          do_stat(); //jitter = start-next; exec_time = stop-start
          next += interval;
     }

}
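In the pseudocode above, "next += interval" and "start-next" stand for arithmetic on struct timespec, which C does not provide as operators. One possible way to write them is sketched below; the helper names timespec_add_ns and timespec_diff_ns are hypothetical.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds (ns >= 0), keeping tv_nsec in [0, 1e9). */
void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec  += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Return a - b in nanoseconds (negative if a is earlier than b). */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
           + (a->tv_nsec - b->tv_nsec);
}
```

With these, the loop would call timespec_add_ns(&next, 250000) for the 250us period, and the jitter and execution time would be timespec_diff_ns(&start, &next) and timespec_diff_ns(&stop, &start).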

thread0(){
     struct timespec next, interval = 250us;

     /* Initialization of the periodicity */
     clock_gettime(CLOCK_REALTIME, &next);
     next += interval;

     while(1){
          /*Releases at specified rate*/
          clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
          usleep(100);
          /****************************************************************
           * Without sleeping 100us, only the Latency of the other thread *
           * (on cpu1) is improved.                                       *
           * Sleeping 100us in this new 4000Hz thread (cpu0) improved     *
           * the execution time of the other thread (on cpu1)...          *
           ****************************************************************/
          next += interval;
     }

}

As you can see in “background_thread_on_core_0.png”, the Latency and the
Execution time (of the thread on core1) are improved (in comparison with
“no_background_thread.png”) when there is a new 4000Hz thread on cpu0
AND when this thread does something...
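The message does not show do_stat() or print_stat(). As an illustration only, per-cycle Latency and Execution time samples (in nanoseconds) could be accumulated as in the sketch below; the struct and function names are hypothetical and the real application may record something different.

```c
#include <stdint.h>

/* Hypothetical running statistics over nanosecond samples. */
struct lat_stat {
    int64_t  min_ns;
    int64_t  max_ns;
    int64_t  sum_ns;
    uint64_t count;
};

void lat_stat_init(struct lat_stat *s)
{
    s->min_ns = INT64_MAX;
    s->max_ns = INT64_MIN;
    s->sum_ns = 0;
    s->count  = 0;
}

/* Record one sample; no allocation or I/O, so it is cheap to call
 * from the periodic loop. */
void lat_stat_add(struct lat_stat *s, int64_t sample_ns)
{
    if (sample_ns < s->min_ns)
        s->min_ns = sample_ns;
    if (sample_ns > s->max_ns)
        s->max_ns = sample_ns;
    s->sum_ns += sample_ns;
    s->count++;
}

/* Mean of the recorded samples, or 0 if there are none. */
int64_t lat_stat_avg(const struct lat_stat *s)
{
    return s->count ? s->sum_ns / (int64_t)s->count : 0;
}
```

Keeping one struct for the Latency and one for the Execution time makes it possible to compare minimum, maximum and mean between the runs with and without the background thread on cpu0.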

I tried many permutations and I do not understand the results:
- If the new thread (cpu0) runs at 5000Hz (>4000Hz), then the observations
  are the same (performance of the thread on cpu1 improves).
- If the new thread runs at 2000Hz (<4000Hz), then there is no improvement...

- If the new thread (4000Hz on cpu0) does something (even just sleeping long
  enough), then the Execution time of the thread on cpu1 improves.
- If the new thread does nothing (or does too little), then ONLY the
  Latency of the thread on cpu1 is improved...

Do you have any experience with this, or any idea how to debug it?
I wonder whether the scheduler or the clock tick is bound to cpu0 and whether
that can play a role in the responsiveness of the thread on cpu1 (the
isolated one).

Thanks,

Regards,

Yann

[-- Attachment #2: background_thread_on_core_0.png --]
[-- Type: image/png, Size: 5632 bytes --]

[-- Attachment #3: expected_behavior.png --]
[-- Type: image/png, Size: 12878 bytes --]

[-- Attachment #4: no_background_thread.png --]
[-- Type: image/png, Size: 5892 bytes --]

Thread overview: 5+ messages
2018-02-28 21:11 RT-thread on cpu0 affects performance of RT-thread on isolated cpu1 Yann le Chevoir
2018-03-02  9:43 ` Sebastian Andrzej Siewior
2018-03-06 21:16   ` Yann le Chevoir
2018-03-06 22:07     ` Julia Cartwright
2018-03-08 17:21     ` Sebastian Andrzej Siewior