* thread load balancing on dual CPU Multicore AMD64 system
@ 2007-10-18 17:01 Gernot Hillier
  2007-10-25  1:26 ` Steven Rostedt
  0 siblings, 1 reply; 5+ messages in thread
From: Gernot Hillier @ 2007-10-18 17:01 UTC (permalink / raw)
  To: linux-rt-users

Hi!

We're currently evaluating whether PREEMPT_RT will work for a certain
use case combining realtime and performance requirements running on a
lot of CPUs and using a bunch of RAM.

For first tests, we're running a "small" AMD64 test system with 2x2
cores (2 CPUs with 2 cores each) with 8 GB of RAM.

We wrote a small testcase which basically has one SCHED_FIFO "realtime"
thread which does nothing but sleep and check if it wakes up at
the right time. In addition, it spawns 20 low-prio "load threads"
introducing a lot of malloc/memory access/free load on some GB of RAM.
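
For readers who want to try something similar, here is a minimal sketch of the "realtime" half of such a test case. It is not the program used in this thread: the language (Python), the function names, the 1 ms period and the priority value 50 are all assumptions chosen for illustration, and the 20 load threads are left out. It only shows the idea of sleeping until an absolute deadline and recording how late each wakeup is.

```python
import os
import time


def try_sched_fifo(priority=50):
    """Try to move the calling thread to SCHED_FIFO; return True on success.

    The priority value is an arbitrary assumption. The call needs
    root or CAP_SYS_NICE and only exists on Linux.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (AttributeError, OSError):
        # AttributeError: platform without the call; OSError: not permitted.
        return False


def measure_wakeup_lateness(period_s=0.001, iterations=100):
    """Sleep until successive absolute deadlines; return lateness in ns."""
    period_ns = int(period_s * 1e9)
    deadline = time.monotonic_ns() + period_ns
    lateness = []
    for _ in range(iterations):
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # How far past the intended wakeup time are we?
        lateness.append(time.monotonic_ns() - deadline)
        deadline += period_ns
    return lateness


if __name__ == "__main__":
    is_rt = try_sched_fifo()
    samples = measure_wakeup_lateness()
    print("SCHED_FIFO active:", is_rt)
    print("max lateness (us):", max(samples) / 1000)
```

A real latency test would normally be written in C with clock_nanosleep() and locked memory; the sketch above only mirrors the structure described in the paragraph.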

We can see that the realtime requirements are fulfilled quite well (if
using the current glibc with private futexes, but that's another story).
The "rt thread" reacts within the expected timeframe with 2.6.22.1-rt9
as well as with 2.6.23-rt1.

However, what causes problems is the load balancing of the 20 threads
over the available CPU cores:

- With 2.6.22.1-vanilla, the threads are distributed over all four
available cores
- With 2.6.22.1-rt9 (patched, but PREEMPT_RT & friends *disabled*), the
threads are distributed *only over two cores*, the others are idling
- With 2.6.22.1-rt9 (PREEMPT_RT & friends enabled), the threads are
distributed *only over two cores*, the others are idling

- With 2.6.23-rt1 (patched, but PREEMPT_RT & friends *disabled*), the
threads are distributed over all four cores
- With 2.6.23-rt1 (PREEMPT_RT & friends enabled), the threads are
distributed *only over two cores*, the others are idling

We have not set any CPU affinities in any case.
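
One way to double-check that no affinity mask is in effect is to ask the kernel which CPUs a process may run on. The following is a small sketch, assuming a Linux system with Python available; the function name affinity_report is made up for this example.

```python
import os


def affinity_report(pid=0):
    """Return the CPUs the given process (0 = caller) is allowed to run on."""
    allowed = sorted(os.sched_getaffinity(pid))  # Linux-only call
    online = os.cpu_count()  # may be None if it cannot be determined
    restricted = online is not None and len(allowed) < online
    return {
        "allowed_cpus": allowed,
        "online_cpus": online,
        "restricted": restricted,
    }


if __name__ == "__main__":
    print(affinity_report())
```

If "restricted" is False for the test program, the scheduler is free to place its threads on every CPU, so an uneven distribution is not explained by the affinity mask.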

With the 2.6.22 series, I played a bit with
patch-2.6.22.1-rt9-broken-out.tar.bz2 and am quite sure that the
behaviour change is introduced by preempt-softirqs-core.patch. If I
apply anything up to and including this patch (according to the series
file), I see that only two cores are used. If I just remove this sole
patch, all four cores are used again.

Any ideas what causes these issues? Is this somehow connected to NUMA
optimizations or just a bug? Known one?

Any ideas or suggestions are welcome...

TIA!

----
With kind regards,
Gernot Hillier
Siemens AG, CT SE 2
Corporate Competence Center Embedded Linux


* Re: thread load balancing on dual CPU Multicore AMD64 system
  2007-10-18 17:01 thread load balancing on dual CPU Multicore AMD64 system Gernot Hillier
@ 2007-10-25  1:26 ` Steven Rostedt
  2007-11-02 10:52   ` Gernot Hillier
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2007-10-25  1:26 UTC (permalink / raw)
  To: Gernot Hillier; +Cc: linux-rt-users


--

On Thu, 18 Oct 2007, Gernot Hillier wrote:

> Hi!
>
> We're currently evaluating whether PREEMPT_RT will work for a certain
> use case combining realtime and performance requirements running on a
> lot of CPUs and using a bunch of RAM.
>
> For first tests, we're running a "small" AMD64 test system with 2x2
> cores (2 CPUs with 2 cores each) with 8 GB of RAM.
>
> We wrote a small testcase which basically has one SCHED_FIFO "realtime"
> thread which does nothing but sleep and check if it wakes up at
> the right time. In addition, it spawns 20 low-prio "load threads"
> introducing a lot of malloc/memory access/free load on some GB of RAM.
>
> We can see that the realtime requirements are fulfilled quite well (if
> using the current glibc with private futexes, but that's another story).
> The "rt thread" reacts within the expected timeframe with 2.6.22.1-rt9
> as well as with 2.6.23-rt1.
>
> However, what causes problems is the load balancing of the 20 threads
> over the available CPU cores:
>

The latest 2.6.23-rt3 (as well as -rt2) has new RT balancing code. Could
you try that to see if it solves your issues?

Thanks,

-- Steve


* Re: thread load balancing on dual CPU Multicore AMD64 system
  2007-10-25  1:26 ` Steven Rostedt
@ 2007-11-02 10:52   ` Gernot Hillier
  2007-11-02 12:13     ` Steven Rostedt
  0 siblings, 1 reply; 5+ messages in thread
From: Gernot Hillier @ 2007-11-02 10:52 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users

Hi!

On Thu, 25 Oct 2007, Steven Rostedt wrote:
>> We're currently evaluating whether PREEMPT_RT will work for a certain
>> use case combining realtime and performance requirements running on a
>> lot of CPUs and using a bunch of RAM.
>>
>> For first tests, we're running a "small" AMD64 test system with 2x2
>> cores (2 CPUs with 2 cores each) with 8 GB of RAM.
>>
>> We wrote a small testcase which basically has one SCHED_FIFO "realtime"
>> thread which does nothing but sleep and check if it wakes up at
>> the right time. In addition, it spawns 20 low-prio "load threads"
>> introducing a lot of malloc/memory access/free load on some GB of RAM.
>>
>> We can see that the realtime requirements are fulfilled quite well (if
>> using the current glibc with private futexes, but that's another story).
>> The "rt thread" reacts within the expected timeframe with 2.6.22.1-rt9
>> as well as with 2.6.23-rt1.
>>
>> However, what causes problems is the load balancing of the 20 threads
>> over the available CPU cores:
>>
> 
> The latest 2.6.23-rt3 (as well as -rt2) has new RT balancing code. Could
> you try that to see if it solves your issues?

Sorry - being a bit late here.

I now tried my testcase with the current 2.6.23.1-rt5 patch (please tell
me if you're still interested in test results for the old 2.6.23-rt3!) and
it didn't get better. I still see only 2 CPUs being occupied by our test
program.

Interestingly enough, it seems I now can't use CPU2 and CPU3 at all -
even if I start several test processes in parallel. IIRC, this was
better with 2.6.22.1-rt9 - there CPU2 and CPU3 got used if I started two
test programs in parallel.

I can now even reproduce the problem with a simple kernel make. Here's a
snapshot of /proc/stat after compiling the kernel with "make -j 4":

MRBOX:~/linux-2.6.23.1 # cat /proc/stat
cpu  249482 0 46304 683034 9592 90 358 0 1 28
cpu0 124240 0 20690 99283 2767 50 214 0 0 8
cpu1 125242 0 25586 89781 6431 33 136 0 0 10
cpu2 0 0 8 246867 333 0 0 0 0 0
cpu3 0 0 18 247103 60 6 7 0 0 8
intr 115025
ctxt 8498063
btime 1193997310
processes 82614
procs_running 1
procs_blocked 0
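
The imbalance in the snapshot above can be put into numbers with a short script. This is a sketch, assuming only that the first four columns after the CPU name are user, nice, system and idle time as documented for /proc/stat; the remaining columns are ignored because their meaning differs between kernel versions. The function name busy_share is made up for this example.

```python
# Per-CPU lines copied from the /proc/stat snapshot above.
SNAPSHOT = """\
cpu0 124240 0 20690 99283 2767 50 214 0 0 8
cpu1 125242 0 25586 89781 6431 33 136 0 0 10
cpu2 0 0 8 246867 333 0 0 0 0 0
cpu3 0 0 18 247103 60 6 7 0 0 8
"""


def busy_share(stat_text):
    """Return {cpuN: busy / (busy + idle)} using user, nice, system, idle."""
    shares = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines such as "cpu0", skip the "cpu" total.
        if not fields or not fields[0].startswith("cpu"):
            continue
        if not fields[0][3:].isdigit():
            continue
        user, nice, system, idle = (int(v) for v in fields[1:5])
        busy = user + nice + system
        shares[fields[0]] = busy / (busy + idle)
    return shares


if __name__ == "__main__":
    for cpu, share in busy_share(SNAPSHOT).items():
        print(f"{cpu}: {share:.4%} busy")
```

With these figures, cpu0 and cpu1 come out at roughly 59% and 63% busy, while cpu2 and cpu3 stay well below 0.1%.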

As with 2.6.23-rt1, behaviour only breaks for me as soon as I enable
PREEMPT_RT.

To narrow it down a bit, I now played a bit with the configure options
of 2.6.23.1-rt5 (leaving PREEMPT_RT disabled and enabling the other RT
features one after another). The culprit for me is
CONFIG_PREEMPT_SOFTIRQS. As soon as I enable it, only CPU0 and 1 are
used. Disabling it again makes the kernel use all CPUs.
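
When toggling options one at a time like this, it helps to confirm that two builds really differ in just the intended option. The sketch below compares two kernel .config files. The two sample configs are invented for illustration and are not the configs used in this thread; parse_kconfig and diff_configs are hypothetical helper names.

```python
import re

# Invented sample configs; real files contain thousands of lines.
CONFIG_GOOD = """\
CONFIG_SMP=y
# CONFIG_PREEMPT_RT is not set
# CONFIG_PREEMPT_SOFTIRQS is not set
"""

CONFIG_BAD = """\
CONFIG_SMP=y
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_SOFTIRQS=y
"""


def parse_kconfig(text):
    """Map each CONFIG_ option to its value; disabled options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2)
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, "n"), b.get(key, "n")
        if left != right:
            changed[key] = (left, right)
    return changed


if __name__ == "__main__":
    print(diff_configs(parse_kconfig(CONFIG_GOOD), parse_kconfig(CONFIG_BAD)))
```

An option that is missing from one file is treated as disabled, which matches how a missing symbol behaves in practice but hides the difference between "not set" and "not present".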

Any hint on how to continue with this matter is greatly appreciated...

-- 
Gernot Hillier

Siemens AG, CT SE 2
Corporate Competence Center Embedded Linux


* Re: thread load balancing on dual CPU Multicore AMD64 system
  2007-11-02 10:52   ` Gernot Hillier
@ 2007-11-02 12:13     ` Steven Rostedt
  2007-11-05  7:22       ` Hillier, Gernot
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2007-11-02 12:13 UTC (permalink / raw)
  To: Gernot Hillier; +Cc: linux-rt-users


--
On Fri, 2 Nov 2007, Gernot Hillier wrote:

> On Thu, 25 Oct 2007, Steven Rostedt wrote:
> > The latest 2.6.23-rt3 (as well as -rt2) has new RT balancing code. Could
> > you try that to see if it solves your issues?
>
> Sorry - being a bit late here.
>
> I now tried my testcase with the current 2.6.23.1-rt5 patch (please tell
> me if you're still interested in test results for the old 2.6.23-rt3!) and

No need for 2.6.23-rt3

> it didn't get better. I still see only 2 CPUs being occupied by our test
> program.
>
> Interestingly enough, it seems I now can't use CPU2 and CPU3 at all -

Is this also the case without PREEMPT_RT configured?

> even if I start several test processes in parallel. IIRC, this was
> better with 2.6.22.1-rt9 - there CPU2 and CPU3 got used if I started two
> test programs in parallel.
>
> I can now even reproduce the problem with a simple kernel make. Here's a
> snapshot of /proc/stat after compiling the kernel with "make -j 4":
>
> MRBOX:~/linux-2.6.23.1 # cat /proc/stat
> cpu  249482 0 46304 683034 9592 90 358 0 1 28
> cpu0 124240 0 20690 99283 2767 50 214 0 0 8
> cpu1 125242 0 25586 89781 6431 33 136 0 0 10
> cpu2 0 0 8 246867 333 0 0 0 0 0
> cpu3 0 0 18 247103 60 6 7 0 0 8
> intr 115025
> ctxt 8498063
> btime 1193997310
> processes 82614
> procs_running 1
> procs_blocked 0
>
> As with 2.6.23-rt1, behaviour only breaks for me as soon as I enable
> PREEMPT_RT.

But otherwise it runs fine (you can use CPU2 and CPU3)?

>
> To narrow it down a bit, I now played a bit with the configure options
> of 2.6.23.1-rt5 (leaving PREEMPT_RT disabled and enabling the other RT
> features one after another). The culprit for me is
> CONFIG_PREEMPT_SOFTIRQS. As soon as I enable it, only CPU0 and 1 are
> used. Disabling it again makes the kernel use all CPUs.
>
> Any hint how to continue with this matter is greatly appreciated...

Could you send me the contents of /proc/cpuinfo, your dmesg from bootup,
and the .config that you are using?

Thanks,

-- Steve


* Re: thread load balancing on dual CPU Multicore AMD64 system
  2007-11-02 12:13     ` Steven Rostedt
@ 2007-11-05  7:22       ` Hillier, Gernot
  0 siblings, 0 replies; 5+ messages in thread
From: Hillier, Gernot @ 2007-11-05  7:22 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users

Hi!

Steven Rostedt wrote:
>> it didn't get better. I still see only 2 CPUs being occupied by our test
>> program.
>>
>> Interestingly enough, it seems I now can't use CPU2 and CPU3 at all -
> 
> Is this also the case without PREEMPT_RT configured?

Not sure if I understand your question, but let's try: When I disable
CONFIG_PREEMPT_SOFTIRQS, I can use all four CPUs.

>> even if I start several test processes in parallel. IIRC, this was
>> better with 2.6.22.1-rt9 - there CPU2 and CPU3 got used if I started two
>> test programs in parallel.
>>
>> I can now even reproduce the problem with a simple kernel make. Here's a
>> snapshot of /proc/stat after compiling the kernel with "make -j 4":
>>
>> MRBOX:~/linux-2.6.23.1 # cat /proc/stat
>> cpu  249482 0 46304 683034 9592 90 358 0 1 28
>> cpu0 124240 0 20690 99283 2767 50 214 0 0 8
>> cpu1 125242 0 25586 89781 6431 33 136 0 0 10
>> cpu2 0 0 8 246867 333 0 0 0 0 0
>> cpu3 0 0 18 247103 60 6 7 0 0 8
>> intr 115025
>> ctxt 8498063
>> btime 1193997310
>> processes 82614
>> procs_running 1
>> procs_blocked 0
>>
>> As with 2.6.23-rt1, behaviour only breaks for me as soon as I enable
>> PREEMPT_RT.
> 
> But otherwise it runs fine (you can use CPU2 and CPU3)?

Yes, as long as I don't enable CONFIG_PREEMPT_SOFTIRQS. When talking
about PREEMPT_RT here, I meant it as a synonym for "all those additional
config options coming with the -rt patch series".

>> To narrow it down a bit, I now played a bit with the configure options
>> of 2.6.23.1-rt5 (leaving PREEMPT_RT disabled and enabling the other RT
>> features one after another). The culprit for me is
>> CONFIG_PREEMPT_SOFTIRQS. As soon as I enable it, only CPU0 and 1 are
>> used. Disabling it again makes the kernel use all CPUs.
>>
>> Any hint how to continue with this matter is greatly appreciated...
> 
> Could you send me the contents of /proc/cpuinfo, your dmesg from bootup,
> and the .config that you are using?

Will follow in a few minutes via private mail.

-- 
Gernot Hillier

Siemens AG, CT SE 2
Corporate Competence Center Embedded Linux
