* HT (Hyper Threading) aware process scheduling doesn't work as it should
@ 2011-10-30 19:57 Artem S. Tashkinov
2011-10-30 21:26 ` Henrique de Moraes Holschuh
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Artem S. Tashkinov @ 2011-10-30 19:57 UTC (permalink / raw)
To: linux-kernel
Hello,
It's known that to reach maximum performance on HT-enabled Intel CPUs you
should distribute the load evenly between physical cores first, and only
when all of them are loaded should you load the remaining virtual cores.
For example, if you have 4 physical cores and 8 virtual CPUs, and just four
tasks consuming 100% of CPU time, the load should be spread across the four
CPU pairs:
VCPUs: {1,2} - one task running
VCPUs: {3,4} - one task running
VCPUs: {5,6} - one task running
VCPUs: {7,8} - one task running
It's absolutely detrimental to performance to bind two of the tasks to two
physical cores, e.g. {1,2} and {3,4}, and then the remaining two tasks to a
single third core, e.g. {5,6}:
VCPUs: {1,2} - one task running
VCPUs: {3,4} - one task running
VCPUs: {5,6} - *two* tasks running
VCPUs: {7,8} - no tasks running
I've found out that even on Linux 3.0.8 the process scheduler doesn't correctly
distribute the load amongst virtual CPUs. E.g. on a 4-core system (8 virtual
CPUs in total) the process scheduler often runs two or more of four different
tasks on the same physical CPU.
Maybe I shouldn't trust top/htop output on this matter, but the same test
carried out on Microsoft Windows XP shows that it indeed distributes the load
correctly, running tasks on different physical cores whenever possible.
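The test described above can be sketched as a small script (an editor's illustration, not the original poster's program): it spawns four busy loops on an otherwise idle box, lets the balancer settle, then reports which logical CPU each hog last ran on and that CPU's physical core id. Two hogs sharing a core_id is exactly the bad placement being reported.

```shell
# Sketch: spawn four CPU hogs, wait, then read field 39 of
# /proc/<pid>/stat (the CPU the task last ran on) and map it to a
# physical core id via sysfs. Duplicate core ids indicate two hogs
# packed onto SMT siblings of one physical core.
pids=""
for i in 1 2 3 4; do
    sh -c 'while :; do :; done' &
    pids="$pids $!"
done
sleep 2
placement=""
for p in $pids; do
    # awk field counting is safe here because comm is "sh" (no spaces)
    cpu=$(awk '{print $39}' "/proc/$p/stat")
    core=$(cat "/sys/devices/system/cpu/cpu$cpu/topology/core_id" 2>/dev/null)
    placement="$placement pid=$p cpu=$cpu core=${core:-?}\n"
done
printf "$placement"
kill $pids 2>/dev/null
```

On a 4-core/8-thread part, four distinct `core=` values is the expected good outcome; repeated values reproduce the complaint.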
Any thoughts? Comments? I think this is quite a serious problem.
Best wishes,
Artem
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Henrique de Moraes Holschuh @ 2011-10-30 21:26 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: linux-kernel

On Sun, 30 Oct 2011, Artem S. Tashkinov wrote:
> I've found out that even on Linux 3.0.8 the process scheduler doesn't
> correctly distribute the load amongst virtual CPUs. E.g. on a 4-core
> system (8 virtual CPUs in total) the process scheduler often runs two
> or more of four different tasks on the same physical CPU.

Please check how your sched_mc_power_savings and sched_smt_power_savings
tunables are set. Here's the doc from lesswatts.org:

The 'sched_mc_power_savings' tunable under /sys/devices/system/cpu/
controls the multi-core related behaviour. By default, this is set to '0'
(for optimal performance). By setting this to '1', under light load
scenarios, the process load is distributed such that all the cores in a
processor package are busy before distributing the process load to other
processor packages.

[...]

The 'sched_smt_power_savings' tunable under /sys/devices/system/cpu/
controls the multi-threading related behaviour. By default, this is set
to '0' (for optimal performance). By setting this to '1', under light
load scenarios, the process load is distributed such that all the threads
in a core and all the cores in a processor package are busy before
distributing the process load to threads and cores in other processor
packages.

Please make sure both are set to 0. If they were not 0 at the time you
ran your tests, please retest and report back.
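A minimal sketch for checking both tunables from a shell (an editor's illustration; note these sysfs files belong to 2.6.x/3.x-era kernels and were removed later, so they may simply not exist on a modern system):

```shell
# Print each power-savings tunable if the running kernel still exposes it.
for f in /sys/devices/system/cpu/sched_mc_power_savings \
         /sys/devices/system/cpu/sched_smt_power_savings; do
    if [ -r "$f" ]; then
        echo "$f = $(cat "$f")"
    else
        echo "$f: not present on this kernel"
    fi
done
```

Setting one to a non-zero value (e.g. `echo 1 > $f` as root) enables the packing behaviour described in the quoted documentation.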
You also want to make sure you _do_ have the SMT scheduler compiled into
whatever kernel you're using, just in case.

It is certainly possible that there is a bug in the scheduler, but it is
best to make sure it is not something else first.

You may also want to refer to: http://oss.intel.com/pdfs/mclinux.pdf and
to the irqbalance and hwloc[1] utilities, since you're apparently
interested in SMP/SMT/NUMA scheduler performance.

[1] http://www.open-mpi.org/projects/hwloc/

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot

Henrique Holschuh
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Artem S. Tashkinov @ 2011-10-30 21:51 UTC (permalink / raw)
To: hmh; +Cc: linux-kernel

> On Oct 31, 2011, Henrique de Moraes Holschuh wrote:
>
> Please check how your sched_mc_power_savings and sched_smt_power_savings
> tunables are set. Here's the doc from lesswatts.org:
>
> [cut]
>
> Please make sure both are set to 0. If they were not 0 at the time you
> ran your tests, please retest and report back.
>
> You also want to make sure you _do_ have the SMT scheduler compiled into
> whatever kernel you're using, just in case.
>
> [...]

That's 0 & 0 for me. People running standard desktop Linux distributions
(such as Arch Linux and Ubuntu 11.10) report that this issue also applies
to them, and in those distros both variables are set to 0 by default
(i.e. unchanged). So there's nothing to retest.

I have another major pet peeve concerning the Linux process scheduler,
but I want to start a new thread on that topic.

Best wishes,

Artem
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Henrique de Moraes Holschuh @ 2011-10-31 9:16 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: linux-kernel

On Sun, 30 Oct 2011, Artem S. Tashkinov wrote:
> > Please make sure both are set to 0. If they were not 0 at the time you
> > ran your tests, please retest and report back.
>
> That's 0 & 0 for me.

How idle is your system during the test?
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Artem S. Tashkinov @ 2011-10-31 9:40 UTC (permalink / raw)
To: hmh; +Cc: linux-kernel

> On Oct 31, 2011, Henrique de Moraes Holschuh wrote:
>
> How idle is your system during the test?

load average: 0.00, 0.00, 0.00

As I've mentioned a great many times, I run this test on a completely
idle system (I even `init 3` in advance to avoid any unexpected CPU usage
spikes caused by unrelated processes).

I have to insist that people conduct this test on their own without
trusting my words. Probably there's something I overlook or don't fully
understand, but from what I see there's a serious issue here (at least
Microsoft Windows XP and 7 work exactly the way I believe an OS should
handle such a load).

Artem
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Henrique de Moraes Holschuh @ 2011-10-31 11:58 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: linux-kernel

On Mon, 31 Oct 2011, Artem S. Tashkinov wrote:
> > How idle is your system during the test?
>
> load average: 0.00, 0.00, 0.00

I believe cpuidle will interfere with the scheduling in that case. Could
you run your test with higher loads (start with one, and go up to eight
tasks that are CPU hogs, measuring each step)?

> I have to insist that people conduct this test on their own without
> trusting my words.

What you should attempt to do is to give us a reproducible test case: a
shell script or C/perl/python/whatever program that, when run, clearly
shows the problem you're complaining about on your system. Failing that,
a very detailed description (read: step by step) of how you're testing
things.

I can't see anything wrong on my X5550 workstation (4 cores, 8 threads,
single processor, i.e. not NUMA) running 3.0.8.

> from what I see there's a serious issue here (at least Microsoft
> Windows XP and 7 work exactly the way I believe an OS should handle
> such a load)

So far it looks like that, since your system is almost entirely idle, it
could be trying to minimize task-run latency by scheduling work on the
few cores/threads that are not in deep sleep (they take time to wake up,
are often cache-cold, etc.).

Please use tools/power/x86/turbostat to track core usage and idle states
instead of top/htop. That might give you better information, and I think
you will appreciate getting to know that tool. Note: turbostat reports
*averages* for each thread.
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Zhu Yanhai @ 2011-11-01 4:14 UTC (permalink / raw)
To: Henrique de Moraes Holschuh; +Cc: Artem S. Tashkinov, linux-kernel

Hi,

I think the imbalance has got much better on the mainline kernel than on
OS vendors' kernels, i.e. RHEL6. Just in case you are interested, below
is a very simple test case I used before against NUMA + the CFS group
scheduling extension. I have tested this on a dual-socket Xeon E5620
server.

cat bbb.c

    int main()
    {
        while (1) {
        };
    }

cat run.sh

    #!/bin/sh
    count=0
    pids=" "
    while [ $count -lt 32 ]
    do
        mkdir /cgroup/$count
        echo 1024 > /cgroup/$count/cpu.shares
        # taskset -c 4,5,6,7,12,13,14,15 ./bbb &
        ./bbb &
        pid=`echo $!`
        echo $pid > /cgroup/$count/tasks
        pids=`echo $pids" "$pid`
        count=`expr $count + 1`
    done
    echo "for pid in $pids; do cat /proc/\$pid/sched | grep sum_exec_runtime; done" > show.sh
    watch -n1 sh show.sh

Since one E5620 with HT enabled has 8 logical CPUs, this dual-socket box
has 16 logical CPUs in total. The above test script starts 32 processes,
so the intuitive guess is that each two of them will run on one logical
CPU. However, that's not what happens on the current RHEL6 kernel: top
shows that they keep migrating and are often unbalanced, sometimes worse
and sometimes better. If you watch for a long time, you may find that
sometimes one process occupies a whole logical CPU for a moment, while
several processes (far more than 2) congest on a single CPU slot. Also,
the 'watch' output shows that sum_exec_runtime is almost the same for all
of them, so it seems the RHEL6 kernel is trying to move a lucky guy to a
free CPU slot, let it hold that position for a while, then move the next
lucky guy there and kick the previous one off to a crowded slot, which is
not a good policy for such totally independent processes.

And on the mainline kernel (3.0.0+) they run much more balanced than the
above, although I can't identify which commits made the difference.

--
Regards,
Zhu Yanhai

2011/10/31 Henrique de Moraes Holschuh <hmh@hmh.eng.br>:
> [full quote of the previous message trimmed]
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: ffab ffa @ 2011-11-01 5:15 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: hmh, linux-kernel

Are you sure of the CPU numbering? I've used a few i7-2600K machines, and
/proc/cpuinfo shows that the cores' thread siblings are 0-4, 1-5, 2-6 and
3-7. Can you share your /proc/cpuinfo?

On Mon, Oct 31, 2011 at 2:40 AM, Artem S. Tashkinov <t.artem@lycos.com> wrote:
> [full quote of the previous message trimmed]
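The sibling layout being asked about can be read directly from sysfs rather than eyeballed out of /proc/cpuinfo. A sketch (editor's illustration; the topology files are standard Linux sysfs attributes, present since the 2.6 era):

```shell
# List each logical CPU with its physical core id and its SMT siblings.
# On an i7-2600K this typically shows sibling pairs 0,4 / 1,5 / 2,6 / 3,7,
# matching the layout mentioned above.
for c in /sys/devices/system/cpu/cpu[0-9]*; do
    printf '%s: core_id=%s siblings=%s\n' \
        "$(basename "$c")" \
        "$(cat "$c/topology/core_id" 2>/dev/null)" \
        "$(cat "$c/topology/thread_siblings_list" 2>/dev/null)"
done
```

Two logical CPUs listing each other in `thread_siblings_list` share one physical core, which is what matters for the placement question in this thread.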
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Chris Friesen @ 2011-10-31 18:59 UTC (permalink / raw)
To: Henrique de Moraes Holschuh; +Cc: Artem S. Tashkinov, linux-kernel

On 10/30/2011 03:26 PM, Henrique de Moraes Holschuh wrote:
> Please check how your sched_mc_power_savings and sched_smt_power_savings
> tunables are set. Here's the doc from lesswatts.org:
>
> [documentation quote trimmed]

I'm currently running Fedora 14 (2.6.35.14-97.fc14.x86_64 kernel) on an
i5 560M CPU. It's supposed to have 2 cores, with hyperthreading.

I created a tiny CPU burner program that just busy-loops. Running two
instances on my system, they were always scheduled on separate physical
CPUs regardless of the values in sched_mc_power_savings or
sched_smt_power_savings. Running four instances, they were always spread
across all four "cpus".

With the newer 8-core chips (plus HT) in multi-socket boards with
package-level turbo boost and NUMA memory access, this is going to get
really interesting...

Chris

--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Mike Galbraith @ 2011-11-01 6:01 UTC (permalink / raw)
To: Chris Friesen; +Cc: Henrique de Moraes Holschuh, Artem S. Tashkinov, linux-kernel

On Mon, 2011-10-31 at 12:59 -0600, Chris Friesen wrote:
> I created a tiny CPU burner program that just busy-loops. Running two
> instances on my system, they were always scheduled on separate physical
> CPUs regardless of the values in sched_mc_power_savings or
> sched_smt_power_savings.

A wakeup-driven load using the sync wakeup hint will wake to siblings of
the same core if there's not too much imbalance, though. Whether that's
good or bad... depends.

If you turn SD_SHARE_PKG_RESOURCES off in the sibling sched domain,
wakees should be awakened CPU-affine in the sync-hint case, or on
separate cores in the no-sync-hint case. They are here, anyway.

-Mike
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Arjan van de Ven @ 2011-10-30 22:12 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: linux-kernel

On Sun, 30 Oct 2011 19:57:12 +0000 (GMT)
"Artem S. Tashkinov" <t.artem@lycos.com> wrote:
> It's known that if you want to reach maximum performance on HT
> enabled Intel CPUs you should distribute the load evenly between
> physical cores, and when you have loaded all of them you should then
> load the remaining virtual cores.

This is a bold statement, and patently false if you have two threads of
one process that heavily share data with each other (but true for more
independent workloads).

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Artem S. Tashkinov @ 2011-10-30 22:29 UTC (permalink / raw)
To: arjan; +Cc: linux-kernel

> On Oct 31, 2011, Arjan van de Ven wrote:
>
> This is a bold statement, and patently false if you have two threads of
> one process that heavily share data with each other (but true for more
> independent workloads).

In my initial message I was talking about completely unrelated tasks/
processes which share no data/instructions/whatever else. You don't need
to trust my test case, as you can carry out this test on your own. I have
asked quite a lot of people to do that, and a lot of them see this
unfortunate pattern.
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Yong Zhang @ 2011-10-31 3:19 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: arjan, linux-kernel, Ingo Molnar, Peter Zijlstra

On Sun, Oct 30, 2011 at 10:29:23PM +0000, Artem S. Tashkinov wrote:
> In my initial message I was talking about completely unrelated tasks/
> processes which share no data/instructions/whatever else. You don't
> need to trust my test case, as you can carry out this test on your own.

(Cc'ing more people)

Maybe you can also show your test case here?

Thanks,
Yong
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Artem S. Tashkinov @ 2011-10-31 8:18 UTC (permalink / raw)
To: yong.zhang0; +Cc: arjan, linux-kernel, mingo, peterz

> (Cc'ing more people)
>
> Maybe you can also show your test case here?

The test case is perfectly outlined in the first message I posted to LKML,
but I can repeat it for you ( https://lkml.org/lkml/2011/10/30/106 ).

On an HT-enabled, completely idle system, run as many different tasks as
you have real CPU cores; e.g. on an Intel Core i7 2600 CPU that will be
four tasks. For the best performance all tasks should be attached to
different physical cores. However, the opposite behaviour can often be
observed: the process scheduler binds pairs of tasks to the virtual HT
cores of the same physical CPU module. E.g. in theory you should get the
task distribution 1:3:5:7, but often I get 1:6:7:8 (three physical cores
loaded instead of four) or 1:2:7:8 (two physical cores loaded instead of
four).

Artem
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Con Kolivas @ 2011-10-31 10:06 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: linux-kernel

On Sun, 30 Oct 2011 07:57:12 PM Artem S. Tashkinov wrote:
> I've found out that even on Linux 3.0.8 the process scheduler doesn't
> correctly distribute the load amongst virtual CPUs. E.g. on a 4-core
> system (8 virtual CPUs in total) the process scheduler often runs two
> or more of four different tasks on the same physical CPU.
>
> Any thoughts? Comments? I think this is quite a serious problem.

Intense cache-locality logic, power-saving concepts, cpu frequency
governor behaviour and separate runqueues per CPU within the CPU process
scheduler in the current mainline Linux kernel will occasionally do this.
Some workloads will be better, while others will be worse. Feel free to
try my BFS cpu scheduler if you want a CPU process scheduler that spreads
work more evenly across CPUs.

Alas, the last version I synced up with will not apply cleanly past about
3.0.6, I believe:

http://ck.kolivas.org/patches/bfs/3.0.0/3.0-sched-bfs-413.patch

Regards,
Con Kolivas

--
-ck
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Mike Galbraith @ 2011-10-31 11:42 UTC (permalink / raw)
To: Con Kolivas; +Cc: Artem S. Tashkinov, linux-kernel

On Mon, 2011-10-31 at 21:06 +1100, Con Kolivas wrote:
> [...] Feel free to try my BFS cpu scheduler if you want a CPU process
> scheduler that spreads work more evenly across CPUs.
>
> http://ck.kolivas.org/patches/bfs/3.0.0/3.0-sched-bfs-413.patch

Yeah, it handles independent tasks well, but cache misses can be
excruciatingly painful for the others.

Q6600 box, configs as identical as possible, tbench 8:

3.0.6-bfs413     728.6 MB/sec
3.0.8           1146.7 MB/sec

-Mike
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Con Kolivas @ 2011-11-01 0:41 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Artem S. Tashkinov, linux-kernel

On Mon, 31 Oct 2011 12:42:28 PM Mike Galbraith wrote:
> Yeah, it handles independent tasks well, but cache misses can be
> excruciatingly painful for the others.
>
> Q6600 box, configs as identical as possible, tbench 8:
>
> 3.0.6-bfs413     728.6 MB/sec
> 3.0.8           1146.7 MB/sec

Fortunately BFS is about optimising user-visible service latency for
normal users running normal applications on normal desktops under normal
workloads, and not about tbench throughput.

Regards,
Con

--
-ck
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Gene Heskett @ 2011-11-01 0:58 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel

On Monday, October 31, 2011, Con Kolivas wrote:
> Fortunately BFS is about optimising user-visible service latency for
> normal users running normal applications on normal desktops under
> normal workloads, and not about tbench throughput.

And for that it is doing a very good job for me. Thanks, Con.

Cheers, Gene

--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
My web page: <http://coyoteden.dyndns-free.com:85/gene>
Enzymes are things invented by biologists that explain things which
otherwise require harder thinking. -- Jerome Lettvin
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Mike Galbraith @ 2011-11-01 5:08 UTC (permalink / raw)
To: Con Kolivas; +Cc: Artem S. Tashkinov, linux-kernel

On Tue, 2011-11-01 at 11:41 +1100, Con Kolivas wrote:
> Fortunately BFS is about optimising user-visible service latency [...]
> and not about tbench throughput.

Unfortunately, cache misses are an equal-opportunity pain provider.

-Mike
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should
  2011-10-30 19:57 HT (Hyper Threading) aware process scheduling doesn't work as it should Artem S. Tashkinov
  2011-10-31 10:06 ` Con Kolivas
@ 2011-11-03  8:18 ` Ingo Molnar
  2011-11-03  9:44   ` Artem S. Tashkinov
  2011-11-03 13:00   ` Mike Galbraith

From: Ingo Molnar @ 2011-11-03 8:18 UTC (permalink / raw)
To: Artem S. Tashkinov
Cc: linux-kernel, Peter Zijlstra, Mike Galbraith, Paul Turner

( Sorry about the delay in the reply - folks are returning from and
  recovering from the Kernel Summit ;-) I've extended the Cc: list.
  Please Cc: scheduler folks when reporting bugs, next time around. )

* Artem S. Tashkinov <t.artem@lycos.com> wrote:

> Hello,
>
> It's known that if you want to reach maximum performance on HT
> enabled Intel CPUs you should distribute the load evenly between
> physical cores, and only when you have loaded all of them should
> you load the remaining virtual cores.
>
> For example, if you have 4 physical cores and 8 virtual CPUs, and
> just four tasks consuming 100% of CPU time, you should load four
> CPU pairs:
>
> VCPUs: {1,2} - one task running
> VCPUs: {3,4} - one task running
> VCPUs: {5,6} - one task running
> VCPUs: {7,8} - one task running
>
> It's absolutely detrimental to performance to bind two tasks to
> e.g. two physical cores {1,2} {3,4} and then the remaining two
> tasks to e.g. the third core {5,6}:
>
> VCPUs: {1,2} - one task running
> VCPUs: {3,4} - one task running
> VCPUs: {5,6} - *two* tasks running
> VCPUs: {7,8} - no tasks running
>
> I've found out that even on Linux 3.0.8 the process scheduler
> doesn't distribute the load correctly amongst virtual CPUs. E.g.
> on a 4-core system (8 total virtual CPUs) the process scheduler
> often runs some instances of four different tasks on the same
> physical CPU.
> Maybe I shouldn't trust top/htop output on this matter, but the
> same test carried out on Microsoft Windows XP shows that it indeed
> distributes the load correctly, running tasks on different physical
> cores whenever possible.
>
> Any thoughts? comments? I think this is quite a serious problem.

If sched_mc is set to zero then this looks like a serious load
balancing bug - you are perfectly right that we should balance
between physical packages first, and ending up with the kind of
asymmetry you describe for any observable length of time is a bug.

You have not outlined your exact workload - do you run a simple CPU
consuming loop with no sleeping done whatsoever, or something more
complex?

Peter, Paul, Mike, any ideas?

Thanks,

	Ingo
* Re: Re: HT (Hyper Threading) aware process scheduling doesn't work as it should
  2011-11-03  8:18 ` Ingo Molnar
@ 2011-11-03  9:44   ` Artem S. Tashkinov
  2011-11-03 10:29     ` Ingo Molnar
  2011-11-03 12:42     ` Henrique de Moraes Holschuh

From: Artem S. Tashkinov @ 2011-11-03 9:44 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, a.p.zijlstra, efault, pjt

> On Nov 3, 2011, Ingo Molnar wrote:
>
> If sched_mc is set to zero then this looks like a serious load
> balancing bug - you are perfectly right that we should balance
> between physical packages first, and ending up with the kind of
> asymmetry you describe for any observable length of time is a bug.
>
> You have not outlined your exact workload - do you run a simple CPU
> consuming loop with no sleeping done whatsoever, or something more
> complex?
>
> Peter, Paul, Mike, any ideas?

Actually I am just running 4 copies of the bzip2 compressor
(< /dev/zero > /dev/null).

A person named ffab ffa said ( http://lkml.org/lkml/2011/11/1/11 )
that I probably misunderstand/misinterpret physical cores. He says
that the core thread siblings on e.g. an Intel Core i7-2600K are
0-4, 1-5, 2-6 and 3-7, and when I am running this test I see the
following VCPU distributions:

1, 6, 7, 8 (0-4, 1-5, 2-6, 7-8 - all four physical cores loaded)
1, 2, 7, 8 (0-4, 1-5, 2-6, 7-8 - all four physical cores loaded)

According to the core thread sibling distribution, the HT-aware
process scheduler indeed works correctly. However, sometimes I see
this picture:

3, 4, 5, 6 (2-6, 1-5, 2-6, 7-8 - three physical cores loaded)

So now the question is whether this quite illogical VCPU enumeration
is good for power users, as I highly doubt that the 0-4, 1-5, 2-6,
3-7 order can be easily remembered and grasped. Besides, neither top
nor htop is HT-aware, so just by looking at their output it is very
difficult to see and understand whether the process scheduler works
as it should.
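As a rough way to watch task placement without trusting top/htop, one can spawn a few pure CPU hogs (a stand-in for Artem's four bzip2 copies) and read the `processor` field — field 39 of /proc/&lt;pid&gt;/stat — for each. This is an illustrative sketch, not something posted in the thread:

```shell
#!/bin/sh
# Spawn four pure CPU-burning loops, give the scheduler a moment to
# place them, then print the logical CPU each one last ran on
# (field 39 of /proc/<pid>/stat is the task's last CPU).
pids=""
for n in 1 2 3 4; do
    sh -c 'while :; do :; done' &
    pids="$pids $!"
done
sleep 1
for pid in $pids; do
    awk '{ printf "PID %s last ran on logical CPU %s\n", $1, $39 }' "/proc/$pid/stat"
done
kill $pids
```

Mapping the printed logical CPUs through the sibling pairs (0-4, 1-5, 2-6, 3-7 on the box discussed here) shows whether all four hogs landed on distinct physical cores.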
* Re: Re: HT (Hyper Threading) aware process scheduling doesn't work as it should
  2011-11-03  9:44 ` Artem S. Tashkinov
@ 2011-11-03 10:29   ` Ingo Molnar

From: Ingo Molnar @ 2011-11-03 10:29 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: linux-kernel, a.p.zijlstra, efault, pjt

* Artem S. Tashkinov <t.artem@lycos.com> wrote:

> > On Nov 3, 2011, Ingo Molnar wrote:
> >
> > If sched_mc is set to zero then this looks like a serious load
> > balancing bug - you are perfectly right that we should balance
> > between physical packages first, and ending up with the kind of
> > asymmetry you describe for any observable length of time is a bug.
> >
> > You have not outlined your exact workload - do you run a simple CPU
> > consuming loop with no sleeping done whatsoever, or something more
> > complex?
> >
> > Peter, Paul, Mike, any ideas?
>
> Actually I am just running 4 copies of the bzip2 compressor
> (< /dev/zero > /dev/null).
>
> A person named ffab ffa said ( http://lkml.org/lkml/2011/11/1/11 )
> that I probably misunderstand/misinterpret physical cores. He says
> that the core thread siblings on e.g. an Intel Core i7-2600K are
> 0-4, 1-5, 2-6 and 3-7, and when I am running this test I see the
> following VCPU distributions:
>
> 1, 6, 7, 8 (0-4, 1-5, 2-6, 7-8 - all four physical cores loaded)
> 1, 2, 7, 8 (0-4, 1-5, 2-6, 7-8 - all four physical cores loaded)
>
> According to the core thread sibling distribution, the HT-aware
> process scheduler indeed works correctly.

Ok, good - and that correct behavior is what we are seeing elsewhere
as well, so your bug report was somewhat puzzling.

> However, sometimes I see this picture:
>
> 3, 4, 5, 6 (2-6, 1-5, 2-6, 7-8 - three physical cores loaded)

It's hard to tell how normal this is without better tooling and
better data capture.
Especially when visualization runs, it's normal for tasks to
reshuffle a bit: Xorg and the visualization task are running as well
and are treated preferentially to any CPU hogs - but once only the
CPU-intense tasks are running they'll rebalance correctly.

That having been said, it's always a possibility that there's a
balancing bug. One way you could decide it is to measure actual
CPU-intense task performance when pinning the tasks to the 'right'
cores via taskset. If the 'pinned' variant measurably outperforms
the 'free-running' version, then there's a balancing problem. (Of
course, tracing it and checking how well we schedule is the most
powerful tool.)

> So now the question is whether this quite illogical VCPU
> enumeration is good for power users, as I highly doubt that the
> 0-4, 1-5, 2-6, 3-7 order can be easily remembered and grasped.
> Besides, neither top nor htop is HT-aware, so just by looking at
> their output it is very difficult to see and understand whether
> the process scheduler works as it should.

That enumeration order likely just comes from the BIOS and there's
little the scheduler can do about it. We could try to re-shape the
topology if the BIOS messes it up, but that's probably quite fragile
to do.

Thanks,

	Ingo
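Ingo's pinned-vs-free suggestion can be turned into a crude A/B test. The sketch below is an illustration only (not from the thread): it times four shell busy loops free-running vs. pinned to CPUs 0-3, which are *assumed* to sit on four distinct physical cores — check your own sibling topology first and adjust the CPU list.

```shell
#!/bin/sh
# Crude pinned-vs-free timing of four fixed-work CPU hogs. If the
# pinned run is measurably faster, the balancer is doing a poor job.
# The CPU list 0..3 is an assumption about the sibling layout.
command -v taskset >/dev/null || { echo "taskset not installed; skipping"; exit 0; }
work='i=0; while [ $i -lt 100000 ]; do i=$((i+1)); done'

run() {  # $1 = "free" or "pinned"
    t0=$(date +%s%N)
    cpu=0
    for n in 1 2 3 4; do
        if [ "$1" = pinned ]; then
            taskset -c $cpu sh -c "$work" &
        else
            sh -c "$work" &
        fi
        cpu=$((cpu + 1))
    done
    wait
    t1=$(date +%s%N)
    echo "$1: $(( (t1 - t0) / 1000000 )) ms"
}

run free
run pinned
```

The wall-clock numbers are noisy on a busy box; repeating each variant a few times and comparing medians gives a more trustworthy verdict.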
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should
  2011-11-03  9:44 ` Artem S. Tashkinov
@ 2011-11-03 12:42   ` Henrique de Moraes Holschuh
  2011-11-03 13:06     ` Artem S. Tashkinov

From: Henrique de Moraes Holschuh @ 2011-11-03 12:42 UTC (permalink / raw)
To: Artem S. Tashkinov; +Cc: linux-kernel

On Thu, 03 Nov 2011, Artem S. Tashkinov wrote:
> So now the question is whether this quite illogical VCPU enumeration
> is good for power users, as I highly doubt that the 0-4, 1-5, 2-6,
> 3-7 order can be easily remembered and grasped. Besides, neither top
> nor htop is HT-aware, so just by

Power users are directed to hwloc. There's a reason I pointed you to
it. hwloc would have told you upfront your real
memory/cache/core/thread topology, either in text mode, through
graphics, or as XML.

Here's hwloc's "lstopo" text output for my single-processor X5550:

Machine (6029MB) + Socket #0 + L3 #0 (8192KB)
  L2 #0 (256KB) + L1 #0 (32KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=4)
  L2 #1 (256KB) + L1 #1 (32KB) + Core #1
    PU #2 (phys=1)
    PU #3 (phys=5)
  L2 #2 (256KB) + L1 #2 (32KB) + Core #2
    PU #4 (phys=2)
    PU #5 (phys=6)
  L2 #3 (256KB) + L1 #3 (32KB) + Core #3
    PU #6 (phys=3)
    PU #7 (phys=7)

http://www.open-mpi.org/projects/hwloc/

and examples/documentation:

http://www.open-mpi.org/projects/hwloc/doc/v1.3/

Most likely, your distro will have it packaged.
You should also try the turbostat tool I pointed you at; it lives in
the "tools/power/x86" folder inside the kernel source, and will help
you track processor core performance a lot better than top/htop (but
not what is using the cores):

(turbostat output):

core CPU   %c0   GHz   TSC   %c1    %c3    %c6  %pc3  %pc6
          0.29  1.60  2.67  0.94  12.63  86.14  0.00  0.00
  0   0   0.31  1.60  2.67  1.62   3.21  94.87  0.00  0.00
  0   4   0.48  1.61  2.67  1.45   3.21  94.87  0.00  0.00
  1   1   0.18  1.60  2.67  0.91   2.17  96.75  0.00  0.00
  1   5   0.24  1.60  2.67  0.84   2.17  96.75  0.00  0.00
  2   2   0.03  1.60  2.67  0.07   0.16  99.74  0.00  0.00
  2   6   0.02  1.60  2.67  0.08   0.16  99.74  0.00  0.00
  3   3   1.00  1.60  2.67  0.83  44.97  53.20  0.00  0.00
  3   7   0.09  1.60  2.67  1.75  44.97  53.20  0.00  0.00

Which tells me my system spends most of its time sleeping. You will
notice it does tell you upfront that core 0 is CPUs 0 and 4.

-- 
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
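For readers without turbostat at hand, a rough per-CPU busy percentage can be derived from two /proc/stat samples. This is an illustrative approximation only — unlike turbostat it needs no root or MSR access, but it shows scheduler-visible utilization, not C-state or turbo residency:

```shell
#!/bin/sh
# Approximate per-CPU busy% over a one-second window from /proc/stat.
# Per-CPU lines are: cpuN user nice system idle iowait irq softirq ...
tmp=$(mktemp)
grep '^cpu[0-9]' /proc/stat > "$tmp"
sleep 1
grep '^cpu[0-9]' /proc/stat | awk -v first="$tmp" '
    BEGIN {
        # Read the first sample: remember idle time and total time.
        while ((getline line < first) > 0) {
            split(line, f)
            idle1[f[1]] = f[5]
            t = 0; for (i = 2; i <= 8; i++) t += f[i]
            tot1[f[1]] = t
        }
    }
    {
        t = 0; for (i = 2; i <= 8; i++) t += $i
        dt = t - tot1[$1]; di = $5 - idle1[$1]
        if (dt > 0) printf "%s busy %5.1f%%\n", $1, 100 * (dt - di) / dt
    }'
rm -f "$tmp"
```

On the four-bzip2 test this makes it easy to see whether four logical CPUs are near 100% busy and, via the sibling map, whether they belong to four distinct cores.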
* Re: Re: HT (Hyper Threading) aware process scheduling doesn't work as it should
  2011-11-03 12:42 ` Henrique de Moraes Holschuh
@ 2011-11-03 13:06   ` Artem S. Tashkinov

From: Artem S. Tashkinov @ 2011-11-03 13:06 UTC (permalink / raw)
To: hmh; +Cc: linux-kernel

On Nov 3, 2011, Henrique de Moraes Holschuh wrote:
>
> On Thu, 03 Nov 2011, Artem S. Tashkinov wrote:
> > So now the question is whether this quite illogical VCPU
> > enumeration is good for power users, as I highly doubt that the
> > 0-4, 1-5, 2-6, 3-7 order can be easily remembered and grasped.
> > Besides, neither top nor htop is HT-aware, so just by
>
> Power users are directed to hwloc. There's a reason I pointed you to
> it. hwloc would have told you upfront your real
> memory/cache/core/thread topology, either in text mode, through
> graphics, or as XML.
>
> Here's hwloc's "lstopo" text output for my single-processor X5550:
>
> Machine (6029MB) + Socket #0 + L3 #0 (8192KB)
>   L2 #0 (256KB) + L1 #0 (32KB) + Core #0
>     PU #0 (phys=0)
>     PU #1 (phys=4)
>   L2 #1 (256KB) + L1 #1 (32KB) + Core #1
>     PU #2 (phys=1)
>     PU #3 (phys=5)
>   L2 #2 (256KB) + L1 #2 (32KB) + Core #2
>     PU #4 (phys=2)
>     PU #5 (phys=6)
>   L2 #3 (256KB) + L1 #3 (32KB) + Core #3
>     PU #6 (phys=3)
>     PU #7 (phys=7)

A very useful utility indeed, thank you! Still, I wonder whether, for
the sake of simplicity, it is possible to present virtual CPU pairs
to the user in natural order (0,1 2,3 4,5 6,7), not as it's currently
done (0,4 1,5 2,6 3,7). I cannot believe it's difficult to change the
userspace representation of virtual CPU pairs.
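The per-core view Artem asks for can be approximated in userspace today: the kernel already exports each logical CPU's sibling set through sysfs. A sketch, assuming the topology files are present (some VMs and containers do not expose them):

```shell
#!/bin/sh
# Print each physical core's HT sibling set from the sysfs topology.
# Each core's list appears once per sibling, so sort -u deduplicates.
for f in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] || continue
    cat "$f"
done 2>/dev/null |
    sort -u |
    nl -v0 -w1 -s': ' |
    sed 's/^/core /; s/: /: CPUs /'
# On the i7-2600K discussed here this would print lines such as:
#   core 0: CPUs 0,4
#   core 1: CPUs 1,5
```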
* Re: HT (Hyper Threading) aware process scheduling doesn't work as it should
  2011-11-03  8:18 ` Ingo Molnar
@ 2011-11-03 13:00   ` Mike Galbraith

From: Mike Galbraith @ 2011-11-03 13:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: Artem S. Tashkinov, linux-kernel, Peter Zijlstra, Paul Turner

On Thu, 2011-11-03 at 09:18 +0100, Ingo Molnar wrote:
>
> ( Sorry about the delay in the reply - folks are returning from and
>   recovering from the Kernel Summit ;-) I've extended the Cc: list.
>   Please Cc: scheduler folks when reporting bugs, next time around. )
>
> * Artem S. Tashkinov <t.artem@lycos.com> wrote:
>
> > Hello,
> >
> > It's known that if you want to reach maximum performance on HT
> > enabled Intel CPUs you should distribute the load evenly between
> > physical cores, and only when you have loaded all of them should
> > you load the remaining virtual cores.
> >
> > For example, if you have 4 physical cores and 8 virtual CPUs, and
> > just four tasks consuming 100% of CPU time, you should load four
> > CPU pairs:
> >
> > VCPUs: {1,2} - one task running
> > VCPUs: {3,4} - one task running
> > VCPUs: {5,6} - one task running
> > VCPUs: {7,8} - one task running
> >
> > It's absolutely detrimental to performance to bind two tasks to
> > e.g. two physical cores {1,2} {3,4} and then the remaining two
> > tasks to e.g. the third core {5,6}:
> >
> > VCPUs: {1,2} - one task running
> > VCPUs: {3,4} - one task running
> > VCPUs: {5,6} - *two* tasks running
> > VCPUs: {7,8} - no tasks running
> >
> > I've found out that even on Linux 3.0.8 the process scheduler
> > doesn't distribute the load correctly amongst virtual CPUs. E.g.
> > on a 4-core system (8 total virtual CPUs) the process scheduler
> > often runs some instances of four different tasks on the same
> > physical CPU.
> > Maybe I shouldn't trust top/htop output on this matter, but the
> > same test carried out on Microsoft Windows XP shows that it
> > indeed distributes the load correctly, running tasks on different
> > physical cores whenever possible.
> >
> > Any thoughts? comments? I think this is quite a serious problem.
>
> If sched_mc is set to zero then this looks like a serious load
> balancing bug - you are perfectly right that we should balance
> between physical packages first, and ending up with the kind of
> asymmetry you describe for any observable length of time is a bug.
>
> You have not outlined your exact workload - do you run a simple CPU
> consuming loop with no sleeping done whatsoever, or something more
> complex?
>
> Peter, Paul, Mike, any ideas?

SD_SHARE_PKG_RESOURCES is on in the SIBLING domain, so in the sync
hint wakeup case (given no other tasks running to muddy the water),
the hint allows us to do an affine wakeup, which allows
select_idle_sibling() to convert the CPU affine wakeup into a cache
affine wakeup: the waker will be on one CPU, the wakee on its
sibling.

Turning SD_SHARE_PKG_RESOURCES off results in sync wakeup pairs
landing CPU affine instead. !sync wakeups spread to separate cores
unless threads exceed cores. I just tested massive_intr (!sync) and
tbench pairs (sync) on an E5620 box, and that's what I see
happening.

A sync wakee landing on an idle sibling is neither black nor
white... more of a London fog + L.A. smog.

-Mike
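The per-domain flags Mike refers to (SD_SHARE_PKG_RESOURCES in the SIBLING domain) can be inspected when the kernel is built with CONFIG_SCHED_DEBUG. A hedged sketch — on kernels of the 3.0 era the files lived under /proc/sys/kernel/sched_domain/, while newer kernels moved them to debugfs; both paths are config-dependent and may be absent:

```shell
#!/bin/sh
# List the scheduler-domain names the kernel built for cpu0, probing
# both the old (~3.0) procfs location and the newer debugfs one.
# Requires CONFIG_SCHED_DEBUG (and root for debugfs).
found=0
for d in /proc/sys/kernel/sched_domain/cpu0/domain*/name \
         /sys/kernel/debug/sched/domains/cpu0/domain*/name; do
    if [ -r "$d" ]; then
        printf '%s: %s\n' "$d" "$(cat "$d")"
        found=1
    fi
done
[ "$found" = 1 ] || echo "sched domain debug info not exposed on this kernel/config"
```

On an HT box the innermost domain is typically the SMT/SIBLING one, which is where the SD_SHARE_PKG_RESOURCES behaviour Mike describes takes effect.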