From: Gregory Haskins <ghaskins@novell.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>,
Nick Piggin <nickpiggin@yahoo.com.au>, vatsa <vatsa@in.ibm.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
"D. Bahi" <dbahi@novell.com>
Subject: Re: [PATCH] sched: properly account IRQ and RT load in SCHED_OTHER load balancing
Date: Thu, 21 Aug 2008 08:26:08 -0400
Message-ID: <48AD5EE0.8070407@novell.com>
In-Reply-To: <20080821114126.GB30667@elte.hu>
Ingo Molnar wrote:
> * Gregory Haskins <ghaskins@novell.com> wrote:
>
>
>> I haven't had a chance to review the code thoroughly yet, but I had
>> been working on a similar fix and know that this is sorely needed.
>> So...
>>
>
> btw., why exactly does this patch speed up certain workloads? I'm not
> quite sure about the exact reasons of that.
>
> Ingo
>
I used to have a great demo for the prototype I was working on, but I'd
have to dig it up. The gist of it is that the pre-patched scheduler
basically gets thrown for a complete loop in the presence of a mixed
CFS/RT environment. This isn't a PREEMPT_RT specific problem per se,
though PREEMPT_RT does bring the problem to the forefront since it has
so many active RT tasks by default (for the IRQs, etc.), which makes it
more evident.
Since the "load" an RT task previously declared did not actually
express the true nature of the RQ load, CFS tasks would have a few
really nasty things happen to them while trying to run on the system
simultaneously. One of them was that you could starve out CFS tasks
on certain cores (even though there was plenty of CPU bandwidth
available elsewhere) and the load-balancer would think everything was
fine and thus fail to make adjustments.
Say you have a 4 core system. You could, for instance, get into a
situation where the softirq-net-rx thread was consuming 80% of core 0,
yet the load balancer would still spread, say, a 40 thread CFS load
evenly across all cores (approximately 10 per core, though you would
account for the "load" that the softirq thread contributed too). The
threads on the other cores would of course enjoy 100% bandwidth, while
the ~10 threads on core 0 would only see 1/5th of that bandwidth.
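To put rough numbers on the example above, here is a small Python sketch (an illustration only, not kernel code). It assumes the 40 threads are split exactly 10 per core and that the CFS threads on a core share that core's leftover bandwidth equally; the function name and inputs are hypothetical.

```python
def per_thread_share(free_bandwidth, threads_per_core):
    """Fraction of one CPU each CFS thread gets on each core, assuming
    the threads on a core split that core's free bandwidth equally."""
    return [free / n if n else 0.0
            for free, n in zip(free_bandwidth, threads_per_core)]

# Assumption: core 0 loses 80% to the softirq thread; cores 1-3 are free.
free = [0.2, 1.0, 1.0, 1.0]
even_spread = [10, 10, 10, 10]   # 40 CFS threads, spread evenly

shares = per_thread_share(free, even_spread)
for core, share in enumerate(shares):
    print(f"core {core}: {share:.0%} of a CPU per thread")
```

Under these assumptions each thread on core 0 gets about 2% of a CPU while threads elsewhere get about 10%, which is the 1/5th ratio described above.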
What it comes down to is that the CFS load should have been evenly
distributed across the available bandwidth of 3*100% + 1*20%, not 4*100%
as it does today. The net result is that the application performs in a
very lopsided manner, with some threads getting significantly less (or
sometimes zero!) CPU time compared to their peers. You can make this
more obvious by nice'ing the CFS load up as high as it will go, which
will approximate 1/2 of the load of the softirq (since RT tasks
previously enjoyed a 2*MAX_SCHED_OTHER_LOAD rating).
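As a rough illustration of the "3*100% + 1*20%" arithmetic, the sketch below spreads the threads in proportion to each core's free bandwidth. It is not the patch's algorithm, just the target distribution implied by the paragraph; the function name is hypothetical and the thread counts are left fractional, where a real balancer would have to round.

```python
def spread_by_capacity(free_bandwidth, total_threads):
    """Split total_threads across cores in proportion to free bandwidth."""
    total = sum(free_bandwidth)
    return [total_threads * free / total for free in free_bandwidth]

# Assumption: same 4-core example, core 0 has only 20% left for CFS.
free = [0.2, 1.0, 1.0, 1.0]
ideal = spread_by_capacity(free, 40)

for core, (cap, n) in enumerate(zip(free, ideal)):
    # cap / n is the fraction of a CPU each thread on this core would get
    print(f"core {core}: {n:.1f} threads, {cap / n:.1%} of a CPU each")
```

With these inputs the proportional split works out to about 2.5 threads on core 0 and 12.5 on each of the other cores, so every thread sees roughly the same 8% of a CPU instead of the 2% versus 10% split of the even spread.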
I have observed this phenomenon (and its fix) while looking at things
like network intensive workloads. I'm sure there are plenty of others
that could cause similar ripples.
The fact is, the scheduler treats "load" to mean certain things which
simply did not apply to RT tasks. As you know very well, I'm sure ;),
"load" is a metric which expresses the share of the CPU that will be
consumed, and this is used by the load balancer to make its decisions.
However, you can put whatever rating you want on an RT task and it would
always be irrelevant. RT tasks run as frequently and as long as they
want (w.r.t. SCHED_OTHER) independent of what their load rating implies
to the balancer, so you cannot make an accurate assessment of the true
"available shares". This is why the load-balancer would become confused
and fail to see true imbalance in a mixed environment. Fixing this, as
Peter has attempted to do, will result in a much better distribution of
SCHED_OTHER tasks across the true available bandwidth, and thus improve
overall performance.
In previous discussions with people, I had always used a metaphor of a
stream. A system running SCHED_OTHER tasks is like a smooth running
stream, but dispatching an RT task (or an IRQ, even) is like throwing a
boulder into the water. It makes a big disruptive splash and causes
turbulent white water behind it. And the stream has no influence over
the size of the boulder, its placement in the stream, nor how long it
will be staying.
This fix (at least in concept) allows it to become more like gently
slipping a streamlined aerodynamic object into the water. The stream
still cannot do anything about the size or placement of the object, but
it can at least flow around it and smoothly adapt to the reduced volume
of water that the stream can carry. :)
HTH
-Greg