* Re: 1 RT task blocks 4-core machine ?
@ 2010-10-09 17:42 Tommaso Cucinotta
2010-10-11 7:53 ` Peter Zijlstra
0 siblings, 1 reply; 4+ messages in thread
From: Tommaso Cucinotta @ 2010-10-09 17:42 UTC
To: Peter Zijlstra; +Cc: linux-kernel
Peter wrote:
> On Tue, 2010-10-05 at 00:26 +0200, Tommaso Cucinotta wrote:
> > A possible explanation might be that the CFS load balancing logic sees
> > my only active task (e.g., the ssh server or shell etc.) as running
> > alone on its core, and does not detect that it is prevented from actually
> > running due to RT tasks on the same core. Therefore, it will not migrate
> > the task to the free cores. Does this explanation make sense
> > or is it completely wrong ?
>
> Possibly. It's got some logic to detect this, but maybe it still gets
> confused; in particular, look at the adaptive cpu_power in
> update_cpu_power() and calling functions.
Ok, I'll have a look (when I have some time :-( ), thanks.
> > Also, I'd like to hear whether this is considered the "normal/desired"
> > behavior of intermixing RT and non-RT tasks.
>
> Pegging a cpu using sched_fifo/rr pretty much means you get to keep the
> pieces. If it works, nice; if you can make it work better, kudos; but no,
> polling from sched_fifo/rr is not something that is considered sane for
> the general health of your system.
Sure, I was not thinking of pushing/pulling across heterogeneous scheduling
classes, but rather of simply accounting for the proper per-CPU task count
and load (covering all tasks, including RT ones) when load-balancing
in CFS. Perhaps you mean that, e.g., if an RT task ends, the CPU would go idle
and would be expected to pull ? We could simply not do that, and things would
be fixed up at the next load-balancing decision (please consider that I don't
know the CFS load balancer so well).
So, for example, in addition to fixing the reported issue, we'd also get that,
when pinning a heavy RT workload on a CPU, CFS tasks would migrate to other
CPUs, if available. Again, that doesn't need to be instantaneous (push), but
it could happen later when the CFS load-balancer is invoked (is it invoked
periodically, as of now ?).
Thanks,
T.
--
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso
* Re: 1 RT task blocks 4-core machine ?
2010-10-09 17:42 1 RT task blocks 4-core machine ? Tommaso Cucinotta
@ 2010-10-11 7:53 ` Peter Zijlstra
0 siblings, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2010-10-11 7:53 UTC
To: Tommaso Cucinotta; +Cc: linux-kernel
On Sat, 2010-10-09 at 19:42 +0200, Tommaso Cucinotta wrote:
> Peter wrote:
> > On Tue, 2010-10-05 at 00:26 +0200, Tommaso Cucinotta wrote:
> > > A possible explanation might be that the CFS load balancing logic sees
> > > my only active task (e.g., the ssh server or shell etc.) as running
> > > alone on its core, and does not detect that it is prevented from actually
> > > running due to RT tasks on the same core. Therefore, it will not migrate
> > > the task to the free cores. Does this explanation make sense
> > > or is it completely wrong ?
> >
> > Possibly. It's got some logic to detect this, but maybe it still gets
> > confused; in particular, look at the adaptive cpu_power in
> > update_cpu_power() and calling functions.
>
> Ok, I'll have a look (when I have some time :-( ), thanks.
>
> > > Also, I'd like to hear whether this is considered the "normal/desired"
> > > behavior of intermixing RT and non-RT tasks.
> >
> > Pegging a cpu using sched_fifo/rr pretty much means you get to keep the
> > pieces. If it works, nice; if you can make it work better, kudos; but no,
> > polling from sched_fifo/rr is not something that is considered sane for
> > the general health of your system.
>
> Sure, I was not thinking of pushing/pulling across heterogeneous scheduling
> classes, but rather of simply accounting for the proper per-CPU task count
> and load (covering all tasks, including RT ones) when load-balancing
> in CFS.
Right, so we do that. Part of the problem is that RR/FIFO tasks have no
weight/load (not even a worst-case weight like sporadic tasks have). So
what we do is take a per-cpu average of the time spent on !CFS tasks
(sched_rt_avg_update() and friends) and use that to lower that CPU's total
throughput, which is reflected in the mentioned ->cpu_power variable.
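
To make the scaling concrete, here is a rough Python model of it. This is
only a sketch under assumptions: the helper names and the 1024 fixed-point
scale follow kernel naming for readability, the averaging window is a
made-up value, and the code is a simplification for illustration rather
than the actual scheduler source.

```python
# Simplified model (a sketch, not kernel code) of how time spent on
# !CFS work shrinks a CPU's ->cpu_power.
SCHED_LOAD_SHIFT = 10
SCHED_LOAD_SCALE = 1 << SCHED_LOAD_SHIFT  # 1024 stands for "full capacity"


def scale_rt_power(period_ns, rt_avg_ns):
    """Share of the averaging window left for CFS, in 1/1024 units."""
    available = max(period_ns - rt_avg_ns, 0)
    # Pre-shift the window so the division below yields a 0..1024 value.
    total = max(period_ns, SCHED_LOAD_SCALE) >> SCHED_LOAD_SHIFT
    return available // total


def cpu_power(period_ns, rt_avg_ns):
    """cpu_power after discounting RT time (no lower bound applied here)."""
    scaled = SCHED_LOAD_SCALE * scale_rt_power(period_ns, rt_avg_ns)
    return scaled >> SCHED_LOAD_SHIFT


if __name__ == "__main__":
    window = 1 << 30  # hypothetical averaging window, roughly 1 s in ns
    for busy in (0, window // 2, window):
        print(busy, cpu_power(window, busy))
```

In this model a CPU that spends half of the window on RT work is treated
as half a CPU for CFS balancing purposes.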
> Perhaps you mean that, e.g., if an RT task ends, the CPU would go idle
> and would be expected to pull ? We could simply not do that, and things would
> be fixed up at the next load-balancing decision (please consider that I don't
> know the CFS load balancer so well).
No, what I meant was that if a particular CPU is very busy with !CFS
work, its ->cpu_power variable will decrease to 1 (0 would get us
division-by-zero issues). Somehow we need to keep the load-balancer
from thinking it's a good idea to place tasks there.
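
The lower bound of 1 can be pictured as a simple clamp. The sketch below
is illustrative only: clamp_power() is a hypothetical helper name, not a
kernel function, and 1024 is used as the weight of a single nice-0 task.

```python
def clamp_power(power):
    """Keep a CPU's power at 1 or above, since load gets divided by it."""
    return max(power, 1)


if __name__ == "__main__":
    load = 1024  # weight of one nice-0 CFS task
    try:
        load // 0  # what an unclamped, fully pegged CPU would lead to
    except ZeroDivisionError:
        print("power 0 cannot be used as a divisor")
    print(load // clamp_power(0))
```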
The natural balance is to move tasks away from weak CPUs, but clearly
it's not good enough.
Also, there is housekeeping that needs to be done on a per-cpu basis.
CPU-affine tasks like workqueue things need to run in order to keep the
system functional; pegging a CPU with an RT task starves these, causing
general system dysfunction.
> So, for example, in addition to fixing the reported issue, we'd also get that,
> when pinning a heavy RT workload on a CPU, CFS tasks would migrate to other
> CPUs, if available. Again, that doesn't need to be instantaneous (push), but
> it could happen later when the CFS load-balancer is invoked (is it invoked
> periodically, as of now ?).
That should basically work: we normalize the cpu load (sum of all cfs
task weights) by the ->cpu_power, so a weak cpu will tend to get all its
tasks migrated away to stronger CPUs. Again, there's probably some
corner case that doesn't quite work as expected.
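
A small numerical sketch of that normalization, assuming a hypothetical
4-CPU box where CPU 0 is pegged by an RT task. The load and power values
are made up for illustration, and avg_load() and busiest_cpu() are
hypothetical helpers that only approximate what the balancer computes.

```python
SCHED_LOAD_SCALE = 1024  # fixed-point value for "one full CPU"
NICE_0_WEIGHT = 1024     # weight of a single nice-0 CFS task


def avg_load(cfs_load, cpu_power):
    """CFS load normalized by the CPU's remaining capacity."""
    return cfs_load * SCHED_LOAD_SCALE // max(cpu_power, 1)


def busiest_cpu(cpus):
    """Index of the CPU with the highest normalized load."""
    return max(range(len(cpus)), key=lambda i: avg_load(*cpus[i]))


# (cfs_load, cpu_power) per CPU; CPU 0 runs an RT hog, so its power is 1.
cpus = [(NICE_0_WEIGHT, 1), (2 * NICE_0_WEIGHT, 1024), (0, 1024), (0, 1024)]

if __name__ == "__main__":
    for i, (load, power) in enumerate(cpus):
        print(i, avg_load(load, power))
    print("busiest:", busiest_cpu(cpus))
```

With these numbers CPU 0 looks far busier than CPU 1 even though it holds
fewer CFS tasks, so a balancer working on normalized load would want to
pull work away from it.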
[parent not found: <AANLkTim2icGqFWB1S6TVcQwmo+sCcGdNDPO3s8LFqzR=@mail.gmail.com>]
* 1 RT task blocks 4-core machine ?
[not found] <AANLkTim2icGqFWB1S6TVcQwmo+sCcGdNDPO3s8LFqzR=@mail.gmail.com>
@ 2010-10-04 22:26 ` Tommaso Cucinotta
2010-10-06 13:34 ` Peter Zijlstra
0 siblings, 1 reply; 4+ messages in thread
From: Tommaso Cucinotta @ 2010-10-04 22:26 UTC
To: Peter Zijlstra
Cc: Dhaval Giani, Ingo Molnar, Thomas Gleixner, Dario Faggioli,
Fabio Checconi, linux-kernel
Hi,
I noticed that I can lose control of a 2.6.35 kernel running on a
4-core system in a way which I find quite unexpected:
chrt -r 1 /usr/bin/yes > /dev/null
(default 95% per-cpu throttling). Ok, with rt bandwidth migration
among cores, my yes process will take an undisturbed 100% of *one* core,
but I should still be able to control the system using the
other three, shouldn't I ?
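
For reference, the 95% figure corresponds to the default RT throttling
sysctls (sched_rt_runtime_us = 950000 out of sched_rt_period_us =
1000000). The arithmetic below is only a sketch: the helper names are
hypothetical, and modelling "bandwidth migration" as one CPU borrowing
unused RT runtime from CPUs that run no RT tasks is an assumption.

```python
RT_PERIOD_US = 1_000_000  # default sched_rt_period_us
RT_RUNTIME_US = 950_000   # default sched_rt_runtime_us


def rt_share_percent(runtime_us=RT_RUNTIME_US, period_us=RT_PERIOD_US):
    """Per-CPU share of each period that RT tasks may use by default."""
    return 100 * runtime_us // period_us


def runtime_to_borrow(runtime_us=RT_RUNTIME_US, period_us=RT_PERIOD_US):
    """Extra RT runtime one CPU needs to run an RT task for a whole period."""
    return period_us - runtime_us


def spare_rt_runtime(ncpus, runtime_us=RT_RUNTIME_US):
    """Unused RT runtime on the other CPUs, assuming they run no RT tasks."""
    return (ncpus - 1) * runtime_us


if __name__ == "__main__":
    print(rt_share_percent(), "% per CPU")
    print(runtime_to_borrow(), "us to borrow")
    print(spare_rt_runtime(4), "us spare on a 4-core machine")
```

Under these assumptions the three idle cores have far more spare RT
runtime than the 50 ms the pegged core needs, which is consistent with the
yes process never being throttled.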
Instead, if I run it from a terminal, I lose control of that terminal, and
console switching does not work anymore. Apparently I cannot do anything, but
sometimes I can log in via ssh from another system.
A similar behavior happens if I try from ssh or from X.
Sometimes, my key presses (e.g., Alt-F2) take effect many seconds
later. The exception is Alt-SysRq, which keeps working, unless I come up
with the very bad idea of trying to "Nice all RT Tasks". This causes a real
freeze.
A possible explanation might be that the CFS load balancing logic sees
my only active task (e.g., the ssh server or shell etc.) as running
alone on its core, and does not detect that it is prevented from actually
running due to RT tasks on the same core. Therefore, it will not migrate
the task to the free cores. Does this explanation make sense
or is it completely wrong ?
Also, I'd like to hear whether this is considered the "normal/desired"
behavior of intermixing RT and non-RT tasks.
Thanks and regards,
Tommaso
--
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso
* Re: 1 RT task blocks 4-core machine ?
2010-10-04 22:26 ` Tommaso Cucinotta
@ 2010-10-06 13:34 ` Peter Zijlstra
0 siblings, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2010-10-06 13:34 UTC
To: Tommaso Cucinotta
Cc: Dhaval Giani, Ingo Molnar, Thomas Gleixner, Dario Faggioli,
Fabio Checconi, linux-kernel
On Tue, 2010-10-05 at 00:26 +0200, Tommaso Cucinotta wrote:
> A possible explanation might be that the CFS load balancing logic sees
> my only active task (e.g., the ssh server or shell etc.) as running
> alone on its core, and does not detect that it is prevented from actually
> running due to RT tasks on the same core. Therefore, it will not migrate
> the task to the free cores. Does this explanation make sense
> or is it completely wrong ?
Possibly. It's got some logic to detect this, but maybe it still gets
confused; in particular, look at the adaptive cpu_power in
update_cpu_power() and calling functions.
> Also, I'd like to hear whether this is considered the "normal/desired"
> behavior of intermixing RT and non-RT tasks.
Pegging a cpu using sched_fifo/rr pretty much means you get to keep the
pieces. If it works, nice; if you can make it work better, kudos; but no,
polling from sched_fifo/rr is not something that is considered sane for
the general health of your system.