RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us

public inbox for linux-kernel@vger.kernel.org

* RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us
From: Daniel K. @ 2008-06-18 14:12 UTC
  To: Peter Zijlstra, mingo, Linux Kernel Mailing List

mkdir /dev/cgroup
mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup

mkdir /dev/cgroup/0

echo 3 > /dev/cgroup/0/cpuset.cpus
echo 0 > /dev/cgroup/0/cpuset.mems
echo 100000 > /dev/cgroup/0/cpu.rt_period_us
echo   5000 > /dev/cgroup/0/cpu.rt_runtime_us

schedtool -R -p 1 -e burnP6 &
[1] 3309
echo 3309 > /dev/cgroup/0/tasks

At this point I'd expect the burnP6 task to use 5% of the available CPU
resources in the cgroup (5000/100000), but the real CPU usage, as
reported by top, is 20%. This is 4 times the expected result, and as I
have 4 cores, I think there is a strong hint of correlation there.

Maybe with a 4 core system there really is 4 000 000 us available for
every 1 wall-time second?

However, I have only assigned one core (3) to _this_ cgroup, so I think
this cgroup is overusing its assigned resources.

What do you think?

Daniel K.


* Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us
From: Peter Zijlstra @ 2008-06-18 14:37 UTC
  To: Daniel K.
  Cc: mingo, Linux Kernel Mailing List, Max Krasnyanskiy, Paul Jackson,
	Gregory Haskins

On Wed, 2008-06-18 at 16:12 +0200, Daniel K. wrote:
> mkdir /dev/cgroup
> mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup
> 
> mkdir /dev/cgroup/0
> 
> echo 3 > /dev/cgroup/0/cpuset.cpus
> echo 0 > /dev/cgroup/0/cpuset.mems
> echo 100000 > /dev/cgroup/0/cpu.rt_period_us
> echo   5000 > /dev/cgroup/0/cpu.rt_runtime_us
> 
> schedtool -R -p 1 -e burnP6 &
> [1] 3309
> echo 3309 > /dev/cgroup/0/tasks
> 
> At this point I'd expect the burnP6 task to use 5% of the available CPU
> resources in the cgroup (5000/100000), but the real CPU usage, as
> reported by top, is 20%. This is 4 times the expected result, and as I
> have 4 cores, I think there is a strong hint of correlation there.
> 
> Maybe with a 4 core system there really is 4 000 000 us available for
> every 1 wall-time second?

Indeed. In effect each cpu (see below on specifics) gets the
runtime/period you specify, and it moves unused runtime between cpus.
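
As a rough check of the numbers in this exchange, here is a minimal
sketch. It assumes the budget simply scales with the number of CPUs in
the root domain, as described above; the helper name rt_share_percent is
made up for illustration and is not a kernel or tool interface.

```shell
#!/bin/sh
# Hypothetical helper: percentage of one CPU that an RT group may consume
# when its runtime/period budget applies on each of ncpus CPUs.
rt_share_percent() {
    runtime_us=$1
    period_us=$2
    ncpus=$3
    echo $(( runtime_us * 100 * ncpus / period_us ))
}

rt_share_percent 5000 100000 1   # budget on a single CPU
rt_share_percent 5000 100000 4   # same budget across a 4-CPU root domain
```

This prints 5 and then 20, which lines up with the 5% that was expected
and the 20% that top reported on the 4-core machine.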

> However, I have only assigned one core (3) to _this_ cgroup, so I think
> this cgroup is overusing its assigned resources.
> 
> What do you think?

I think you're on to something :-)

It uses root domains, that is the largest domain this cpu is part of
that has load-balancing enabled.

So while you have made your process part of the cgroup and the cpuset,
there is no strong relation between them, that is to say, I could either
mount the cpuset or cpu controller on a different mount point and add
tasks to one but not the other.
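
To see that the two memberships really are independent, one can look at
each controller separately. The sketch below parses input in the format
of /proc/<pid>/cgroup (hierarchy-id:controller-list:path); the function
name cgroup_path_for and the sample data are assumptions made for the
example.

```shell
#!/bin/sh
# Print the cgroup path for one controller, given /proc/<pid>/cgroup-style
# input on stdin ("hierarchy-id:controller-list:path" per line).
cgroup_path_for() {
    awk -F: -v want="$1" '{
        n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++)
            if (ctrl[i] == want) { print $3; exit }
    }'
}

# Made-up sample: the task sits in cpu group /foo but is still in the
# root cpuset, which is possible when the controllers are mounted apart.
sample='2:cpu:/foo
1:cpuset:/'
printf '%s\n' "$sample" | cgroup_path_for cpu
printf '%s\n' "$sample" | cgroup_path_for cpuset
```

For a live task, feeding the function from /proc/$$/cgroup instead of
the sample should show where the current shell sits in each hierarchy.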

So the relation I used is that of load-balance domains.

So in order to get what you intended, do something like:


mount -t cgroup -o cpuset none /dev/cpuset
mount -t cgroup -o cpu none /cgroup/cpu

mkdir /dev/cpuset/root
mkdir /dev/cpuset/rt

#
# this might not actually make the kernel happy
# as it might attempt (and possibly succeed in)
# moving cpu bound kernel threads
#
for i in `cat /dev/cpuset/tasks`; do
	echo $i > /dev/cpuset/root/tasks;
done

echo 0-2 > /dev/cpuset/root/cpuset.cpus
echo 3 > /dev/cpuset/rt/cpuset.cpus

echo 0 > /dev/cpuset/cpuset.sched_load_balance

mkdir /cgroup/cpu/foo
echo 100000 > /cgroup/cpu/foo/cpu.rt_period_us
echo   5000 > /cgroup/cpu/foo/cpu.rt_runtime_us

echo $$ > /dev/cpuset/rt/tasks
echo $$ > /cgroup/cpu/foo/tasks

chrt -r 1 burnP6 &






* Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us
From: Max Krasnyansky @ 2008-06-24  6:14 UTC
  To: Peter Zijlstra
  Cc: Daniel K., mingo, Linux Kernel Mailing List, Paul Jackson,
	Gregory Haskins

Peter Zijlstra wrote:
> On Wed, 2008-06-18 at 16:12 +0200, Daniel K. wrote:
>> mkdir /dev/cgroup
>> mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup
>>
>> mkdir /dev/cgroup/0
>>
>> echo 3 > /dev/cgroup/0/cpuset.cpus
>> echo 0 > /dev/cgroup/0/cpuset.mems
>> echo 100000 > /dev/cgroup/0/cpu.rt_period_us
>> echo   5000 > /dev/cgroup/0/cpu.rt_runtime_us
>>
>> schedtool -R -p 1 -e burnP6 &
>> [1] 3309
>> echo 3309 > /dev/cgroup/0/tasks
>>
>> At this point I'd expect the burnP6 task to use 5% of the available CPU
>> resources in the cgroup (5000/100000), but the real CPU usage, as
>> reported by top, is 20%. This is 4 times the expected result, and as I
>> have 4 cores, I think there is a strong hint of correlation there.
>>
>> Maybe with a 4 core system there really is 4 000 000 us available for
>> every 1 wall-time second?
> 
> Indeed. In effect each cpu (see below on specifics) gets the
> runtime/period you specify, and it moves unused runtime between cpus.
> 
>> However, I have only assigned one core (3) to _this_ cgroup, so I think
>> this cgroup is overusing its assigned resources.
>>
>> What do you think?
> 
> I think you're on to something :-)
> 
> It uses root domains, that is the largest domain this cpu is part of
> that has load-balancing enabled.
> 
> So while you have made your process part of the cgroup and the cpuset,
> there is no strong relation between them, that is to say, I could either
> mount the cpuset or cpu controller on a different mount point and add
> tasks to one but not the other.
Daniel is probably really confused by now :).

> So the relation I used is that of load-balance domains.
That's the key thing.

> So in order to get what you intended, do something like:
> 
> mount -t cgroup -o cpuset none /dev/cpuset
> mount -t cgroup -o cpu none /cgroup/cpu
> 
> mkdir /dev/cpuset/root
> mkdir /dev/cpuset/rt
> 
> #
> # this might not actually make the kernel happy
> # as it might attempt (and possibly succeed in)
> # moving cpu bound kernel threads
> #
> for i in `cat /dev/cpuset/tasks`; do
> 	echo $i > /dev/cpuset/root/tasks;
> done
It won't let you add tasks before adding cpus.
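
A small pre-flight check along these lines can catch that ordering
problem before any task is moved. This is only a sketch: cpuset_ready is
a made-up helper, it also checks cpuset.mems on the assumption that an
empty mems list blocks attaching in the same way, and the demo runs
against a scratch directory rather than a real cpuset mount.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the given cpuset
# directory has non-empty cpuset.cpus and cpuset.mems contents.
# The files are read rather than stat'ed, since cgroup pseudo-files
# report a size of zero.
cpuset_ready() {
    cpus=$(cat "$1/cpuset.cpus" 2>/dev/null)
    mems=$(cat "$1/cpuset.mems" 2>/dev/null)
    [ -n "$cpus" ] && [ -n "$mems" ]
}

# Demo against a scratch directory standing in for /dev/cpuset/root.
demo=$(mktemp -d)
echo > "$demo/cpuset.cpus"
echo > "$demo/cpuset.mems"
cpuset_ready "$demo" || echo "not ready: set cpus and mems first"
echo 0-2 > "$demo/cpuset.cpus"
echo 0 > "$demo/cpuset.mems"
cpuset_ready "$demo" && echo "ready: tasks can be attached"
rm -rf "$demo"
```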

> echo 0-2 > /dev/cpuset/root/cpuset.cpus
> echo 3 > /dev/cpuset/rt/cpuset.cpus
> 
> echo 0 > /dev/cpuset/cpuset.sched_load_balance
> 
> mkdir /cgroup/cpu/foo
> echo 100000 > /cgroup/cpu/foo/cpu.rt_period_us
> echo   5000 > /cgroup/cpu/foo/cpu.rt_runtime_us
> 
> echo $$ > /dev/cpuset/rt/tasks
> echo $$ > /cgroup/cpu/foo/tasks
> 
> chrt -r 1 burnP6 &

That seems too complicated :). There is no need to mount them separately. The
only part that was missing from Daniel's example is the sched_load_balance
thingy; otherwise he can still have a single cgroup, unless I'm missing
something. In other words:

mkdir /dev/cgroup
mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup

# Setup first domain (cpu 0-2)
mkdir /dev/cgroup/0
echo 0-2 > /dev/cgroup/0/cpuset.cpus
echo 0 > /dev/cgroup/0/cpuset.mems

# Setup second domain (cpu 3)
mkdir /dev/cgroup/1
echo 3 > /dev/cgroup/1/cpuset.cpus
echo 0 > /dev/cgroup/1/cpuset.mems

# Do not balance between domains
echo 0 > /dev/cgroup/cpuset.sched_load_balance

# Move all tasks into first domain if needed
...

# Setup RT bandwidth for second domain
echo 100000 > /dev/cgroup/1/cpu.rt_period_us
echo   5000 > /dev/cgroup/1/cpu.rt_runtime_us

schedtool -R -p 1 -e burnP6 &
[1] 3309
echo 3309 > /dev/cgroup/1/tasks

Max


* Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us
From: Peter Zijlstra @ 2008-06-24  9:53 UTC
  To: Max Krasnyansky
  Cc: Daniel K., mingo, Linux Kernel Mailing List, Paul Jackson,
	Gregory Haskins

On Mon, 2008-06-23 at 23:14 -0700, Max Krasnyansky wrote:

> That seems too complicated :). There is no need to mount them separately.

Mounting them independently gives you much greater flexibility.

You quickly run into overlapping cpu sets if you want multiple groups on
the isolated part.
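
To illustrate the point, here is a dry-run sketch that only prints the
commands for several RT bandwidth groups under the separately mounted
cpu hierarchy. The paths follow the earlier example in this thread;
plan_rt_group, the group name "bar" and its 10000 us runtime are made up
for the illustration.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands that would create one RT
# bandwidth group under the cpu hierarchy mounted at /cgroup/cpu.
plan_rt_group() {
    name=$1
    period_us=$2
    runtime_us=$3
    echo "mkdir /cgroup/cpu/$name"
    echo "echo $period_us > /cgroup/cpu/$name/cpu.rt_period_us"
    echo "echo $runtime_us > /cgroup/cpu/$name/cpu.rt_runtime_us"
}

# Two groups with different budgets. Tasks from both can live in the one
# /dev/cpuset/rt cpuset, so no second cpuset covering cpu 3 is needed.
plan_rt_group foo 100000 5000
plan_rt_group bar 100000 10000
```

With a single combined mount, each of these groups would instead need
its own cpuset.cpus entry covering the same isolated cpu.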





* Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us
From: Max Krasnyanskiy @ 2008-06-24 16:50 UTC
  To: Peter Zijlstra
  Cc: Daniel K., mingo, Linux Kernel Mailing List, Paul Jackson,
	Gregory Haskins

Peter Zijlstra wrote:
> On Mon, 2008-06-23 at 23:14 -0700, Max Krasnyansky wrote:
> 
>> That seems too complicated :). There is no need to mount them separately.
> 
> Mounting them independently gives you much greater flexibility.
> 
> You quickly run into overlapping cpu sets if you want multiple groups on
> the isolated part.

Sure, it is more flexible but for simple cases it's more confusing.

Max
