Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us

From: Max Krasnyansky <maxk@qualcomm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Daniel K." <dk@uw.no>,
	mingo@elte.hu,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paul Jackson <pj@sgi.com>, Gregory Haskins <ghaskins@novell.com>
Subject: Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us
Date: Mon, 23 Jun 2008 23:14:52 -0700	[thread overview]
Message-ID: <486090DC.7010705@qualcomm.com> (raw)
In-Reply-To: <1213799836.16944.244.camel@twins>

Peter Zijlstra wrote:
> On Wed, 2008-06-18 at 16:12 +0200, Daniel K. wrote:
>> mkdir /dev/cgroup
>> mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup
>>
>> mkdir /dev/cgroup/0
>>
>> echo 3 > /dev/cgroup/0/cpuset.cpus
>> echo 0 > /dev/cgroup/0/cpuset.mems
>> echo 100000 > /dev/cgroup/0/cpu.rt_period_us
>> echo   5000 > /dev/cgroup/0/cpu.rt_runtime_us
>>
>> schedtool -R -p 1 -e burnP6 &
>> [1] 3309
>> echo 3309 > /dev/cgroup/0/tasks
>>
>> At this point I'd expect the burnP6 task to use 5% of the available CPU
>> resources in the cgroup (5000/100000), but the real CPU usage, as
>> reported by top, is 20%. This is 4 times the expected result, and as I
>> have 4 cores, I think there is a strong hint of correlation there.
>>
>> Maybe with a 4 core system there really is 4 000 000 us available for
>> every 1 wall-time second?
> 
> Indeed. In effect each cpu (see below on specifics) gets the
> runtime/period you specify, and it moves unused runtime between cpus.
> 
>> However, I have only assigned one core (3) to _this_ cgroup, so I think
>> this cgroup is overusing its assigned resources.
>>
>> What do you think?
> 
> I think you're on to something :-)
> 
> It uses root domains, that is the largest domain this cpu is part of
> that has load-balancing enabled.
> 
> So while you have made your process part of the cgroup and the cpuset,
> there is no strong relation between them, that is to say, I could either
> mount the cpuset or cpu controller on a different mount point and add
> tasks to one but not the other.
Daniel is probably really confused by now :).

> So the relation I used is that of load-balance domains.
That's the key thing.
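
To put numbers on it, here is a rough sketch of the arithmetic, not the
kernel's actual accounting. It assumes every cpu in the root domain
contributes rt_runtime_us per period and that unused runtime can move to the
cpu that needs it, so a single RT task is bounded by roughly
ncpus * runtime / period, capped at 100% of one cpu. rt_budget_pct is a
made-up helper name, used only for illustration.

```shell
# rt_budget_pct RUNTIME_US PERIOD_US NCPUS
# Hypothetical helper: prints an approximate upper bound, in percent of one
# cpu, on what a single RT task can consume when NCPUS cpus share a root
# domain and lend each other unused runtime. Integer math, rounds down.
rt_budget_pct() {
    runtime_us=$1
    period_us=$2
    ncpus=$3
    pct=$(( runtime_us * ncpus * 100 / period_us ))
    # One task cannot run on more than one cpu at a time.
    if [ "$pct" -gt 100 ]; then
        pct=100
    fi
    echo "$pct"
}

rt_budget_pct 5000 100000 4   # root domain spans all 4 cpus
rt_budget_pct 5000 100000 1   # cpu 3 alone in its own root domain
```

The first call corresponds to Daniel's original setup and the second to cpu 3
sitting in its own load-balance domain, which is the 5% he was expecting.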

> So in order to get what you intended, do something like:
> 
> mount none /dev/cpuset cgroup -o cpuset
> mount none /cgroup/cpu cgroup -o cpu
> 
> mkdir /dev/cpuset/root
> mkdir /dev/cpuset/rt
> 
> #
> # this might not actually make the kernel happy
> # as it might attempt (and possibly succeed in)
> # moving cpu bound kernel threads
> #
> for i in `cat /dev/cpuset/tasks`; do
> 	echo $i > /dev/cpuset/root/tasks;
> done
It won't let you add tasks before adding cpus.

> echo 0-2 > /dev/cpuset/root/cpuset.cpus
> echo 3 > /dev/cpuset/rt/cpuset.cpus
> 
> echo 0 > /dev/cpuset/cpuset.sched_load_balance
> 
> mkdir /cgroup/cpu/foo
> echo 100000 > /cgroup/cpu/foo/cpu.rt_period_us
> echo   5000 > /cgroup/cpu/foo/cpu.rt_runtime_us
> 
> echo $$ > /dev/cpuset/rt/tasks
> echo $$ > /cgroup/cpu/foo/tasks
> 
> chrt -r -p 1 burnP6 &

That seems too complicated :). There is no need to mount them separately. The
only part missing from Daniel's example is the sched_load_balance setting;
otherwise he can still use a single cgroup mount, unless I'm missing something.
In other words:

mkdir /dev/cgroup
mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup

# Setup first domain (cpu 0-2)
mkdir /dev/cgroup/0
echo 0-2 > /dev/cgroup/0/cpuset.cpus
echo 0 > /dev/cgroup/0/cpuset.mems

# Setup second domain (cpu 3)
mkdir /dev/cgroup/1
echo 3 > /dev/cgroup/1/cpuset.cpus
echo 0 > /dev/cgroup/1/cpuset.mems

# Do not balance between domains
echo 0 > /dev/cgroup/cpuset.sched_load_balance

# Move all tasks into first domain if needed
...

# Setup RT bandwidth for second domain
echo 100000 > /dev/cgroup/1/cpu.rt_period_us
echo   5000 > /dev/cgroup/1/cpu.rt_runtime_us

schedtool -R -p 1 -e burnP6 &
[1] 3309
echo 3309 > /dev/cgroup/1/tasks
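
As a quick sanity check after the setup, something like the following could
read back what a group is configured for. This is only a sketch:
rt_share_of is a hypothetical helper, and it reports the configured
runtime/period ratio from the two cpu.rt_* files, not the usage top will
actually show.

```shell
# rt_share_of DIR
# Hypothetical helper: prints the configured RT share of the cgroup at DIR
# as a percentage of one cpu, read from cpu.rt_runtime_us and
# cpu.rt_period_us. Integer math, so fractions are rounded down.
rt_share_of() {
    dir=$1
    period_us=$(cat "$dir/cpu.rt_period_us") || return 1
    runtime_us=$(cat "$dir/cpu.rt_runtime_us") || return 1
    echo $(( runtime_us * 100 / period_us ))
}

# Example, assuming the mount point used above:
#   rt_share_of /dev/cgroup/1
```

If top reports noticeably more than this for the burnP6 task, the cpu is
most likely still part of a larger load-balance domain.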

Max
