linux-kernel.vger.kernel.org archive mirror
From: Paul Turner <pjt@google.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <andi@firstfloor.org>,
	Glauber Costa <glommer@parallels.com>,
	linux-kernel@vger.kernel.org, xemul@parallels.com,
	paul@paulmenage.org, lizf@cn.fujitsu.com, daniel.lezcano@free.fr,
	mingo@elte.hu, jbottomley@parallels.com
Subject: Re: [PATCH 0/9] Per-cgroup /proc/stat
Date: Mon, 19 Sep 2011 16:07:28 -0700
Message-ID: <4E77CB30.3030509@google.com>
In-Reply-To: <1316076989.3045.8.camel@twins>

On 09/15/11 01:56, Peter Zijlstra wrote:
> On Wed, 2011-09-14 at 13:23 -0700, Andi Kleen wrote:
>> Peter Zijlstra<a.p.zijlstra@chello.nl>  writes:
>>>
>>> Guys we should seriously trim back a lot of that code, not grow ever
>>> more and more. The sad fact is that if you build a kernel with
>>> cpu-cgroup support the context switch cost is more than double that of a
>>> kernel without, and then you haven't even started creating cgroups yet.
>>
>> That sounds indeed quite bad. Is it known why it is so costly?
>
> Mostly because all data structures grow and all code paths grow, some by
> quite a bit, its spread all over the place, lots of little cuts etc..
>
> pjt and I tried trimming some of the code paths with static_branch() but
> didn't really get anywhere.. need to get back to looking at this stuff
> sometime soon.

When I get some time I think I'm just going to post a patch[*] that
merges the useful _fields_ (usage, usage_percpu) from cpuacct into cpu.
We are *already* doing the accounting at the entity level, which makes
this addition free.

At that point we could default to !CONFIG_CGROUP_CPUACCT and deprecate
the beast without breaking ABI for those who really need it (either
because their applications have hard-coded paths or because they really
like cgroup user/sys time -- which we COULD duplicate into cpu, but I'm
inclined not to).

[*]: the only real caveat is how loudly people scream about the code
duplication; I think it's worth it if it lets us kill cpuacct in the
long run.

Another unrelated optimization on this path that I have sitting around
in patches/ to push at some point is keeping the left-most entity out of
the tree.  The worst case is that an entity with a lower vruntime comes
along and we have to insert the previous left-most; the best case is
that we get to pick it without futzing with the rb-tree at all.  I think
this was good for a percent or two when I hacked it together before.
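
A minimal sketch of the left-most idea above, not the actual patch. The
names (struct entity, struct runqueue, rq_enqueue, rq_pick_next) are
hypothetical, and a sorted list stands in for the rb-tree so that the
example stays self-contained; tree_ops only counts how often the "tree"
is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's sched_entity or cfs_rq. */
struct entity {
	uint64_t vruntime;
	struct entity *next;	/* link in the sorted list standing in for the rb-tree */
};

struct runqueue {
	struct entity *leftmost;	/* smallest vruntime, held outside the "tree" */
	struct entity *tree;		/* sorted list standing in for the rb-tree */
	unsigned int tree_ops;		/* inserts and removals performed on the "tree" */
};

static void tree_insert(struct runqueue *rq, struct entity *se)
{
	struct entity **pp = &rq->tree;

	while (*pp && (*pp)->vruntime <= se->vruntime)
		pp = &(*pp)->next;
	se->next = *pp;
	*pp = se;
	rq->tree_ops++;
}

static struct entity *tree_pop_min(struct runqueue *rq)
{
	struct entity *se = rq->tree;

	if (se) {
		rq->tree = se->next;
		se->next = NULL;
		rq->tree_ops++;
	}
	return se;
}

void rq_enqueue(struct runqueue *rq, struct entity *se)
{
	se->next = NULL;
	if (!rq->leftmost) {
		/* Invariant: no cached left-most means the tree is empty too. */
		assert(rq->tree == NULL);
		rq->leftmost = se;		/* best case: tree untouched */
	} else if (se->vruntime < rq->leftmost->vruntime) {
		tree_insert(rq, rq->leftmost);	/* worst case: old left-most goes in */
		rq->leftmost = se;
	} else {
		tree_insert(rq, se);
	}
}

/* Removes and returns the left-most entity, refilling the cache from the tree. */
struct entity *rq_pick_next(struct runqueue *rq)
{
	struct entity *se = rq->leftmost;

	if (se)
		rq->leftmost = tree_pop_min(rq);
	return se;
}
```

With a single runnable entity, enqueue followed by pick never touches
the tree in this sketch, which is the best case described above.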

Another idea I have kicking around for this path is the introduction of
a link_entity which bridges over nr_running=1 chains (breaking it
opportunistically when an element in the chain goes to nr_running=2).
This one requires some pretty careful accounting around the breaking of
a chain though, so I'm not touching it until I get the new load tracking
code out.  (Incidentally, when I benchmarked it before LPC I had it
working out to be a little more efficient than the current math, good
for ~2-3% on pipe_test.)
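
One possible reading of the link_entity idea, as a rough sketch only.
The message does not describe the data structures, so struct level,
pick() and add_runnable() are all assumptions made for illustration.
The sketch caches a shortcut from the top of an nr_running=1 chain to
its leaf task and drops it as soon as any level gains a second runnable
entity. The accounting needed when a chain breaks, which is the hard
part, is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one level in a group hierarchy. */
struct level {
	struct level *parent;
	struct level *child;	/* sole runnable child group, if any */
	const char *task;	/* sole runnable task when child == NULL */
	int nr_running;
	const char *link;	/* cached leaf task bridging an nr_running == 1 chain */
};

/*
 * Returns the leaf task if every level below top has exactly one
 * runnable entity, or NULL when a normal pick would be needed.
 * *visited reports how many levels were inspected.
 */
const char *pick(struct level *top, int *visited)
{
	struct level *l = top;

	*visited = 0;
	if (top->link) {
		*visited = 1;		/* bridged: no walk down the chain */
		return top->link;
	}
	for (;;) {
		(*visited)++;
		if (l->nr_running != 1)
			return NULL;	/* chain broken; outside this sketch */
		if (!l->child)
			break;
		l = l->child;
	}
	assert(l->task != NULL);
	top->link = l->task;		/* establish the bridge */
	return top->link;
}

/* A second entity becomes runnable at l: break any bridge crossing it. */
void add_runnable(struct level *l)
{
	l->nr_running++;
	for (struct level *p = l; p; p = p->parent)
		p->link = NULL;
}
```

In this model the first pick over a three-level chain inspects three
levels, later picks inspect one, and add_runnable() on any level forces
the next pick back onto the slow path.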

- Paul
