Re: [RFC] [PATCH 4/4] memcg: Document kernel memory accounting.

From: Glauber Costa <glommer@parallels.com>
To: Suleiman Souhlal <ssouhlal@FreeBSD.org>
Cc: gthelen@google.com, yinghan@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, jbottomley@parallels.com,
	suleiman@google.com, linux-mm@kvack.org
Subject: Re: [RFC] [PATCH 4/4] memcg: Document kernel memory accounting.
Date: Mon, 17 Oct 2011 12:56:09 +0400
Message-ID: <4E9BEDA9.6000908@parallels.com>
In-Reply-To: <1318639110-27714-5-git-send-email-ssouhlal@FreeBSD.org>

On 10/15/2011 04:38 AM, Suleiman Souhlal wrote:
> Signed-off-by: Suleiman Souhlal<suleiman@google.com>
> ---
>   Documentation/cgroups/memory.txt |   33 ++++++++++++++++++++++++++++++++-
>   1 files changed, 32 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index 06eb6d9..277cf25 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -220,7 +220,37 @@ caches are dropped. But as mentioned above, global LRU can do swapout memory
>   from it for sanity of the system's memory management state. You can't forbid
>   it by cgroup.
>
> -2.5 Reclaim
> +2.5 Kernel Memory
> +
> +A cgroup's kernel memory is accounted into its memory.usage_in_bytes and
> +is also shown in memory.stat as kernel_memory. Kernel memory does not get
> +counted towards the root cgroup's memory.usage_in_bytes, but still
> +appears in its kernel_memory.
> +
> +Upon cgroup deletion, all the remaining kernel memory gets moved to the
> +root cgroup.
> +
> +An accounted kernel memory allocation may trigger reclaim in that cgroup,
> +and may also OOM.
> +
> +Currently only slab memory allocated without __GFP_NOACCOUNT and
> +__GFP_NOFAIL gets accounted to the current process' cgroup.
> +
> +2.5.1 Slab
> +
> +Slab gets accounted on a per-page basis, which is done by using per-cgroup
> +kmem_caches. These per-cgroup kmem_caches get created on-demand, the first
> +time a specific kmem_cache gets used by a cgroup.

Well, let me start with some general comments:

I think the approach I've taken, which is to let cache creators
register themselves for cgroup usage, is better than scanning the
list of existing caches. A couple of key reasons:

1) We then don't need another flag. The equivalent of __GFP_NOACCOUNT
is simply doing nothing, i.e. not registering.
2) Less pollution in the slab structure itself, which gives it a
higher chance of inclusion, and less duplicate work in the slub.
3) Easier to do per-cache tuning if we ever want to.

About on-demand creation: I think it is a nice idea. But it may impact
allocation latency on caches that we are sure will be used, like the
dentry cache. So that gives us:

4) If the cache creator is registering itself, it can specify which
behavior it wants: on-demand creation vs. straight creation.
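
To make points 1) and 4) concrete, here is a rough user-space sketch
of what registration with a per-cache creation policy could look like.
Every name in it (memcg_register_cache, memcg_cgroup_created,
memcg_cache_alloc, the enum values) is made up for illustration and is
not taken from either patch set or from the kernel. A cache that never
registers is simply never accounted, so no extra flag is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing kernel API. */
enum memcg_create_mode {
	MEMCG_CREATE_ON_DEMAND,	/* per-cgroup copy built at first allocation */
	MEMCG_CREATE_STRAIGHT,	/* per-cgroup copy built at cgroup creation */
};

struct cache_reg {
	const char *name;
	enum memcg_create_mode mode;
};

#define MAX_REGS	16
#define MAX_CGROUPS	4

static struct cache_reg regs[MAX_REGS];
static size_t nr_regs;
/* instantiated[c][r] is true once cgroup c has its own copy of cache r. */
static bool instantiated[MAX_CGROUPS][MAX_REGS];

/*
 * Called by a cache creator that opts in to per-cgroup accounting.
 * Returns the registration index, or -1 if the table is full.
 */
int memcg_register_cache(const char *name, enum memcg_create_mode mode)
{
	if (nr_regs == MAX_REGS)
		return -1;
	regs[nr_regs].name = name;
	regs[nr_regs].mode = mode;
	return (int)nr_regs++;
}

/* At cgroup creation, build copies only for "straight" caches. */
void memcg_cgroup_created(int cg)
{
	assert(cg >= 0 && cg < MAX_CGROUPS);
	for (size_t r = 0; r < nr_regs; r++)
		instantiated[cg][r] = (regs[r].mode == MEMCG_CREATE_STRAIGHT);
}

/*
 * Allocation path. Returns true if this call had to create the
 * per-cgroup copy first (the slow path that costs latency).
 */
bool memcg_cache_alloc(int cg, int reg)
{
	assert(cg >= 0 && cg < MAX_CGROUPS);
	assert(reg >= 0 && (size_t)reg < nr_regs);
	if (instantiated[cg][reg])
		return false;
	instantiated[cg][reg] = true;
	return true;
}
```

With this shape, a cache like the dentry cache would register as
"straight" and never hit the slow path at allocation time, while
rarely used caches pay the creation cost only in cgroups that
actually touch them.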

> +Slab memory that cannot be attributed to a cgroup gets charged to the root
> +cgroup.
> +
> +A per-cgroup kmem_cache is named like the original, with the cgroup's name
> +in parethesis.

I used the address for simplicity, but I like names better. Agree here.
Extending it: if a task resides in the cgroup itself, I think it should
see only its own caches in /proc/slabinfo (selectable; take a look at
https://lkml.org/lkml/2011/10/6/132 for more details).

> +When a kmem_cache gets migrated to the root cgroup, "dead" is appended to
> +its name, to indicated that it is not going to be used for new allocations.

Why not just remove it?

> +2.6 Reclaim
>
>   Each cgroup maintains a per cgroup LRU which has the same structure as
>   global VM. When a cgroup goes over its limit, we first try
> @@ -396,6 +426,7 @@ active_anon	- # of bytes of anonymous and swap cache memory on active
>   inactive_file	- # of bytes of file-backed memory on inactive LRU list.
>   active_file	- # of bytes of file-backed memory on active LRU list.
>   unevictable	- # of bytes of memory that cannot be reclaimed (mlocked etc).
> +kernel_memory   - # of bytes of kernel memory.
>
>   # status considering hierarchy (see memory.use_hierarchy settings)
>

A few more points:

* I think using res_counters is better than relying on slab fields to
impose limits.
* We still need the ability to restrict kernel memory usage separately
from user memory, dependent on a selectable, as we already discussed here.
* I think we should do everything in our power to reduce overhead for
the special case in which only the root cgroup exists. Take a look at
what happened in the following thread:
https://lkml.org/lkml/2011/10/13/201. To be honest, I think it is an
idea we should at least consider: not to account *anything* to the root
cgroup (make it selectable if we want to preserve the current
behaviour), neither user memory nor kernel memory. Then we can keep
native performance for non-cgroup users. (But that's another discussion
anyway.)
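
As a rough illustration of the three points above taken together, here
is a user-space sketch. It is only modeled loosely on the res_counter
idea (a usage value checked against a limit); the struct and function
names, and the kmem_independent switch, are hypothetical and are not
the real res_counter interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal usage/limit pair, in the spirit of a res_counter. */
struct counter {
	uint64_t usage;
	uint64_t limit;
};

/* Hypothetical per-cgroup state for this sketch. */
struct memcg_sketch {
	bool is_root;		/* root cgroup: account nothing */
	bool kmem_independent;	/* the "selectable": separate kmem limit */
	struct counter mem;	/* user memory (shared when not independent) */
	struct counter kmem;	/* kernel memory, used only when independent */
};

/* Charge bytes to a counter; returns 0 on success, -1 if over limit. */
static int counter_charge(struct counter *c, uint64_t bytes)
{
	if (c->usage > c->limit || bytes > c->limit - c->usage)
		return -1;
	c->usage += bytes;
	return 0;
}

/* Charge a kernel memory allocation to the given cgroup. */
int memcg_charge_kmem(struct memcg_sketch *cg, uint64_t bytes)
{
	assert(cg != NULL);
	if (cg->is_root)
		return 0;	/* no accounting work at all for root */
	return counter_charge(cg->kmem_independent ? &cg->kmem : &cg->mem,
			      bytes);
}
```

The limit check lives entirely in the counter, not in any slab field,
and the root cgroup returns before touching a counter at all.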

All in all, this is a good start. Both our approaches have a lot in
common (which is not strange, given that we discussed them a lot
over the past month =p, and I did like some concepts).

