linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: Glauber Costa <glommer@parallels.com>
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	Mel Gorman <mgorman@suse.de>, Tejun Heo <tj@kernel.org>,
	Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	kamezawa.hiroyu@jp.fujitsu.com, Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	devel@openvz.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 00/14] kmem controller for memcg.
Date: Thu, 18 Oct 2012 12:21:05 -0700
Message-ID: <20121018122105.2efc2841.akpm@linux-foundation.org>
In-Reply-To: <50803379.8000808@parallels.com>

On Thu, 18 Oct 2012 20:51:05 +0400
Glauber Costa <glommer@parallels.com> wrote:

> On 10/18/2012 02:11 AM, Andrew Morton wrote:
> > On Tue, 16 Oct 2012 14:16:37 +0400
> > Glauber Costa <glommer@parallels.com> wrote:
> > 
> >> ...
> >>
> >> A general explanation of what this is all about follows:
> >>
> >> The kernel memory limitation mechanism for memcg is concerned with
> >> preventing potentially non-reclaimable allocations from being made in
> >> excessive quantities by a particular set of processes (cgroup). Those
> >> allocations could create pressure that affects the behavior of a different
> >> and unrelated set of processes.
> >>
> >> Its basic working mechanism is to annotate some allocations with the
> >> __GFP_KMEMCG flag. When this flag is set, the memcg of the allocating
> >> process is identified and charged. When a specific limit is reached,
> >> further allocations will be denied.
> > 
> > The need to set __GFP_KMEMCG is rather unpleasing, and makes one wonder
> > "why didn't it just track all allocations".
> > 
> This was raised as well by Peter Zijlstra during the memcg summit.

Firstly: please treat any question from a reviewer as an indication
that information was missing from the changelog or from code comments.
Ideally all such queries are addressed in a later version of the patch
and changelog.

> The
> answer I gave to him still stands: There is a cost associated with it.
> We believe it comes down to a trade-off. How much does tracking a
> particular kind of allocation help vs. how much does it cost?
> 
> The free path is especially expensive, since it will always incur
> a page_cgroup lookup.

OK.  But that is a quantitative argument, without any quantities!  Do
we have even an estimate of what this cost will be?  Perhaps it's the
case that, if well implemented, that cost will be acceptable.  How do
we tell?
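
For anyone following along, the mechanism described in the cover letter
(tagged allocations are charged to the caller's group and denied past a
limit) can be sketched as a user-space toy model. This is not the patch
code: TOY_GFP_KMEMCG, struct toy_memcg, toy_charge() and toy_uncharge()
are hypothetical names invented for illustration, and the flag value is
arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real flag; the value is arbitrary. */
#define TOY_GFP_KMEMCG 0x1u

/* Toy per-cgroup state: bytes charged so far and the configured limit. */
struct toy_memcg {
	size_t kmem_usage;
	size_t kmem_limit;
};

/*
 * Try to account an allocation of `size` bytes.
 * Untagged allocations bypass accounting entirely; tagged ones are
 * charged to `cg` and refused if they would push usage past the limit.
 */
static bool toy_charge(struct toy_memcg *cg, unsigned int gfp, size_t size)
{
	if (!(gfp & TOY_GFP_KMEMCG))
		return true;		/* not tracked, always allowed */
	if (cg->kmem_usage > cg->kmem_limit ||
	    size > cg->kmem_limit - cg->kmem_usage)
		return false;		/* would cross the limit: deny */
	cg->kmem_usage += size;
	return true;
}

/*
 * Undo a charge when a tracked allocation is freed. The toy assumes the
 * caller already knows the allocation was charged and to which group.
 */
static void toy_uncharge(struct toy_memcg *cg, size_t size)
{
	assert(cg->kmem_usage >= size);	/* never uncharge more than charged */
	cg->kmem_usage -= size;
}
```

Note what the toy leaves out: toy_uncharge() is handed the owning group
for free, whereas the quoted text says the real free path has to do a
page_cgroup lookup to find it. That lookup is the cost being argued
about, and the sketch says nothing about how large it is.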

> > Does this mean that over time we can expect more sites to get the
> > __GFP_KMEMCG tagging?
> 
> We have been doing kernel memory limitation for OpenVZ for a long
> time, using a quite different mechanism. What we do in this work (with
> slab included) allows us to achieve feature parity with that. It means
> it is good enough for production environments.

That's really good info.
 
> Whether or not more people will want other allocations to be tracked, I
> can't predict. What I can say is that stack + slab is a very
> significant part of the memory one potentially cares about, and if
> anyone else ever has the need for more, it will come down to a
> trade-off calculation.

OK.
 
> > If so, are there any special implications, or do
> > we just go in, do the one-line patch and expect everything to work? 
> 
> With the infrastructure in place, it shouldn't be hard. But it's not
> necessarily a one-liner either. It depends on the practical
> considerations for having that specific kind of allocation tied to a
> memcg. The slab, for instance, which follows this series, is far
> from a one-liner: it is, in fact, a 19-patch series.
> 
> 
> 
> > 
> > And how *accurate* is the proposed code?  What percentage of kernel
> > memory allocations are unaccounted, typical case and worst case?
> 
> With both patchsets applied, all memory used for the stack and most of
> the memory used for slab objects allocated in userspace process contexts
> are accounted.
> 
> I honestly don't know which percentage of the total kernel memory this
> represents.

It sounds like the coverage will be good.  What's left over?  Random
get_free_pages() calls and interrupt-time slab allocations?

I suppose that there are situations in which network rx could consume
significant amounts of unaccounted memory?

> The accuracy for stack pages is very high: In this series, we don't move
> stack pages around when moving a task to other cgroups (for stack, it
> could be done), but other than that, all processes that start in a
> cgroup and stay there will have their memory accurately accounted.
> 
> The slab is more complicated, and depends on the workload. It will be
> more accurate in workloads in which the level of object-sharing among
> cgroups is low. A container, for instance, is the perfect example of
> where this happens.
> 
> > 
> > All sorts of questions come to mind over this decision, but it was
> > unexplained.  It should be, please.  A lot!
> > 
> >>
> >> ...
> >>
> >> Limits lower than
> >> the user limit effectively mean there is a separate kernel memory limit that
> >> may be reached independently of the user limit. Values equal to or greater than
> >> the user limit imply only that kernel memory is tracked. This provides a
> >> unified view of "maximum memory", be it kernel or user memory.
> >>
> > 
> > I'm struggling to understand that text much at all.  Reading the
> > Documentation/cgroups/memory.txt patch helped.
> > 
> 
> Great. If you have any specific suggestions I can change that. Maybe I
> should just paste the documentation bit in here...

That's not a bad idea.
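
In the meantime, here is how I read the limit semantics quoted above,
written as a user-space toy model. It rests on one assumption that the
quoted text only implies with "unified vision of maximum memory": that
kernel memory is charged against both the kernel counter and the overall
(user) counter. All names here (toy_counter, toy_group, toy_charge_user,
toy_charge_kernel) are hypothetical and none of this is taken from the
patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_counter {
	size_t usage;
	size_t limit;
};

struct toy_group {
	struct toy_counter mem;		/* "user limit": total, user + kernel */
	struct toy_counter kmem;	/* kernel memory limit */
};

/* True if `size` more bytes fit under the counter's limit. */
static bool toy_fits(const struct toy_counter *c, size_t size)
{
	return c->usage <= c->limit && size <= c->limit - c->usage;
}

/* User memory is only charged to the overall counter. */
static bool toy_charge_user(struct toy_group *g, size_t size)
{
	if (!toy_fits(&g->mem, size))
		return false;
	g->mem.usage += size;
	return true;
}

/*
 * ASSUMPTION: kernel memory is charged to both counters, so it is
 * denied when either the kernel limit or the overall limit is hit.
 */
static bool toy_charge_kernel(struct toy_group *g, size_t size)
{
	if (!toy_fits(&g->kmem, size) || !toy_fits(&g->mem, size))
		return false;
	g->kmem.usage += size;
	g->mem.usage += size;
	return true;
}
```

Under that model, a kernel limit below the user limit can be hit on its
own while the overall counter still has room. A kernel limit at or above
the user limit can never be the one that triggers, so the kernel counter
merely records usage and the user limit acts as the single cap.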


