From: Michal Hocko <mhocko@kernel.org>
To: Balbir Singh <bsingharora@gmail.com>
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [mm v2 0/3] Support memory cgroup hotplug
Date: Wed, 23 Nov 2016 10:28:31 +0100	[thread overview]
Message-ID: <20161123092830.GH2864@dhcp22.suse.cz> (raw)
In-Reply-To: <61dc32fd-2802-6deb-24cf-fa11b5b31532@gmail.com>

On Wed 23-11-16 19:37:16, Balbir Singh wrote:
> 
> 
> On 23/11/16 19:07, Michal Hocko wrote:
> > On Wed 23-11-16 18:50:42, Balbir Singh wrote:
> >>
> >>
> >> On 23/11/16 18:25, Michal Hocko wrote:
> >>> On Wed 23-11-16 15:36:51, Balbir Singh wrote:
> >>>> In the absence of hotplug we use extra memory proportional to
> >>>> (possible_nodes - online_nodes) * number_of_cgroups. PPC64 has a patch
> >>>> to avoid this large consumption when there is a large number of
> >>>> cgroups. This series adds hotplug support to memory cgroups and
> >>>> reverts the commit that limited possible nodes to online nodes.
> >>>
> >>> Balbir,
> >>> I have asked this in the previous version, but there still seems to be
> >>> a lack of information about _why_ we want this, _how_ much memory
> >>> overhead we save on most systems, and _why_ the additional complexity
> >>> is really worth it. Please make sure to add all of this to the cover
> >>> letter.
> >>>
> >>
> >> The data is in the patch referred to in patch 3. The waste was on the
> >> order of 200MB for 400 cgroup directories, which was enough for us to
> >> restrict possible_map to online_map. These patches allow us to have a
> >> larger possible map and to online nodes that are not in the online_map,
> >> something the current restriction on ppc64 does not allow.
> > 
> > How common is it to have possible_map >> online_map? If this is
> > ppc64-specific, then what is the downside of keeping the current
> > restriction instead?
> > 
> 
> On my system CONFIG_NODES_SHIFT is 8 (256 nodes), but possible_nodes is
> just 2. The downside is losing the ability to hotplug and online a node
> that is currently offline.
> Please see http://www.spinics.net/lists/linux-mm/msg116724.html

OK, so we are slowly getting to what I asked for originally ;) So who
cares? Depending on CONFIG_NODES_SHIFT (which tends to be quite large in
distribution and other general-purpose kernels), the overhead is 424B
per NUMA node (as per pahole on the current kernel). Most machines can
be expected to have 1-4 NUMA nodes, so the overhead might be somewhere
around 100K per memcg (with 256 possible nodes). That is not a trivial
amount for sure, but I would rather encourage people to lower the
possible node count for their hardware if it is artificially large.
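
To put rough numbers on that (back-of-the-envelope only, using the 424B
per-node figure above; the exact per-node size depends on the kernel
version and configuration):

	424 B/node * 256 possible nodes ~= 106 KB per memcg
	106 KB/memcg * 400 memcgs       ~= 41 MB in total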

> >> A typical system that I use has about 100-150 cgroup directories,
> >> depending on the number of users/docker instances/configuration/virtual
> >> machines. These numbers will only grow as we pack more of these
> >> instances onto such systems.
> >>
> >> From a complexity viewpoint, the patches are quite straightforward.
> > 
> > Well, I would like to hear more about that. {get,put}_online_memory
> > at random places doesn't sound all that straightforward to me.
> > 
> 
> I thought those places were not random :) I tried to think them through,
> as discussed with Vladimir. I don't claim the code is bug-free; we can
> fix any bugs as we test this more.

I am more worried about synchronization with hotplug, which tends to be
a PITA in places where we were simply safe by definition until now. We
do not have all that many users of memcg->nodeinfo[nid] from what I can
see, but are all of them guaranteed never to race with hotplug? The lack
of a high-level design description is less than encouraging. So please
try to spend some time describing how nodeinfo is used currently, how
the synchronization with hotplug is supposed to work, and what
guarantees that no stale nodeinfo can ever be used. This is just too
easy to get wrong...
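
Just to make the concern concrete (purely an illustrative sketch, not
code from these patches): if the nodeinfo[] entries were allocated and
freed from the hotplug callbacks, I would expect every reader to need
something along the lines of the mainline get_online_mems() /
put_online_mems() helpers (or an equivalent scheme) around the
dereference:

/*
 * Hypothetical reader of memcg->nodeinfo[nid].  Holding off memory
 * hotplug for the duration of the access is what would keep the
 * per-node structure from being freed (or from not existing yet)
 * underneath us.
 */
static bool memcg_nodeinfo_present(struct mem_cgroup *memcg, int nid)
{
	bool present;

	get_online_mems();	/* block node online/offline */
	present = memcg->nodeinfo[nid] != NULL;
	put_online_mems();

	return present;
}

Whether something like that, a dedicated lock, or RCU is the right
scheme is exactly what the design description should spell out.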
-- 
Michal Hocko
SUSE Labs

Thread overview: 15+ messages
2016-11-23  4:36 [mm v2 0/3] Support memory cgroup hotplug Balbir Singh
2016-11-23  4:36 ` [mm v2 1/3] mm: Add basic infrastructure for memcg hotplug support Balbir Singh
2016-11-23  4:36 ` [mm v2 2/3] mm: Move operations to hotplug callbacks Balbir Singh
2016-11-23  4:36 ` [mm v2 3/3] powerpc/mm: fix node_possible_map limitations Balbir Singh
2016-11-23  7:25 ` [mm v2 0/3] Support memory cgroup hotplug Michal Hocko
2016-11-23  7:50   ` Balbir Singh
2016-11-23  8:07     ` Michal Hocko
2016-11-23  8:37       ` Balbir Singh
2016-11-23  9:28         ` Michal Hocko [this message]
2016-11-23 13:05           ` Balbir Singh
2016-11-23 13:22             ` Michal Hocko
2016-11-28 21:10             ` Tejun Heo
2016-11-29  0:09               ` Balbir Singh
2016-11-29  0:42                 ` Tejun Heo
2016-11-29  4:57                   ` Balbir Singh
