From: Vladimir Davydov <vdavydov@tarantool.org>
To: Balbir Singh <bsingharora@gmail.com>
Cc: mpe@ellerman.id.au, hannes@cmpxchg.org, mhocko@kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RESEND] [PATCH v1 1/3] Add basic infrastructure for memcg hotplug support
Date: Wed, 16 Nov 2016 12:01:29 +0300
Message-ID: <20161116090129.GA18225@esperanza>
In-Reply-To: <1479253501-26261-2-git-send-email-bsingharora@gmail.com>
Hello,
On Wed, Nov 16, 2016 at 10:44:59AM +1100, Balbir Singh wrote:
> The lack of hotplug support makes us allocate all memory
> upfront for per node data structures. With large number
> of cgroups this can be an overhead. PPC64 actually limits
> n_possible nodes to n_online to avoid some of this overhead.
>
> This patch adds the basic notifiers to listen to hotplug
> events and does the allocation and free of those structures
> per cgroup. We walk every cgroup per event, its a trade-off
> of allocating upfront vs allocating on demand and freeing
> on offline.
>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
>
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
> mm/memcontrol.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 60 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 91dfc7c..5585fce 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -63,6 +63,7 @@
> #include <linux/lockdep.h>
> #include <linux/file.h>
> #include <linux/tracehook.h>
> +#include <linux/memory.h>
> #include "internal.h"
> #include <net/sock.h>
> #include <net/ip.h>
> @@ -1342,6 +1343,10 @@ int mem_cgroup_select_victim_node(struct mem_cgroup *memcg)
> {
> return 0;
> }
> +
> +static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
> +{
> +}
> #endif
>
> static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
> @@ -4115,14 +4120,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> {
> struct mem_cgroup_per_node *pn;
> int tmp = node;
> - /*
> - * This routine is called against possible nodes.
> - * But it's BUG to call kmalloc() against offline node.
> - *
> - * TODO: this routine can waste much memory for nodes which will
> - * never be onlined. It's better to use memory hotplug callback
> - * function.
> - */
> +
> if (!node_state(node, N_NORMAL_MEMORY))
> tmp = -1;
> pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
> @@ -5773,6 +5771,59 @@ static int __init cgroup_memory(char *s)
> }
> __setup("cgroup.memory=", cgroup_memory);
>
> +static void memcg_node_offline(int node)
> +{
> + struct mem_cgroup *memcg;
> +
> + if (node < 0)
> + return;
Can node actually be negative here?
> +
> + for_each_mem_cgroup(memcg) {
> + free_mem_cgroup_per_node_info(memcg, node);
> + mem_cgroup_may_update_nodemask(memcg);
If memcg->numainfo_events is 0, mem_cgroup_may_update_nodemask() won't
update memcg->scan_nodes. Is it OK?
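To illustrate the concern, here is a rough userspace model, not kernel code. The struct and both helpers are made-up stand-ins: model_may_update_nodemask() mimics only the early return on a zero event counter, and model_node_offline() is a hypothetical way the hotplug path could clear the node directly instead of relying on that counter.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the two memcg fields discussed above. */
struct model_memcg {
	int numainfo_events;	/* models memcg->numainfo_events */
	uint64_t scan_nodes;	/* models memcg->scan_nodes, one bit per node */
};

/* Refreshes the mask only when events are pending (the early return). */
static void model_may_update_nodemask(struct model_memcg *memcg,
				      uint64_t nodes_with_memory)
{
	if (!memcg->numainfo_events)
		return;		/* mask stays stale, offlined node included */

	memcg->scan_nodes = nodes_with_memory;
	memcg->numainfo_events = 0;
}

/* Hypothetical alternative: drop the node from the mask unconditionally. */
static void model_node_offline(struct model_memcg *memcg, int node)
{
	assert(node >= 0 && node < 64);
	memcg->scan_nodes &= ~(UINT64_C(1) << node);
}
```

With numainfo_events == 0, the first helper leaves the offlined node's bit set, while the second clears it regardless of the counter.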
> + }
What if a memory cgroup is created or destroyed while you're walking the
tree? Perhaps we should take get_online_mems() in mem_cgroup_alloc() to
avoid that?
> +}
> +
> +static void memcg_node_online(int node)
> +{
> + struct mem_cgroup *memcg;
> +
> + if (node < 0)
> + return;
> +
> + for_each_mem_cgroup(memcg) {
> + alloc_mem_cgroup_per_node_info(memcg, node);
> + mem_cgroup_may_update_nodemask(memcg);
> + }
> +}
> +
> +static int memcg_memory_hotplug_callback(struct notifier_block *self,
> + unsigned long action, void *arg)
> +{
> + struct memory_notify *marg = arg;
> + int node = marg->status_change_nid;
> +
> + switch (action) {
> + case MEM_GOING_OFFLINE:
> + case MEM_CANCEL_ONLINE:
> + memcg_node_offline(node);
Judging by __offline_pages(), the MEM_GOING_OFFLINE event is emitted
before migrating pages off the node. So, I guess freeing per-node info
here isn't quite correct, as pages still need it to be moved from the
node's LRU lists. Better move it to MEM_OFFLINE?
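Roughly what I have in mind, as a userspace model rather than a patch. The enum, the node_info_present flag and both functions are illustrative stand-ins, not the kernel definitions; the point is only the ordering: allocate at GOING_ONLINE, free at OFFLINE or CANCEL_ONLINE, and leave GOING_OFFLINE alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the MEM_* hotplug events. */
enum mem_action {
	GOING_ONLINE, CANCEL_ONLINE, ONLINE,
	GOING_OFFLINE, CANCEL_OFFLINE, OFFLINE,
};

static bool node_info_present;	/* models memcg->nodeinfo[node] != NULL */

/* Models taking a page off the node's LRU list during migration. */
static void model_isolate_lru_page(void)
{
	assert(node_info_present);	/* the LRU lives in the per-node info */
}

static void model_hotplug_event(enum mem_action action, int node)
{
	if (node < 0)
		return;

	switch (action) {
	case GOING_ONLINE:
		node_info_present = true;	/* allocate before pages appear */
		break;
	case CANCEL_ONLINE:
	case OFFLINE:
		node_info_present = false;	/* free once the node is unused */
		break;
	case GOING_OFFLINE:	/* pages still have to be migrated off */
	case CANCEL_OFFLINE:	/* nothing was freed, nothing to restore */
	case ONLINE:
		break;
	}
}
```

In this model the per-node info survives between GOING_OFFLINE and OFFLINE, so the migration step can still reach the LRU lists, and CANCEL_OFFLINE becomes a no-op.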
> + break;
> + case MEM_GOING_ONLINE:
> + case MEM_CANCEL_OFFLINE:
> + memcg_node_online(node);
> + break;
> + case MEM_ONLINE:
> + case MEM_OFFLINE:
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block memcg_memory_hotplug_nb __meminitdata = {
> + .notifier_call = memcg_memory_hotplug_callback,
> + .priority = IPC_CALLBACK_PRI,
I wonder why you chose this priority?
> +};
> +
> /*
> * subsys_initcall() for memory controller.
> *
> @@ -5797,6 +5848,7 @@ static int __init mem_cgroup_init(void)
> #endif
>
> hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
> + register_hotmemory_notifier(&memcg_memory_hotplug_nb);
>
> for_each_possible_cpu(cpu)
> INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
I guess we should modify mem_cgroup_alloc/free() in the scope of this
patch, otherwise it doesn't make much sense IMHO. Maybe it's even
worth merging patches 1 and 2.
Thanks,
Vladimir