From: Pekka Enberg <penberg@cs.helsinki.fi>
To: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>, Andi Kleen <andi@firstfloor.org>,
Christoph Lameter <cl@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
haicheng.li@intel.com,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [patch v2] slab: add memory hotplug support
Date: Tue, 30 Mar 2010 12:01:40 +0300
Message-ID: <84144f021003300201x563c72vb41cc9de359cc7d0@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.00.1003271940190.8399@chino.kir.corp.google.com>

On Sun, Mar 28, 2010 at 5:40 AM, David Rientjes <rientjes@google.com> wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged. This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node. It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.
>
> When a node is hotadded, a nodelist for that node is allocated and
> initialized for each slab cache. If this isn't completed due to a lack
> of memory, the hotadd is aborted: we have a reasonable expectation that
> kmalloc_node(nid) will work for all caches if nid is online and memory is
> available.
>
> Since nodelists must be allocated and initialized prior to the new node's
> memory actually being online, the struct kmem_list3 is allocated off-node
> due to kmalloc_node()'s fallback.
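
For readers following along, the hot-add behaviour described above can be
modelled in userspace roughly as follows. This is only a sketch under
stated assumptions, not the patch itself: every model_* name, the cache
table and its batchcount/num values, and the allocation hook used to
simulate a failure are hypothetical. Only the control flow (allocate a
missing per-node list for every cache, return -ENOMEM on the first
failure, then recompute free_limit) is meant to mirror the description.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_NODES  4	/* hypothetical node limit */
#define MODEL_NUM_CACHES 3	/* hypothetical number of slab caches */

struct model_list3 {		/* stand-in for struct kmem_list3 */
	unsigned long free_limit;
};

struct model_cache {		/* stand-in for struct kmem_cache */
	unsigned int batchcount;
	unsigned int num;
	struct model_list3 *nodelists[MODEL_MAX_NODES];
};

/* Made-up caches; the tunables are illustrative values only. */
static struct model_cache model_caches[MODEL_NUM_CACHES] = {
	{ .batchcount = 60, .num = 30 },
	{ .batchcount = 30, .num = 15 },
	{ .batchcount = 8,  .num = 4 },
};

/*
 * Failure injection: a negative value never fails, otherwise that many
 * allocations succeed before model_alloc() starts returning NULL.
 */
static int model_fail_after = -1;

static void *model_alloc(size_t size)
{
	if (model_fail_after == 0)
		return NULL;
	if (model_fail_after > 0)
		model_fail_after--;
	return calloc(1, size);
}

/*
 * Give every cache a list for @node unless it already has one.
 * Returns 0 on success or -ENOMEM, in which case the caller is
 * expected to abort the hot-add. Lists allocated before the failure
 * are left in place and are reused by a later attempt.
 */
static int model_init_nodelists(int node, unsigned int cpus_on_node)
{
	for (size_t i = 0; i < MODEL_NUM_CACHES; i++) {
		struct model_cache *c = &model_caches[i];

		if (!c->nodelists[node]) {
			struct model_list3 *l3 = model_alloc(sizeof(*l3));

			if (!l3)
				return -ENOMEM;
			c->nodelists[node] = l3;
		}
		c->nodelists[node]->free_limit =
			(1 + cpus_on_node) * c->batchcount + c->num;
	}
	return 0;
}
```

Calling model_init_nodelists(1, 0) models a memory-only node coming
online: there are no cpus on it, so free_limit reduces to
batchcount + num for each cache.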
>
> When an entire node would be offlined, its nodelists are subsequently
> drained. If slab objects still exist and cannot be freed, the offline is
> aborted. It is possible that objects will be allocated between this
> drain and page isolation, so the offline may still fail.
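
The offline side can be sketched the same way. Again this is a userspace
model under assumptions, not the kernel code: the drain_model_* names and
the per-list slab counters are hypothetical, and dropping the free slabs
stands in for what the patch does with drain_freelist(). The point is the
decision: after draining, any list that still holds full or partial slabs
makes the whole offline fail with -EBUSY.

```c
#include <errno.h>
#include <stddef.h>

struct drain_model_list {	/* stand-in for struct kmem_list3 */
	int free_slabs;		/* slabs with no live objects */
	int partial_slabs;	/* slabs with some live objects */
	int full_slabs;		/* slabs with only live objects */
};

/*
 * @lists holds one entry per cache for the node going offline; an
 * entry is NULL when that cache has no list for the node.
 * Returns 0 if every list could be emptied, -EBUSY otherwise.
 */
static int drain_model_node(struct drain_model_list **lists, size_t ncaches)
{
	for (size_t i = 0; i < ncaches; i++) {
		struct drain_model_list *l3 = lists[i];

		if (!l3)
			continue;

		/* Free slabs can always be released. */
		l3->free_slabs = 0;

		/* Live objects pin the node's memory: give up. */
		if (l3->full_slabs || l3->partial_slabs)
			return -EBUSY;
	}
	return 0;
}
```

As in the description above, a 0 return here only says the lists were
empty at drain time; nothing in the model prevents new allocations
afterwards.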
>
> Signed-off-by: David Rientjes <rientjes@google.com>

Nick, Christoph, let's make a deal: you ACK, I merge. How does that
sound to you?

> ---
> mm/slab.c | 157 ++++++++++++++++++++++++++++++++++++++++++++++++------------
> 1 files changed, 125 insertions(+), 32 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -115,6 +115,7 @@
> #include <linux/reciprocal_div.h>
> #include <linux/debugobjects.h>
> #include <linux/kmemcheck.h>
> +#include <linux/memory.h>
>
> #include <asm/cacheflush.h>
> #include <asm/tlbflush.h>
> @@ -1102,6 +1103,52 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
> }
> #endif
>
> +/*
> + * Allocates and initializes nodelists for a node on each slab cache, used for
> + * either memory or cpu hotplug. If memory is being hot-added, the kmem_list3
> + * will be allocated off-node since memory is not yet online for the new node.
> + * When hotplugging memory or a cpu, existing nodelists are not replaced if
> + * already in use.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int init_cache_nodelists_node(int node)
> +{
> +	struct kmem_cache *cachep;
> +	struct kmem_list3 *l3;
> +	const int memsize = sizeof(struct kmem_list3);
> +
> +	list_for_each_entry(cachep, &cache_chain, next) {
> +		/*
> +		 * Set up the size64 kmemlist for cpu before we can
> +		 * begin anything. Make sure some other cpu on this
> +		 * node has not already allocated this
> +		 */
> +		if (!cachep->nodelists[node]) {
> +			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
> +			if (!l3)
> +				return -ENOMEM;
> +			kmem_list3_init(l3);
> +			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
> +			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> +
> +			/*
> +			 * The l3s don't come and go as CPUs come and
> +			 * go. cache_chain_mutex is sufficient
> +			 * protection here.
> +			 */
> +			cachep->nodelists[node] = l3;
> +		}
> +
> +		spin_lock_irq(&cachep->nodelists[node]->list_lock);
> +		cachep->nodelists[node]->free_limit =
> +			(1 + nr_cpus_node(node)) *
> +			cachep->batchcount + cachep->num;
> +		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
> +	}
> +	return 0;
> +}
> +
> static void __cpuinit cpuup_canceled(long cpu)
> {
> 	struct kmem_cache *cachep;
> @@ -1172,7 +1219,7 @@ static int __cpuinit cpuup_prepare(long cpu)
> 	struct kmem_cache *cachep;
> 	struct kmem_list3 *l3 = NULL;
> 	int node = cpu_to_node(cpu);
> -	const int memsize = sizeof(struct kmem_list3);
> +	int err;
>
> 	/*
> 	 * We need to do this right in the beginning since
> @@ -1180,35 +1227,9 @@ static int __cpuinit cpuup_prepare(long cpu)
> 	 * kmalloc_node allows us to add the slab to the right
> 	 * kmem_list3 and not this cpu's kmem_list3
> 	 */
> -
> -	list_for_each_entry(cachep, &cache_chain, next) {
> -		/*
> -		 * Set up the size64 kmemlist for cpu before we can
> -		 * begin anything. Make sure some other cpu on this
> -		 * node has not already allocated this
> -		 */
> -		if (!cachep->nodelists[node]) {
> -			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
> -			if (!l3)
> -				goto bad;
> -			kmem_list3_init(l3);
> -			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
> -			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> -
> -			/*
> -			 * The l3s don't come and go as CPUs come and
> -			 * go. cache_chain_mutex is sufficient
> -			 * protection here.
> -			 */
> -			cachep->nodelists[node] = l3;
> -		}
> -
> -		spin_lock_irq(&cachep->nodelists[node]->list_lock);
> -		cachep->nodelists[node]->free_limit =
> -			(1 + nr_cpus_node(node)) *
> -			cachep->batchcount + cachep->num;
> -		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
> -	}
> +	err = init_cache_nodelists_node(node);
> +	if (err < 0)
> +		goto bad;
>
> 	/*
> 	 * Now we can go ahead with allocating the shared arrays and
> @@ -1331,11 +1352,75 @@ static struct notifier_block __cpuinitdata cpucache_notifier = {
> 	&cpuup_callback, NULL, 0
> };
>
> +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> +/*
> + * Drains freelist for a node on each slab cache, used for memory hot-remove.
> + * Returns -EBUSY if all objects cannot be drained so that the node is not
> + * removed.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int __meminit drain_cache_nodelists_node(int node)
> +{
> +	struct kmem_cache *cachep;
> +	int ret = 0;
> +
> +	list_for_each_entry(cachep, &cache_chain, next) {
> +		struct kmem_list3 *l3;
> +
> +		l3 = cachep->nodelists[node];
> +		if (!l3)
> +			continue;
> +
> +		drain_freelist(cachep, l3, l3->free_objects);
> +
> +		if (!list_empty(&l3->slabs_full) ||
> +		    !list_empty(&l3->slabs_partial)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +	}
> +	return ret;
> +}
> +
> +static int __meminit slab_memory_callback(struct notifier_block *self,
> +					unsigned long action, void *arg)
> +{
> +	struct memory_notify *mnb = arg;
> +	int ret = 0;
> +	int nid;
> +
> +	nid = mnb->status_change_nid;
> +	if (nid < 0)
> +		goto out;
> +
> +	switch (action) {
> +	case MEM_GOING_ONLINE:
> +		mutex_lock(&cache_chain_mutex);
> +		ret = init_cache_nodelists_node(nid);
> +		mutex_unlock(&cache_chain_mutex);
> +		break;
> +	case MEM_GOING_OFFLINE:
> +		mutex_lock(&cache_chain_mutex);
> +		ret = drain_cache_nodelists_node(nid);
> +		mutex_unlock(&cache_chain_mutex);
> +		break;
> +	case MEM_ONLINE:
> +	case MEM_OFFLINE:
> +	case MEM_CANCEL_ONLINE:
> +	case MEM_CANCEL_OFFLINE:
> +		break;
> +	}
> +out:
> +	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
> +}
> +#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
> +
> /*
> * swap the static kmem_list3 with kmalloced memory
> */
> -static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> -			int nodeid)
> +static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> +				int nodeid)
> {
> 	struct kmem_list3 *ptr;
>
> @@ -1580,6 +1665,14 @@ void __init kmem_cache_init_late(void)
> 	 */
> 	register_cpu_notifier(&cpucache_notifier);
>
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Register a memory hotplug callback that initializes and frees
> +	 * nodelists.
> +	 */
> +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> +#endif
> +
> 	/*
> 	 * The reap timers are started later, with a module init call: That part
> 	 * of the kernel is not yet operational.
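
One detail in slab_memory_callback() that may not be obvious is the
"ret ? notifier_from_errno(ret) : NOTIFY_OK" return. The sketch below
models it in userspace. The sketch_* names are hypothetical, and the
constants and helper bodies are written from memory of the kernel's
include/linux/notifier.h of that era, so treat them as assumptions rather
than a copy. In this model, encoding an errno always sets the stop bit,
which is why success has to be returned as plain NOTIFY_OK.

```c
#include <errno.h>

/* Assumed values, modelled on the kernel's notifier constants. */
#define SKETCH_NOTIFY_OK	0x0001
#define SKETCH_NOTIFY_STOP_MASK	0x8000

/* Encode a negative errno into a notifier return value. */
static int sketch_notifier_from_errno(int err)
{
	return SKETCH_NOTIFY_STOP_MASK | (SKETCH_NOTIFY_OK - err);
}

/* Recover the errno (or 0) from a notifier return value. */
static int sketch_notifier_to_errno(int ret)
{
	ret &= ~SKETCH_NOTIFY_STOP_MASK;
	return ret > SKETCH_NOTIFY_OK ? SKETCH_NOTIFY_OK - ret : 0;
}

/*
 * What the callback's last line does: pass success through as
 * NOTIFY_OK so the chain continues, and encode failures so that the
 * hotplug core can see the errno and abort the online/offline.
 */
static int sketch_callback_result(int ret)
{
	return ret ? sketch_notifier_from_errno(ret) : SKETCH_NOTIFY_OK;
}
```

With these definitions, a -EBUSY from the drain step round-trips through
sketch_notifier_to_errno() unchanged, while a 0 result never carries the
stop bit.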