public inbox for linux-kernel@vger.kernel.org
From: Glauber Costa <glommer@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<kamezawa.hiroyu@jp.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Michal Hocko <mhocko@suse.cz>, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Suleiman Souhlal <suleiman@google.com>
Subject: Re: [PATCH v6 25/29] memcg/sl[au]b: shrink dead caches
Date: Wed, 7 Nov 2012 10:22:17 +0100	[thread overview]
Message-ID: <509A2849.9090509@parallels.com> (raw)
In-Reply-To: <20121106231627.3610c908.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 3181 bytes --]

On 11/07/2012 08:16 AM, Andrew Morton wrote:
> On Wed, 7 Nov 2012 08:13:08 +0100 Glauber Costa <glommer@parallels.com> wrote:
> 
>> On 11/06/2012 01:48 AM, Andrew Morton wrote:
>>> On Thu,  1 Nov 2012 16:07:41 +0400
>>> Glauber Costa <glommer@parallels.com> wrote:
>>>
>>>> This means that when we destroy a memcg cache that happened to be empty,
>>>> those caches may take a lot of time to go away: removing the memcg
>>>> reference won't destroy them - because there are pending references, and
>>>> the empty pages will stay there, until a shrinker is called upon for any
>>>> reason.
>>>>
>>>> In this patch, we will call kmem_cache_shrink for all dead caches that
>>>> cannot be destroyed because of remaining pages. After shrinking, it is
>>>> possible that it could be freed. If this is not the case, we'll schedule
>>>> a lazy worker to keep trying.
>>>
>>> This patch is really quite nasty.  We poll the cache once per minute
>>> trying to shrink then free it?  a) it gives rise to concerns that there
>>> will be scenarios where the system could suffer unlimited memory windup
>>> but mainly b) it's just lame.
>>>
>>> The kernel doesn't do this sort of thing.  The kernel tries to be
>>> precise: in a situation like this we keep track of the number of
>>> outstanding objects and when that falls to zero, we free their
>>> container synchronously.  If those objects are normally left floating
>>> around in an allocated but reclaimable state then we can address that
>>> by synchronously freeing them if their container has been destroyed.
>>>
>>> Or something like that.  If it's something else then fine, but not this.
>>>
>>> What do we need to do to fix this?
>>>
>> The original patch had an unlikely() test in the free path, conditional
>> on whether or not the cache is dead, that would then call this if the
>> cache had become empty.
>>
>> I got several requests to remove it and change it to something like
>> this, because that is a fast path (I myself think an unlikely branch is
>> not that bad).
>>
>> If you think such a test is acceptable, I can bring it back and argue on
>> the basis of "akpm made me do it!". But meanwhile I will give this extra
>> thought to see if there is any alternative way I can do it...
> 
> OK, thanks, please do take a look at it.
> 
> I'd be interested in seeing the old version of the patch which had this
> test-n-branch.  Perhaps there's some trick we can pull to lessen its cost.
> 
Attached.

This is the last version that used it (well, I believe it is). There
are other unrelated things in this patch that I got rid of. Look for
kmem_cache_verify_dead().

In summary, all calls to the free function would, as a last step, call
kmem_cache_verify_dead(), which would either be an empty placeholder, or:

+static inline void kmem_cache_verify_dead(struct kmem_cache *s)
+{
+       if (unlikely(s->memcg_params.dead))
+               schedule_work(&s->memcg_params.cache_shrinker);
+}


cache_shrinker got changed to the destroy worker. So if we are freeing
an object from a cache that is dead, we try to schedule a worker that
will eventually call kmem_cache_shrink(), and hopefully
kmem_cache_destroy() - if it was the last object.


[-- Attachment #2: 0015-memcg-sl-au-b-shrink-dead-caches.patch --]
[-- Type: text/x-patch, Size: 6815 bytes --]

From c99404a760fa69e8ccda0ff4b2636c6abd1ac990 Mon Sep 17 00:00:00 2001
From: Glauber Costa <glommer@parallels.com>
Date: Thu, 3 May 2012 13:33:03 -0300
Subject: [PATCH v3 15/16] memcg/sl[au]b: shrink dead caches

In the slub allocator, when the last object of a page goes away, we
don't necessarily free the page - there is not necessarily a test for an
empty page in every slab_free path.

This means that when we destroy a memcg cache that happened to be empty,
those caches may take a lot of time to go away: removing the memcg
reference won't destroy them - because there are pending references, and
the empty pages will stay there, until a shrinker is called upon for any
reason.

This patch marks all memcg caches as dead. kmem_cache_shrink is called
for the ones that could not be destroyed right away - this will force
internal cache reorganization, and then all references to empty pages
will be removed.

An unlikely branch is used to make sure this case does not affect
performance in the usual slab_free path.

The slab allocator has a time based reaper that would eventually get rid
of the objects, but we can also call it explicitly, since dead caches
are not a likely event.

[ v2: also call verify_dead for the slab ]

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Suleiman Souhlal <suleiman@google.com>
---
 include/linux/slab.h |  3 +++
 mm/memcontrol.c      | 44 +++++++++++++++++++++++++++++++++++++++++++-
 mm/slab.c            |  2 ++
 mm/slab.h            | 10 ++++++++++
 mm/slub.c            |  1 +
 5 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9badb8c..765e12c 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -182,6 +182,8 @@ unsigned int kmem_cache_size(struct kmem_cache *);
 #endif
 
 #ifdef CONFIG_MEMCG_KMEM
+#include <linux/workqueue.h>
+
 struct mem_cgroup_cache_params {
 	struct mem_cgroup *memcg;
 	struct kmem_cache *parent;
@@ -190,6 +192,7 @@ struct mem_cgroup_cache_params {
 	atomic_t nr_pages;
 	struct list_head destroyed_list; /* Used when deleting memcg cache */
 	struct list_head sibling_list;
+	struct work_struct cache_shrinker;
 };
 #endif
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index da38652..c0cf564 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -578,7 +578,7 @@ static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *cache
 
 	BUG_ON(dentry == NULL);
 
-	name = kasprintf(GFP_KERNEL, "%s(%d:%s)",
+	name = kasprintf(GFP_KERNEL, "%s(%d:%s)dead",
 	    cachep->name, css_id(&memcg->css), dentry->d_name.name);
 
 	return name;
@@ -739,12 +739,25 @@ static void disarm_kmem_keys(struct mem_cgroup *memcg)
 	WARN_ON(res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0);
 }
 
+static void cache_shrinker_work_func(struct work_struct *work)
+{
+	struct mem_cgroup_cache_params *params;
+	struct kmem_cache *cachep;
+
+	params = container_of(work, struct mem_cgroup_cache_params,
+			      cache_shrinker);
+	cachep = container_of(params, struct kmem_cache, memcg_params);
+
+	kmem_cache_shrink(cachep);
+}
+
 static DEFINE_MUTEX(memcg_cache_mutex);
 static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
 						  struct kmem_cache *cachep)
 {
 	struct kmem_cache *new_cachep;
 	int idx;
+	char *name;
 
 	BUG_ON(!memcg_can_account_kmem(memcg));
 
@@ -764,10 +777,21 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
 		goto out;
 	}
 
+	/*
+	 * Because the cache is expected to duplicate the string,
+	 * we must make sure it has opportunity to copy its full
+	 * name. Only now we can remove the dead part from it
+	 */
+	name = (char *)new_cachep->name;
+	if (name)
+		name[strlen(name) - 4] = '\0';
+
 	mem_cgroup_get(memcg);
 	memcg->slabs[idx] = new_cachep;
 	new_cachep->memcg_params.memcg = memcg;
 	atomic_set(&new_cachep->memcg_params.nr_pages , 0);
+	INIT_WORK(&new_cachep->memcg_params.cache_shrinker,
+		  cache_shrinker_work_func);
 out:
 	mutex_unlock(&memcg_cache_mutex);
 	return new_cachep;
@@ -790,6 +814,21 @@ static void kmem_cache_destroy_work_func(struct work_struct *w)
 	struct mem_cgroup_cache_params *p, *tmp;
 	unsigned long flags;
 	LIST_HEAD(del_unlocked);
+	LIST_HEAD(shrinkers);
+
+	spin_lock_irqsave(&cache_queue_lock, flags);
+	list_for_each_entry_safe(p, tmp, &destroyed_caches, destroyed_list) {
+		cachep = container_of(p, struct kmem_cache, memcg_params);
+		if (atomic_read(&cachep->memcg_params.nr_pages) != 0)
+			list_move(&cachep->memcg_params.destroyed_list, &shrinkers);
+	}
+	spin_unlock_irqrestore(&cache_queue_lock, flags);
+
+	list_for_each_entry_safe(p, tmp, &shrinkers, destroyed_list) {
+		cachep = container_of(p, struct kmem_cache, memcg_params);
+		list_del(&cachep->memcg_params.destroyed_list);
+		kmem_cache_shrink(cachep);
+	}
 
 	spin_lock_irqsave(&cache_queue_lock, flags);
 	list_for_each_entry_safe(p, tmp, &destroyed_caches, destroyed_list) {
@@ -867,11 +906,14 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 
 	spin_lock_irqsave(&cache_queue_lock, flags);
 	for (i = 0; i < MAX_KMEM_CACHE_TYPES; i++) {
+		char *name;
 		cachep = memcg->slabs[i];
 		if (!cachep)
 			continue;
 
 		cachep->memcg_params.dead = true;
+		name = (char *)cachep->name;
+		name[strlen(name)] = 'd';
 		__mem_cgroup_destroy_cache(cachep);
 	}
 	spin_unlock_irqrestore(&cache_queue_lock, flags);
diff --git a/mm/slab.c b/mm/slab.c
index bd9928f..6cb4abf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3785,6 +3785,8 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
 	}
 
 	ac_put_obj(cachep, ac, objp);
+
+	kmem_cache_verify_dead(cachep);
 }
 
 /**
diff --git a/mm/slab.h b/mm/slab.h
index 6024ad1..d21b982 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -80,6 +80,12 @@ static inline bool slab_equal_or_parent(struct kmem_cache *s,
 {
 	return (p == s) || (p == s->memcg_params.parent);
 }
+
+static inline void kmem_cache_verify_dead(struct kmem_cache *s)
+{
+	if (unlikely(s->memcg_params.dead))
+		schedule_work(&s->memcg_params.cache_shrinker);
+}
 #else
 static inline bool cache_match_memcg(struct kmem_cache *cachep,
 				     struct mem_cgroup *memcg)
@@ -100,5 +106,9 @@ static inline bool slab_equal_or_parent(struct kmem_cache *s,
 {
 	return true;
 }
+
+static inline void kmem_cache_verify_dead(struct kmem_cache *s)
+{
+}
 #endif
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 0b68d15..9d79216 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2602,6 +2602,7 @@ redo:
 	} else
 		__slab_free(s, page, x, addr);
 
+	kmem_cache_verify_dead(s);
 }
 
 void kmem_cache_free(struct kmem_cache *s, void *x)
-- 
1.7.11.4

