[PATCH v5 31/31] memcg: debugging facility to access dangling memcgs

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Glauber Costa <glommer@openvz.org>
To: <linux-mm@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, <cgroups@vger.kernel.org>,
	<kamezawa.hiroyu@jp.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	hughd@google.com, Greg Thelen <gthelen@google.com>,
	<linux-fsdevel@vger.kernel.org>,
	Glauber Costa <glommer@openvz.org>
Subject: [PATCH v5 31/31] memcg: debugging facility to access dangling memcgs
Date: Thu,  9 May 2013 10:06:48 +0400	[thread overview]
Message-ID: <1368079608-5611-32-git-send-email-glommer@openvz.org> (raw)
In-Reply-To: <1368079608-5611-1-git-send-email-glommer@openvz.org>

If memcg is tracking anything other than plain user memory (swap,
tcp buf mem, or slab memory), it is possible - and normal - that a
reference will be held by the group after it is dead.  Still, for
developers, it would be extremely useful to be able to query about those
states during debugging.

This patch provides a debugging facility in the root memcg, so we
can inspect which memcgs still have pending objects, and what is the
cause of this state.

[akpm@linux-foundation.org: fix up Kconfig text]
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>

---
This is a debug-only patch, intended for -mm only
---
 Documentation/cgroups/memory.txt |  16 ++++
 init/Kconfig                     |  17 +++++
 mm/memcontrol.c                  | 160 +++++++++++++++++++++++++++++++++++++--
 3 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 09027a9..1178e23 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -72,6 +72,7 @@ Brief summary of control files.
  memory.move_charge_at_immigrate # set/show controls of moving charges
  memory.oom_control		 # set/show oom controls.
  memory.numa_stat		 # show the number of memory usage per numa node
+ memory.dangling_memcgs          # show debugging information about dangling groups
 
  memory.kmem.limit_in_bytes      # set/show hard limit for kernel memory
  memory.kmem.usage_in_bytes      # show current kernel memory allocation
@@ -579,6 +580,21 @@ unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
 
 And we have total = file + anon + unevictable.
 
+5.7 dangling_memcgs
+
+This file will only be ever present in the root cgroup, if the option
+CONFIG_MEMCG_DEBUG_ASYNC_DESTROY is set. When a memcg is destroyed, the memory
+consumed by it may not be immediately freed. This is because when some
+extensions are used, such as swap or kernel memory, objects can outlive the
+group and hold a reference to it.
+
+If this is the case, the dangling_memcgs file will show information about what
+are the memcgs still alive, and which references are still preventing it to be
+freed. There is nothing wrong with that, but it is very useful when debugging,
+to know where this memory is being held. This is a developer-oriented debugging
+facility only, and no guarantees of interface stability will be given. The file
+is read-only, and has the sole purpose of displaying information.
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
diff --git a/init/Kconfig b/init/Kconfig
index 6e47c09..346cd1b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -946,6 +946,23 @@ config MEMCG_KMEM
 	  the kmem extension can use it to guarantee that no group of processes
 	  will ever exhaust kernel resources alone.
 
+config MEMCG_DEBUG_ASYNC_DESTROY
+	bool "Memory Resource Controller Debug asynchronous object destruction"
+	depends on MEMCG_KMEM || MEMCG_SWAP
+	default n
+	help
+	  When a memcg is destroyed, the memory consumed by it may not be
+	  immediately freed. This is because when some extensions are used, such
+	  as swap or kernel memory, objects can outlive the group and hold a
+	  reference to it.
+
+	  If this is the case, the dangling_memcgs file will show information
+	  about what are the memcgs still alive, and which references are still
+	  preventing it to be freed. There is nothing wrong with that, but it is
+	  very useful when debugging, to know where this memory is being held.
+	  This is a developer-oriented debugging facility only, and no
+	  guarantees of interface stability will be given.
+
 config CGROUP_HUGETLB
 	bool "HugeTLB Resource Controller for Control Groups"
 	depends on RESOURCE_COUNTERS && HUGETLB_PAGE
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fc3a8d5..2780c39 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -330,11 +330,20 @@ struct mem_cgroup {
 		struct list_head dead;
 	};
 
-	/*
-	 * Should we move charges of a task when a task is moved into this
-	 * mem_cgroup ? And what type of charges should we move ?
-	 */
-	unsigned long 	move_charge_at_immigrate;
+	union {
+		/*
+		 * Should we move charges of a task when a task is moved into
+		 * this mem_cgroup ? And what type of charges should we move ?
+		 */
+		unsigned long move_charge_at_immigrate;
+
+		/*
+		 * We are no longer concerned about moving charges after memcg
+		 * is dead. So we will fill this up with its name, to aid
+		 * debugging.
+		 */
+		char *memcg_name;
+	};
 	/*
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
@@ -399,10 +408,40 @@ static inline void memcg_dangling_free(struct mem_cgroup *memcg)
 	mutex_lock(&dangling_memcgs_mutex);
 	list_del(&memcg->dead);
 	mutex_unlock(&dangling_memcgs_mutex);
+#ifdef CONFIG_MEMCG_DEBUG_ASYNC_DESTROY
+	free_pages((unsigned long)memcg->memcg_name, 0);
+#endif
 }
 
 static inline void memcg_dangling_add(struct mem_cgroup *memcg)
 {
+#ifdef CONFIG_MEMCG_DEBUG_ASYNC_DESTROY
+	/*
+	 * cgroup.c will do page-sized allocations most of the time,
+	 * so we'll just follow the pattern. Also, __get_free_pages
+	 * is a better interface than kmalloc for us here, because
+	 * we'd like this memory to be always billed to the root cgroup,
+	 * not to the process removing the memcg. While kmalloc would
+	 * require us to wrap it into memcg_stop/resume_kmem_account,
+	 * with __get_free_pages we just don't pass the memcg flag.
+	 */
+	memcg->memcg_name = (char *)__get_free_pages(GFP_KERNEL, 0);
+
+	/*
+	 * we will, in general, just ignore failures. No need to go crazy,
+	 * being this just a debugging interface. It is nice to copy a memcg
+	 * name over, but if we (unlikely) can't, just the address will do
+	 */
+	if (!memcg->memcg_name)
+		goto add_list;
+
+	if (cgroup_path(memcg->css.cgroup, memcg->memcg_name, PAGE_SIZE) < 0) {
+		free_pages((unsigned long)memcg->memcg_name, 0);
+		memcg->memcg_name = NULL;
+	}
+
+add_list:
+#endif
 	INIT_LIST_HEAD(&memcg->dead);
 	mutex_lock(&dangling_memcgs_mutex);
 	list_add(&memcg->dead, &dangling_memcgs);
@@ -3594,7 +3633,7 @@ static void kmem_cache_destroy_work_func(struct work_struct *w)
 	 */
 	if (atomic_read(&cachep->memcg_params->nr_pages) != 0) {
 		kmem_cache_shrink(cachep);
-		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
+		if (atomic_read(&cachep->memcg_params->nr_pages) != 0)
 			return;
 	} else
 		kmem_cache_destroy(cachep);
@@ -5384,6 +5423,107 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
 	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
 }
 
+#ifdef CONFIG_MEMCG_DEBUG_ASYNC_DESTROY
+static void
+mem_cgroup_dangling_swap(struct mem_cgroup *memcg, struct seq_file *m)
+{
+#ifdef CONFIG_MEMCG_SWAP
+	u64 kmem;
+	u64 memsw;
+
+	/*
+	 * kmem will also propagate here, so we are only interested in the
+	 * difference.  See comment in mem_cgroup_reparent_charges for details.
+	 *
+	 * We could save this value for later consumption by kmem reports, but
+	 * there is not a lot of problem if the figures differ slightly.
+	 */
+	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
+	memsw = res_counter_read_u64(&memcg->memsw, RES_USAGE) - kmem;
+	seq_printf(m, "\t%llu swap bytes\n", memsw);
+#endif
+}
+
+
+static void
+mem_cgroup_dangling_tcp(struct mem_cgroup *memcg, struct seq_file *m)
+{
+#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+	struct tcp_memcontrol *tcp = &memcg->tcp_mem;
+	s64 tcp_socks;
+	u64 tcp_bytes;
+
+	tcp_socks = percpu_counter_sum_positive(&tcp->tcp_sockets_allocated);
+	tcp_bytes = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
+	seq_printf(m, "\t%llu tcp bytes", tcp_bytes);
+	/*
+	 * if tcp_bytes == 0, tcp_socks != 0 is a bug. One more reason to print
+	 * it!
+	 */
+	if (tcp_bytes || tcp_socks)
+		seq_printf(m, ", in %lld sockets", tcp_socks);
+	seq_printf(m, "\n");
+
+#endif
+}
+
+static void
+mem_cgroup_dangling_kmem(struct mem_cgroup *memcg, struct seq_file *m)
+{
+#ifdef CONFIG_MEMCG_KMEM
+	u64 kmem;
+	struct memcg_cache_params *params;
+
+	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
+	seq_printf(m, "\t%llu kmem bytes", kmem);
+
+	/* list below may not be initialized, so not even try */
+	if (!kmem)
+		return;
+
+	seq_printf(m, " in caches");
+	mutex_lock(&memcg->slab_caches_mutex);
+	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
+			struct kmem_cache *s = memcg_params_to_cache(params);
+
+		seq_printf(m, " %s", s->name);
+	}
+	mutex_unlock(&memcg->slab_caches_mutex);
+	seq_printf(m, "\n");
+#endif
+}
+
+/*
+ * After a memcg is destroyed, it may still be kept around in memory.
+ * Currently, the two main reasons for it are swap entries, and kernel memory.
+ * Because they will be freed assynchronously, they will pin the memcg structure
+ * and its resources until the last reference goes away.
+ *
+ * This root-only file will show information about which users
+ */
+static int mem_cgroup_dangling_read(struct cgroup *cont, struct cftype *cft,
+					struct seq_file *m)
+{
+	struct mem_cgroup *memcg;
+
+	mutex_lock(&dangling_memcgs_mutex);
+
+	list_for_each_entry(memcg, &dangling_memcgs, dead) {
+		if (memcg->memcg_name)
+			seq_printf(m, "%s:\n", memcg->memcg_name);
+		else
+			seq_printf(m, "%p (name lost):\n", memcg);
+
+		mem_cgroup_dangling_swap(memcg, m);
+		mem_cgroup_dangling_tcp(memcg, m);
+		mem_cgroup_dangling_kmem(memcg, m);
+	}
+
+	mutex_unlock(&dangling_memcgs_mutex);
+	return 0;
+}
+#endif
+
 static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 {
 	int ret = -EINVAL;
@@ -6331,6 +6471,14 @@ static struct cftype mem_cgroup_files[] = {
 	},
 #endif
 #endif
+
+#ifdef CONFIG_MEMCG_DEBUG_ASYNC_DESTROY
+	{
+		.name = "dangling_memcgs",
+		.read_seq_string = mem_cgroup_dangling_read,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+	},
+#endif
 	{ },	/* terminate */
 };
 
-- 
1.8.1.4

next prev parent reply	other threads:[~2013-05-09  6:07 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-09  6:06 [PATCH v5 00/31] kmemcg shrinkers Glauber Costa
2013-05-09  6:06 ` [PATCH v5 01/31] super: fix calculation of shrinkable objects for small numbers Glauber Costa
2013-05-09  6:06 ` [PATCH v5 02/31] vmscan: take at least one pass with shrinkers Glauber Costa
2013-05-09 11:12   ` Mel Gorman
     [not found]     ` <20130509111226.GR11497-l3A5Bk7waGM@public.gmane.org>
2013-05-09 11:28       ` Glauber Costa
     [not found]         ` <518B884C.9090704-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-05-09 11:35           ` Glauber Costa
2013-05-09  6:06 ` [PATCH v5 03/31] dcache: convert dentry_stat.nr_unused to per-cpu counters Glauber Costa
2013-05-09  6:06 ` [PATCH v5 04/31] dentry: move to per-sb LRU locks Glauber Costa
     [not found]   ` <1368079608-5611-5-git-send-email-glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2013-05-10  5:29     ` Dave Chinner
2013-05-10  8:16       ` Dave Chinner
2013-05-09  6:06 ` [PATCH v5 05/31] dcache: remove dentries from LRU before putting on dispose list Glauber Costa
2013-05-09  6:06 ` [PATCH v5 06/31] mm: new shrinker API Glauber Costa
2013-05-09 13:30   ` Mel Gorman
     [not found] ` <1368079608-5611-1-git-send-email-glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2013-05-09  6:06   ` [PATCH v5 07/31] shrinker: convert superblock shrinkers to new API Glauber Costa
2013-05-09 13:33     ` Mel Gorman
2013-05-09  6:06 ` [PATCH v5 08/31] list: add a new LRU list type Glauber Costa
     [not found]   ` <1368079608-5611-9-git-send-email-glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2013-05-09 13:37     ` Mel Gorman
     [not found]       ` <20130509133742.GW11497-l3A5Bk7waGM@public.gmane.org>
2013-05-09 21:02         ` Glauber Costa
2013-05-10  9:21           ` Mel Gorman
2013-05-10  9:56             ` Glauber Costa
     [not found]               ` <518CC44D.1020409-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-05-10 10:01                 ` Mel Gorman
2013-05-09  6:06 ` [PATCH v5 09/31] inode: convert inode lru list to generic lru list code Glauber Costa
2013-05-09  6:06 ` [PATCH v5 10/31] dcache: convert to use new lru list infrastructure Glauber Costa
2013-05-09  6:06 ` [PATCH v5 11/31] list_lru: per-node " Glauber Costa
2013-05-09 13:42   ` Mel Gorman
     [not found]     ` <20130509134246.GX11497-l3A5Bk7waGM@public.gmane.org>
2013-05-09 21:05       ` Glauber Costa
2013-05-09  6:06 ` [PATCH v5 12/31] shrinker: add node awareness Glauber Costa
2013-05-09  6:06 ` [PATCH v5 13/31] fs: convert inode and dentry shrinking to be node aware Glauber Costa
2013-05-09  6:06 ` [PATCH v5 14/31] xfs: convert buftarg LRU to generic code Glauber Costa
     [not found]   ` <1368079608-5611-15-git-send-email-glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2013-05-09 13:43     ` Mel Gorman
2013-05-09  6:06 ` [PATCH v5 15/31] xfs: convert dquot cache lru to list_lru Glauber Costa
2013-05-09  6:06 ` [PATCH v5 16/31] fs: convert fs shrinkers to new scan/count API Glauber Costa
2013-05-09  6:06 ` [PATCH v5 17/31] drivers: convert shrinkers to new count/scan API Glauber Costa
2013-05-09 13:52   ` Mel Gorman
     [not found]     ` <20130509135209.GZ11497-l3A5Bk7waGM@public.gmane.org>
2013-05-09 21:19       ` Glauber Costa
2013-05-10  9:00         ` Mel Gorman
2013-05-09  6:06 ` [PATCH v5 18/31] shrinker: convert remaining shrinkers to " Glauber Costa
2013-05-09  6:06 ` [PATCH v5 19/31] hugepage: convert huge zero page shrinker to new shrinker API Glauber Costa
     [not found]   ` <1368079608-5611-20-git-send-email-glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2013-05-10  1:24     ` Kirill A. Shutemov
2013-05-09  6:06 ` [PATCH v5 20/31] shrinker: Kill old ->shrink API Glauber Costa
2013-05-09 13:53   ` Mel Gorman
2013-05-09  6:06 ` [PATCH v5 21/31] vmscan: also shrink slab in memcg pressure Glauber Costa
2013-05-09  6:06 ` [PATCH v5 22/31] memcg,list_lru: duplicate LRUs upon kmemcg creation Glauber Costa
2013-05-09  6:06 ` [PATCH v5 23/31] lru: add an element to a memcg list Glauber Costa
2013-05-09  6:06 ` [PATCH v5 24/31] list_lru: per-memcg walks Glauber Costa
2013-05-09  6:06 ` [PATCH v5 25/31] memcg: per-memcg kmem shrinking Glauber Costa
2013-05-09  6:06 ` [PATCH v5 26/31] memcg: scan cache objects hierarchically Glauber Costa
2013-05-09  6:06 ` [PATCH v5 27/31] super: targeted memcg reclaim Glauber Costa
2013-05-09  6:06 ` [PATCH v5 28/31] memcg: move initialization to memcg creation Glauber Costa
2013-05-09  6:06 ` [PATCH v5 29/31] vmpressure: in-kernel notifications Glauber Costa
2013-05-09  6:06 ` [PATCH v5 30/31] memcg: reap dead memcgs upon global memory pressure Glauber Costa
2013-05-09  6:06 ` Glauber Costa [this message]
2013-05-09 10:55 ` [PATCH v5 00/31] kmemcg shrinkers Mel Gorman
     [not found]   ` <20130509105519.GQ11497-l3A5Bk7waGM@public.gmane.org>
2013-05-09 11:34     ` Glauber Costa
2013-05-09 13:18   ` Dave Chinner
2013-05-09 14:03     ` Mel Gorman
     [not found]       ` <20130509140311.GB11497-l3A5Bk7waGM@public.gmane.org>
2013-05-09 21:24         ` Glauber Costa

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:09027a9 dfblob:1178e23 dfblob:6e47c09 dfblob:346cd1b
dfblob:fc3a8d5 dfblob:2780c39 )
 OR (
bs:"[PATCH v5 31/31] memcg: debugging facility to access dangling memcgs" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1368079608-5611-32-git-send-email-glommer@openvz.org \
    --to=glommer@openvz.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).