[PATCH 0/2] Show information about dangling memcgs

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] Show information about dangling memcgs
@ 2012-11-22 10:29 Glauber Costa
  2012-11-22 10:29 ` [PATCH 1/2] cgroup: helper do determine group name Glauber Costa
  2012-11-22 10:29 ` [PATCH 2/2] memcg: debugging facility to access dangling memcgs Glauber Costa
  0 siblings, 2 replies; 16+ messages in thread
From: Glauber Costa @ 2012-11-22 10:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, kamezawa.hiroyu,
	Tejun Heo

Hi,

As suggested by Kame, this is a proposed interface to show information
about who and what is keeping memcgs in memory after they are removed.

It is not very complicated, and I was took care to note in the Docs that this
is debug only.

Please let me know if you think.

Glauber Costa (2):
  cgroup: helper do determine group name
  memcg: debugging facility to access dangling memcgs.

 Documentation/cgroups/memory.txt |  13 +++
 include/linux/cgroup.h           |   1 +
 kernel/cgroup.c                  |   9 +++
 mm/memcontrol.c                  | 167 +++++++++++++++++++++++++++++++++++----
 4 files changed, 176 insertions(+), 14 deletions(-)

-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] cgroup: helper do determine group name
  2012-11-22 10:29 [PATCH 0/2] Show information about dangling memcgs Glauber Costa
@ 2012-11-22 10:29 ` Glauber Costa
  2012-11-22 14:32   ` Tejun Heo
  2012-11-23  8:55   ` Michal Hocko
  2012-11-22 10:29 ` [PATCH 2/2] memcg: debugging facility to access dangling memcgs Glauber Costa
  1 sibling, 2 replies; 16+ messages in thread
From: Glauber Costa @ 2012-11-22 10:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, kamezawa.hiroyu,
	Tejun Heo, Glauber Costa

With more than one user, it is useful to have a helper function in the
cgroup core to derive a group's name.

We'll just return a pointer, and it is not expected to get incredibly
complicated. But it is useful to have it so we can abstract away the
vfs relation from its users.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Tejun Heo <tj@kernel.org>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
---
Tejun:

I know the rcu is no longer necessary. I am using mhocko's tree,
that doesn't seem to have your last stream of patches yet. If you
approve the interface, we'll need a follow up on this to remove the
rcu dereference of the dentry.

 include/linux/cgroup.h |  1 +
 kernel/cgroup.c        |  9 +++++++++
 mm/memcontrol.c        | 11 ++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a178a91..57c4ab1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -401,6 +401,7 @@ int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
 int cgroup_is_removed(const struct cgroup *cgrp);
 
 int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
+extern const char *cgroup_name(const struct cgroup *cgrp);
 
 int cgroup_task_count(const struct cgroup *cgrp);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3d68aad..d0d291e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1757,6 +1757,15 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 }
 EXPORT_SYMBOL_GPL(cgroup_path);
 
+const char *cgroup_name(const struct cgroup *cgrp)
+{
+	struct dentry *dentry;
+	rcu_read_lock();
+	dentry = rcu_dereference_check(cgrp->dentry, cgroup_lock_is_held());
+	rcu_read_unlock();
+	return dentry->d_name.name;
+}
+
 /*
  * Control Group taskset
  */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e3d805f..05b87aa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3141,16 +3141,13 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
 static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *s)
 {
 	char *name;
-	struct dentry *dentry;
+	const char *cgname;
 
-	rcu_read_lock();
-	dentry = rcu_dereference(memcg->css.cgroup->dentry);
-	rcu_read_unlock();
-
-	BUG_ON(dentry == NULL);
+	cgname = cgroup_name(memcg->css.cgroup);
+	BUG_ON(cgname == NULL);
 
 	name = kasprintf(GFP_KERNEL, "%s(%d:%s)", s->name,
-			 memcg_cache_id(memcg), dentry->d_name.name);
+			 memcg_cache_id(memcg), cgname);
 
 	return name;
 }
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-22 10:29 [PATCH 0/2] Show information about dangling memcgs Glauber Costa
  2012-11-22 10:29 ` [PATCH 1/2] cgroup: helper do determine group name Glauber Costa
@ 2012-11-22 10:29 ` Glauber Costa
  2012-11-22 10:36   ` Glauber Costa
  2012-11-23  9:20   ` Michal Hocko
  1 sibling, 2 replies; 16+ messages in thread
From: Glauber Costa @ 2012-11-22 10:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, kamezawa.hiroyu,
	Tejun Heo, Glauber Costa

If memcg is tracking anything other than plain user memory (swap, tcp
buf mem, or slab memory), it is possible that a reference will be held
by the group after it is dead.

This patch provides a debugging facility in the root memcg, so we can
inspect which memcgs still have pending objects, and what is the cause
of this state.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/cgroups/memory.txt |  13 ++++
 mm/memcontrol.c                  | 156 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 162 insertions(+), 7 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 8b8c28b..704247eb 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -70,6 +70,7 @@ Brief summary of control files.
  memory.move_charge_at_immigrate # set/show controls of moving charges
  memory.oom_control		 # set/show oom controls.
  memory.numa_stat		 # show the number of memory usage per numa node
+ memory.dangling_memcgs          # show debugging information about dangling groups
 
  memory.kmem.limit_in_bytes      # set/show hard limit for kernel memory
  memory.kmem.usage_in_bytes      # show current kernel memory allocation
@@ -577,6 +578,18 @@ unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
 
 And we have total = file + anon + unevictable.
 
+5.7 dangling_memcgs
+
+This file will only be ever present in the root cgroup. When a memcg is
+destroyed, the memory consumed by it may not be immediately freed. This is
+because when some extensions are used, such as swap or kernel memory, objects
+can outlive the group and hold a reference to it.
+
+If this is the case, the dangling_memcgs file will show information about what
+are the memcgs still alive, and which references are still preventing it to be
+freed. This is a debugging facility only, and no guarantees of interface
+stability will be given.
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 05b87aa..46f7cfb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -311,14 +311,31 @@ struct mem_cgroup {
 	/* thresholds for mem+swap usage. RCU-protected */
 	struct mem_cgroup_thresholds memsw_thresholds;
 
-	/* For oom notifier event fd */
-	struct list_head oom_notify;
+	union {
+		/* For oom notifier event fd */
+		struct list_head oom_notify;
+		/*
+		 * we can only trigger an oom event if the memcg is alive.
+		 * so we will reuse this field to hook the memcg in the list
+		 * of dead memcgs.
+		 */
+		struct list_head dead;
+	};
 
-	/*
-	 * Should we move charges of a task when a task is moved into this
-	 * mem_cgroup ? And what type of charges should we move ?
-	 */
-	unsigned long 	move_charge_at_immigrate;
+	union {
+		/*
+		 * Should we move charges of a task when a task is moved into
+		 * this mem_cgroup ? And what type of charges should we move ?
+		 */
+		unsigned long	move_charge_at_immigrate;
+
+		/*
+		 * We are no longer concerned about moving charges after memcg
+		 * is dead. So we will fill this up with its name, to aid
+		 * debugging.
+		 */
+		char *memcg_name;
+	};
 	/*
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
@@ -349,6 +366,33 @@ struct mem_cgroup {
 #endif
 };
 
+#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
+static LIST_HEAD(dangling_memcgs);
+static DEFINE_MUTEX(dangling_memcgs_mutex);
+
+static inline void memcg_dangling_free(struct mem_cgroup *memcg)
+{
+	mutex_lock(&dangling_memcgs_mutex);
+	list_del(&memcg->dead);
+	mutex_unlock(&dangling_memcgs_mutex);
+	kfree(memcg->memcg_name);
+}
+
+static inline void memcg_dangling_add(struct mem_cgroup *memcg)
+{
+
+	memcg->memcg_name = kstrdup(cgroup_name(memcg->css.cgroup), GFP_KERNEL);
+
+	INIT_LIST_HEAD(&memcg->dead);
+	mutex_lock(&dangling_memcgs_mutex);
+	list_add(&memcg->dead, &dangling_memcgs);
+	mutex_unlock(&dangling_memcgs_mutex);
+}
+#else
+static inline void memcg_dangling_free(struct mem_cgroup *memcg) {}
+static inline void memcg_dangling_add(struct mem_cgroup *memcg) {}
+#endif
+
 /* internal only representation about the status of kmem accounting. */
 enum {
 	KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */
@@ -4868,6 +4912,92 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
 	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
 }
 
+#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
+static void
+mem_cgroup_dangling_swap(struct mem_cgroup *memcg, struct seq_file *m)
+{
+#ifdef CONFIG_MEMCG_SWAP
+	u64 kmem;
+	u64 memsw;
+
+	/*
+	 * kmem will also propagate here, so we are only interested in the
+	 * difference.  See comment in mem_cgroup_reparent_charges for details.
+	 *
+	 * We could save this value for later consumption by kmem reports, but
+	 * there is not a lot of problem if the figures differ slightly.
+	 */
+	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
+	memsw = res_counter_read_u64(&memcg->memsw, RES_USAGE) - kmem;
+	seq_printf(m, "\t%llu swap bytes\n", memsw);
+#endif
+}
+
+static void
+mem_cgroup_dangling_kmem(struct mem_cgroup *memcg, struct seq_file *m)
+{
+#ifdef CONFIG_MEMCG_KMEM
+	u64 kmem;
+	struct memcg_cache_params *params;
+
+#ifdef CONFIG_INET
+	struct tcp_memcontrol *tcp = &memcg->tcp_mem;
+	s64 tcp_socks;
+	u64 tcp_bytes;
+
+	tcp_socks = percpu_counter_sum_positive(&tcp->tcp_sockets_allocated);
+	tcp_bytes = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
+	seq_printf(m, "\t%llu tcp bytes, in %lld sockets\n",
+		   tcp_bytes, tcp_socks);
+
+#endif
+
+	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
+	seq_printf(m, "\t%llu kmem bytes", kmem);
+
+	/* list below may not be initialized, so not even try */
+	if (!kmem)
+		return;
+
+	seq_printf(m, " in caches");
+	mutex_lock(&memcg->slab_caches_mutex);
+	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
+			struct kmem_cache *s = memcg_params_to_cache(params);
+
+		seq_printf(m, " %s", s->name);
+	}
+	mutex_unlock(&memcg->slab_caches_mutex);
+	seq_printf(m, "\n");
+#endif
+}
+
+/*
+ * After a memcg is destroyed, it may still be kept around in memory.
+ * Currently, the two main reasons for it are swap entries, and kernel memory.
+ * Because they will be freed assynchronously, they will pin the memcg structure
+ * and its resources until the last reference goes away.
+ *
+ * This root-only file will show information about which users
+ */
+static int mem_cgroup_dangling_read(struct cgroup *cont, struct cftype *cft,
+					struct seq_file *m)
+{
+	struct mem_cgroup *memcg;
+
+	mutex_lock(&dangling_memcgs_mutex);
+
+	list_for_each_entry(memcg, &dangling_memcgs, dead) {
+		seq_printf(m, "%s:\n", memcg->memcg_name);
+
+		mem_cgroup_dangling_swap(memcg, m);
+		mem_cgroup_dangling_kmem(memcg, m);
+	}
+
+	mutex_unlock(&dangling_memcgs_mutex);
+	return 0;
+}
+#endif
+
 static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 {
 	int ret = -EINVAL;
@@ -5831,6 +5961,14 @@ static struct cftype mem_cgroup_files[] = {
 	},
 #endif
 #endif
+
+#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
+	{
+		.name = "dangling_memcgs",
+		.read_seq_string = mem_cgroup_dangling_read,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+	},
+#endif
 	{ },	/* terminate */
 };
 
@@ -5933,6 +6071,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 	 * the cgroup_lock.
 	 */
 	disarm_static_keys(memcg);
+
 	if (size < PAGE_SIZE)
 		kfree(memcg);
 	else
@@ -5950,6 +6089,8 @@ static void free_work(struct work_struct *work)
 	struct mem_cgroup *memcg;
 
 	memcg = container_of(work, struct mem_cgroup, work_freeing);
+
+	memcg_dangling_free(memcg);
 	__mem_cgroup_free(memcg);
 }
 
@@ -6139,6 +6280,7 @@ static void mem_cgroup_destroy(struct cgroup *cont)
 
 	kmem_cgroup_destroy(memcg);
 
+	memcg_dangling_add(memcg);
 	mem_cgroup_put(memcg);
 }
 
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-22 10:29 ` [PATCH 2/2] memcg: debugging facility to access dangling memcgs Glauber Costa
@ 2012-11-22 10:36   ` Glauber Costa
  2012-11-22 13:53     ` Glauber Costa
  2012-11-23  9:20   ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-11-22 10:36 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	kamezawa.hiroyu, Tejun Heo, apw, Joe Perches

On 11/22/2012 02:29 PM, Glauber Costa wrote:
> If memcg is tracking anything other than plain user memory (swap, tcp
> buf mem, or slab memory), it is possible that a reference will be held
> by the group after it is dead.
> 
> This patch provides a debugging facility in the root memcg, so we can
> inspect which memcgs still have pending objects, and what is the cause
> of this state.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  Documentation/cgroups/memory.txt |  13 ++++
>  mm/memcontrol.c                  | 156 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 162 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index 8b8c28b..704247eb 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -70,6 +70,7 @@ Brief summary of control files.
>   memory.move_charge_at_immigrate # set/show controls of moving charges
>   memory.oom_control		 # set/show oom controls.
>   memory.numa_stat		 # show the number of memory usage per numa node
> + memory.dangling_memcgs          # show debugging information about dangling groups
>  
>   memory.kmem.limit_in_bytes      # set/show hard limit for kernel memory
>   memory.kmem.usage_in_bytes      # show current kernel memory allocation
> @@ -577,6 +578,18 @@ unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
>  
>  And we have total = file + anon + unevictable.
>  
> +5.7 dangling_memcgs
> +
> +This file will only be ever present in the root cgroup. When a memcg is
> +destroyed, the memory consumed by it may not be immediately freed. This is
> +because when some extensions are used, such as swap or kernel memory, objects
> +can outlive the group and hold a reference to it.
> +
> +If this is the case, the dangling_memcgs file will show information about what
> +are the memcgs still alive, and which references are still preventing it to be
> +freed. This is a debugging facility only, and no guarantees of interface
> +stability will be given.
> +
>  6. Hierarchy support
>  
>  The memory controller supports a deep hierarchy and hierarchical accounting.
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 05b87aa..46f7cfb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -311,14 +311,31 @@ struct mem_cgroup {
>  	/* thresholds for mem+swap usage. RCU-protected */
>  	struct mem_cgroup_thresholds memsw_thresholds;
>  
> -	/* For oom notifier event fd */
> -	struct list_head oom_notify;
> +	union {
> +		/* For oom notifier event fd */
> +		struct list_head oom_notify;
> +		/*
> +		 * we can only trigger an oom event if the memcg is alive.
> +		 * so we will reuse this field to hook the memcg in the list
> +		 * of dead memcgs.
> +		 */
> +		struct list_head dead;
> +	};
>  
> -	/*
> -	 * Should we move charges of a task when a task is moved into this
> -	 * mem_cgroup ? And what type of charges should we move ?
> -	 */
> -	unsigned long 	move_charge_at_immigrate;
> +	union {
> +		/*
> +		 * Should we move charges of a task when a task is moved into
> +		 * this mem_cgroup ? And what type of charges should we move ?
> +		 */
> +		unsigned long	move_charge_at_immigrate;
> +
> +		/*
> +		 * We are no longer concerned about moving charges after memcg
> +		 * is dead. So we will fill this up with its name, to aid
> +		 * debugging.
> +		 */
> +		char *memcg_name;
> +	};
>  	/*
>  	 * set > 0 if pages under this cgroup are moving to other cgroup.
>  	 */
> @@ -349,6 +366,33 @@ struct mem_cgroup {
>  #endif
>  };
>  
> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
> +static LIST_HEAD(dangling_memcgs);
> +static DEFINE_MUTEX(dangling_memcgs_mutex);
> +
> +static inline void memcg_dangling_free(struct mem_cgroup *memcg)
> +{
> +	mutex_lock(&dangling_memcgs_mutex);
> +	list_del(&memcg->dead);
> +	mutex_unlock(&dangling_memcgs_mutex);
> +	kfree(memcg->memcg_name);
> +}
> +
> +static inline void memcg_dangling_add(struct mem_cgroup *memcg)
> +{
> +
> +	memcg->memcg_name = kstrdup(cgroup_name(memcg->css.cgroup), GFP_KERNEL);
> +
> +	INIT_LIST_HEAD(&memcg->dead);
> +	mutex_lock(&dangling_memcgs_mutex);
> +	list_add(&memcg->dead, &dangling_memcgs);
> +	mutex_unlock(&dangling_memcgs_mutex);
> +}
> +#else
> +static inline void memcg_dangling_free(struct mem_cgroup *memcg) {}
> +static inline void memcg_dangling_add(struct mem_cgroup *memcg) {}
> +#endif
> +
>  /* internal only representation about the status of kmem accounting. */
>  enum {
>  	KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */
> @@ -4868,6 +4912,92 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
>  	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
>  }
>  
> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
> +static void
> +mem_cgroup_dangling_swap(struct mem_cgroup *memcg, struct seq_file *m)
> +{
> +#ifdef CONFIG_MEMCG_SWAP
> +	u64 kmem;
> +	u64 memsw;
> +
> +	/*
> +	 * kmem will also propagate here, so we are only interested in the
> +	 * difference.  See comment in mem_cgroup_reparent_charges for details.
> +	 *
> +	 * We could save this value for later consumption by kmem reports, but
> +	 * there is not a lot of problem if the figures differ slightly.
> +	 */
> +	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
> +	memsw = res_counter_read_u64(&memcg->memsw, RES_USAGE) - kmem;
> +	seq_printf(m, "\t%llu swap bytes\n", memsw);
> +#endif
> +}
> +
> +static void
> +mem_cgroup_dangling_kmem(struct mem_cgroup *memcg, struct seq_file *m)
> +{
> +#ifdef CONFIG_MEMCG_KMEM
> +	u64 kmem;
> +	struct memcg_cache_params *params;
> +
> +#ifdef CONFIG_INET
> +	struct tcp_memcontrol *tcp = &memcg->tcp_mem;
> +	s64 tcp_socks;
> +	u64 tcp_bytes;
> +
> +	tcp_socks = percpu_counter_sum_positive(&tcp->tcp_sockets_allocated);
> +	tcp_bytes = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
> +	seq_printf(m, "\t%llu tcp bytes, in %lld sockets\n",
> +		   tcp_bytes, tcp_socks);
> +
> +#endif
> +
> +	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
> +	seq_printf(m, "\t%llu kmem bytes", kmem);
> +
> +	/* list below may not be initialized, so not even try */
> +	if (!kmem)
> +		return;
> +
> +	seq_printf(m, " in caches");
> +	mutex_lock(&memcg->slab_caches_mutex);
> +	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
> +			struct kmem_cache *s = memcg_params_to_cache(params);
> +
> +		seq_printf(m, " %s", s->name);
> +	}
> +	mutex_unlock(&memcg->slab_caches_mutex);
> +	seq_printf(m, "\n");
> +#endif
> +}
> +
> +/*
> + * After a memcg is destroyed, it may still be kept around in memory.
> + * Currently, the two main reasons for it are swap entries, and kernel memory.
> + * Because they will be freed assynchronously, they will pin the memcg structure
> + * and its resources until the last reference goes away.
> + *
> + * This root-only file will show information about which users
> + */
> +static int mem_cgroup_dangling_read(struct cgroup *cont, struct cftype *cft,
> +					struct seq_file *m)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	mutex_lock(&dangling_memcgs_mutex);
> +
> +	list_for_each_entry(memcg, &dangling_memcgs, dead) {
> +		seq_printf(m, "%s:\n", memcg->memcg_name);
> +
> +		mem_cgroup_dangling_swap(memcg, m);
> +		mem_cgroup_dangling_kmem(memcg, m);
> +	}
> +
> +	mutex_unlock(&dangling_memcgs_mutex);
> +	return 0;
> +}
> +#endif
> +
>  static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  {
>  	int ret = -EINVAL;
> @@ -5831,6 +5961,14 @@ static struct cftype mem_cgroup_files[] = {
>  	},
>  #endif
>  #endif
> +
> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
> +	{
> +		.name = "dangling_memcgs",
> +		.read_seq_string = mem_cgroup_dangling_read,
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +	},
> +#endif
>  	{ },	/* terminate */
>  };
>  
> @@ -5933,6 +6071,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
>  	 * the cgroup_lock.
>  	 */
>  	disarm_static_keys(memcg);
> +
>  	if (size < PAGE_SIZE)
>  		kfree(memcg);
>  	else

damn me!

It's not the first time I've seen those extra newlines slipping through
the final private review and ending up in the final patch...

I wonder if there would be any value in having checkpatch checking for
those?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-22 10:36   ` Glauber Costa
@ 2012-11-22 13:53     ` Glauber Costa
  2012-11-22 14:02       ` Joe Perches
  0 siblings, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-11-22 13:53 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	kamezawa.hiroyu, Tejun Heo, apw, Joe Perches

On 11/22/2012 02:36 PM, Glauber Costa wrote:
>> @@ -5933,6 +6071,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
>> >  	 * the cgroup_lock.
>> >  	 */
>> >  	disarm_static_keys(memcg);
>> > +
>> >  	if (size < PAGE_SIZE)
>> >  		kfree(memcg);
>> >  	else

Joe,

since you removed the code from my former e-mail:

That one after "disarm_static_keys".

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-22 13:53     ` Glauber Costa
@ 2012-11-22 14:02       ` Joe Perches
  2012-11-22 15:02         ` Andy Whitcroft
  0 siblings, 1 reply; 16+ messages in thread
From: Joe Perches @ 2012-11-22 14:02 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	kamezawa.hiroyu, Tejun Heo, apw

On Thu, 2012-11-22 at 17:53 +0400, Glauber Costa wrote:
> On 11/22/2012 02:36 PM, Glauber Costa wrote:
> >> @@ -5933,6 +6071,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
> >> >  	 * the cgroup_lock.
> >> >  	 */
> >> >  	disarm_static_keys(memcg);
> >> > +
> >> >  	if (size < PAGE_SIZE)
> >> >  		kfree(memcg);
> >> >  	else
> 
> Joe,
> 
> since you removed the code from my former e-mail:

Because you quoted a whole bunch of lines that
included newlines 

> That one after "disarm_static_keys".

Vertical spacing has many uses within a functional
block.  I don't see anything wrong with that one.

The one I suspected you meant was:

> +static inline void memcg_dangling_add(struct mem_cgroup *memcg)
> +{
> +
> +     memcg->memcg_name = kstrdup(cgroup_name(memcg->css.cgroup), GFP_KERNEL);

I think a blank line after an initial { is suspect.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] cgroup: helper do determine group name
  2012-11-22 10:29 ` [PATCH 1/2] cgroup: helper do determine group name Glauber Costa
@ 2012-11-22 14:32   ` Tejun Heo
  2012-11-23  8:55   ` Michal Hocko
  1 sibling, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2012-11-22 14:32 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	kamezawa.hiroyu

On Thu, Nov 22, 2012 at 02:29:49PM +0400, Glauber Costa wrote:
> With more than one user, it is useful to have a helper function in the
> cgroup core to derive a group's name.
> 
> We'll just return a pointer, and it is not expected to get incredibly
> complicated. But it is useful to have it so we can abstract away the
> vfs relation from its users.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> ---
> Tejun:
> 
> I know the rcu is no longer necessary. I am using mhocko's tree,
> that doesn't seem to have your last stream of patches yet. If you
> approve the interface, we'll need a follow up on this to remove the
> rcu dereference of the dentry.

Yeap, looks good to me, but please add function comment on it.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-22 14:02       ` Joe Perches
@ 2012-11-22 15:02         ` Andy Whitcroft
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Whitcroft @ 2012-11-22 15:02 UTC (permalink / raw)
  To: Joe Perches
  Cc: Glauber Costa, linux-mm, Andrew Morton, Johannes Weiner,
	Michal Hocko, kamezawa.hiroyu, Tejun Heo

On Thu, Nov 22, 2012 at 06:02:36AM -0800, Joe Perches wrote:
> On Thu, 2012-11-22 at 17:53 +0400, Glauber Costa wrote:
> > On 11/22/2012 02:36 PM, Glauber Costa wrote:
> > >> @@ -5933,6 +6071,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
> > >> >  	 * the cgroup_lock.
> > >> >  	 */
> > >> >  	disarm_static_keys(memcg);
> > >> > +
> > >> >  	if (size < PAGE_SIZE)
> > >> >  		kfree(memcg);
> > >> >  	else
> > 
> > Joe,
> > 
> > since you removed the code from my former e-mail:
> 
> Because you quoted a whole bunch of lines that
> included newlines 
> 
> > That one after "disarm_static_keys".
> 
> Vertical spacing has many uses within a functional
> block.  I don't see anything wrong with that one.
> 
> The one I suspected you meant was:
> 
> > +static inline void memcg_dangling_add(struct mem_cgroup *memcg)
> > +{
> > +
> > +     memcg->memcg_name = kstrdup(cgroup_name(memcg->css.cgroup), GFP_KERNEL);
> 
> I think a blank line after an initial { is suspect.


Yeah it is a bit odd to go changing the spacing in a function when you
arn't making any other change, but if the code is clearer with it so be
it.  The blank line at the start of a block is a little less obviously
ever useful.

-apw

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] cgroup: helper do determine group name
  2012-11-22 10:29 ` [PATCH 1/2] cgroup: helper do determine group name Glauber Costa
  2012-11-22 14:32   ` Tejun Heo
@ 2012-11-23  8:55   ` Michal Hocko
  2012-11-23 10:36     ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2012-11-23  8:55 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, kamezawa.hiroyu,
	Tejun Heo

On Thu 22-11-12 14:29:49, Glauber Costa wrote:
> With more than one user, it is useful to have a helper function in the
> cgroup core to derive a group's name.
> 
> We'll just return a pointer, and it is not expected to get incredibly
> complicated. But it is useful to have it so we can abstract away the
> vfs relation from its users.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>

Looks good to me in general. Minor comments bellow.
Anyway.
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
> Tejun:
> 
> I know the rcu is no longer necessary. I am using mhocko's tree,
> that doesn't seem to have your last stream of patches yet.

Which patches are we talking about? Are they in a pullable (for -mm)
branch or I have to cherry-pick them?

> If you approve the interface, we'll need a follow up on this to remove
> the rcu dereference of the dentry.
> 
>  include/linux/cgroup.h |  1 +
>  kernel/cgroup.c        |  9 +++++++++
>  mm/memcontrol.c        | 11 ++++-------
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index a178a91..57c4ab1 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -401,6 +401,7 @@ int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
>  int cgroup_is_removed(const struct cgroup *cgrp);
>  
>  int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
> +extern const char *cgroup_name(const struct cgroup *cgrp);
>  
>  int cgroup_task_count(const struct cgroup *cgrp);
>  
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 3d68aad..d0d291e 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1757,6 +1757,15 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
>  }
>  EXPORT_SYMBOL_GPL(cgroup_path);
>  

This expects css reference at caller, right. Please make it explicit
here in case somebody wants to use this somewhere else.
Besides that rcu_read_{un}lock are not necessary if you keep the
reference, right? The last dput happens only after the last css_put.

> +const char *cgroup_name(const struct cgroup *cgrp)
> +{
> +	struct dentry *dentry;
> +	rcu_read_lock();
> +	dentry = rcu_dereference_check(cgrp->dentry, cgroup_lock_is_held());
> +	rcu_read_unlock();
> +	return dentry->d_name.name;
> +}
> +
>  /*
>   * Control Group taskset
>   */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e3d805f..05b87aa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3141,16 +3141,13 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
>  static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *s)
>  {
>  	char *name;
> -	struct dentry *dentry;
> +	const char *cgname;
>  
> -	rcu_read_lock();
> -	dentry = rcu_dereference(memcg->css.cgroup->dentry);
> -	rcu_read_unlock();
> -
> -	BUG_ON(dentry == NULL);
> +	cgname = cgroup_name(memcg->css.cgroup);
> +	BUG_ON(cgname == NULL);
>  
>  	name = kasprintf(GFP_KERNEL, "%s(%d:%s)", s->name,
> -			 memcg_cache_id(memcg), dentry->d_name.name);
> +			 memcg_cache_id(memcg), cgname);
>  
>  	return name;
>  }
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-22 10:29 ` [PATCH 2/2] memcg: debugging facility to access dangling memcgs Glauber Costa
  2012-11-22 10:36   ` Glauber Costa
@ 2012-11-23  9:20   ` Michal Hocko
  2012-11-23  9:33     ` Glauber Costa
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2012-11-23  9:20 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, kamezawa.hiroyu,
	Tejun Heo

On Thu 22-11-12 14:29:50, Glauber Costa wrote:
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 05b87aa..46f7cfb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -349,6 +366,33 @@ struct mem_cgroup {
>  #endif
>  };
>  
> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)

Can we have a common config for this something like CONFIG_MEMCG_ASYNC_DESTROY
which would be selected if either of the two (or potentially others)
would be selected.
Also you are saying that the feature is only for debugging purposes so
it shouldn't be on by default probably.

> +static LIST_HEAD(dangling_memcgs);
> +static DEFINE_MUTEX(dangling_memcgs_mutex);
> +
> +static inline void memcg_dangling_free(struct mem_cgroup *memcg)
> +{
> +	mutex_lock(&dangling_memcgs_mutex);
> +	list_del(&memcg->dead);
> +	mutex_unlock(&dangling_memcgs_mutex);
> +	kfree(memcg->memcg_name);
> +}
> +
> +static inline void memcg_dangling_add(struct mem_cgroup *memcg)
> +{
> +
> +	memcg->memcg_name = kstrdup(cgroup_name(memcg->css.cgroup), GFP_KERNEL);

Who gets charged for this allocation? What if the allocation fails (not
that it would be probable but still...)?

> +
> +	INIT_LIST_HEAD(&memcg->dead);
> +	mutex_lock(&dangling_memcgs_mutex);
> +	list_add(&memcg->dead, &dangling_memcgs);
> +	mutex_unlock(&dangling_memcgs_mutex);
> +}
> +#else
> +static inline void memcg_dangling_free(struct mem_cgroup *memcg) {}
> +static inline void memcg_dangling_add(struct mem_cgroup *memcg) {}
> +#endif
> +
>  /* internal only representation about the status of kmem accounting. */
>  enum {
>  	KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */
> @@ -4868,6 +4912,92 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
>  	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
>  }
>  
> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
> +static void
> +mem_cgroup_dangling_swap(struct mem_cgroup *memcg, struct seq_file *m)
> +{
> +#ifdef CONFIG_MEMCG_SWAP
> +	u64 kmem;
> +	u64 memsw;
> +
> +	/*
> +	 * kmem will also propagate here, so we are only interested in the
> +	 * difference.  See comment in mem_cgroup_reparent_charges for details.
> +	 *
> +	 * We could save this value for later consumption by kmem reports, but
> +	 * there is not a lot of problem if the figures differ slightly.
> +	 */
> +	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
> +	memsw = res_counter_read_u64(&memcg->memsw, RES_USAGE) - kmem;
> +	seq_printf(m, "\t%llu swap bytes\n", memsw);
> +#endif
> +}
> +
> +static void
> +mem_cgroup_dangling_kmem(struct mem_cgroup *memcg, struct seq_file *m)
> +{
> +#ifdef CONFIG_MEMCG_KMEM
> +	u64 kmem;
> +	struct memcg_cache_params *params;
> +
> +#ifdef CONFIG_INET
> +	struct tcp_memcontrol *tcp = &memcg->tcp_mem;
> +	s64 tcp_socks;
> +	u64 tcp_bytes;
> +
> +	tcp_socks = percpu_counter_sum_positive(&tcp->tcp_sockets_allocated);
> +	tcp_bytes = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
> +	seq_printf(m, "\t%llu tcp bytes, in %lld sockets\n",
> +		   tcp_bytes, tcp_socks);
> +
> +#endif

Looks like this deserves its own function rather than this ifdef games
inside functions.

> +
> +	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
> +	seq_printf(m, "\t%llu kmem bytes", kmem);
> +
> +	/* list below may not be initialized, so not even try */
> +	if (!kmem)
> +		return;
> +
> +	seq_printf(m, " in caches");
> +	mutex_lock(&memcg->slab_caches_mutex);
> +	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
> +			struct kmem_cache *s = memcg_params_to_cache(params);
> +
> +		seq_printf(m, " %s", s->name);
> +	}
> +	mutex_unlock(&memcg->slab_caches_mutex);
> +	seq_printf(m, "\n");
> +#endif
> +}
> +
> +/*
> + * After a memcg is destroyed, it may still be kept around in memory.
> + * Currently, the two main reasons for it are swap entries, and kernel memory.
> + * Because they will be freed assynchronously, they will pin the memcg structure
> + * and its resources until the last reference goes away.
> + *
> + * This root-only file will show information about which users
> + */
> +static int mem_cgroup_dangling_read(struct cgroup *cont, struct cftype *cft,
> +					struct seq_file *m)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	mutex_lock(&dangling_memcgs_mutex);
> +
> +	list_for_each_entry(memcg, &dangling_memcgs, dead) {
> +		seq_printf(m, "%s:\n", memcg->memcg_name);

Hmm, we have lost the cgroup path so know there is something called A
but we do not know whether it was A/A A/B/A A/......../A (aka we have
lost the hierarchy information and a group with the same name might
exist which can be really confusing).

That being said I would prefer if this was covered by a debugging
option, off by default.
It would be better if we could preserve the whole group name (something
like cgroup_path does) but I guess this would break caches names, right?
And finally it would be really nice if you described what is the
exported information good for. Can I somehow change the current state
(e.g. force freeing those objects so that the memcg can finally pass out
in piece)?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-23  9:20   ` Michal Hocko
@ 2012-11-23  9:33     ` Glauber Costa
  2012-11-23 10:33       ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-11-23  9:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Johannes Weiner, kamezawa.hiroyu,
	Tejun Heo

On 11/23/2012 01:20 PM, Michal Hocko wrote:
> On Thu 22-11-12 14:29:50, Glauber Costa wrote:
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 05b87aa..46f7cfb 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
> [...]
>> @@ -349,6 +366,33 @@ struct mem_cgroup {
>>  #endif
>>  };
>>  
>> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
> 
> Can we have a common config for this something like CONFIG_MEMCG_ASYNC_DESTROY
> which would be selected if either of the two (or potentially others)
> would be selected.
> Also you are saying that the feature is only for debugging purposes so
> it shouldn't be on by default probably.
> 

I personally wouldn't mind. But the big value I see from it is basically
being able to turn it off. For all the rest, we would have to wrap it
under one of those config options anyway...

>> +static LIST_HEAD(dangling_memcgs);
>> +static DEFINE_MUTEX(dangling_memcgs_mutex);
>> +
>> +static inline void memcg_dangling_free(struct mem_cgroup *memcg)
>> +{
>> +	mutex_lock(&dangling_memcgs_mutex);
>> +	list_del(&memcg->dead);
>> +	mutex_unlock(&dangling_memcgs_mutex);
>> +	kfree(memcg->memcg_name);
>> +}
>> +
>> +static inline void memcg_dangling_add(struct mem_cgroup *memcg)
>> +{
>> +
>> +	memcg->memcg_name = kstrdup(cgroup_name(memcg->css.cgroup), GFP_KERNEL);
> 
> Who gets charged for this allocation? What if the allocation fails (not
> that it would be probable but still...)?
> 

Well, yeah, the lack of test is my bad - sorry.

As for charging, This will be automatically charged to whoever calls
mem_cgroup_destroy(). It is certainly not in the cgroup being destroyed,
otherwise it would have a task and destruction would not be possible.

But now that you mention, maybe it would be better to get it to the root
cgroup every time? This way this can't itself hold anybody in memory.



>> +
>> +	INIT_LIST_HEAD(&memcg->dead);
>> +	mutex_lock(&dangling_memcgs_mutex);
>> +	list_add(&memcg->dead, &dangling_memcgs);
>> +	mutex_unlock(&dangling_memcgs_mutex);
>> +}
>> +#else
>> +static inline void memcg_dangling_free(struct mem_cgroup *memcg) {}
>> +static inline void memcg_dangling_add(struct mem_cgroup *memcg) {}
>> +#endif
>> +
>>  /* internal only representation about the status of kmem accounting. */
>>  enum {
>>  	KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */
>> @@ -4868,6 +4912,92 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
>>  	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
>>  }
>>  
>> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
>> +static void
>> +mem_cgroup_dangling_swap(struct mem_cgroup *memcg, struct seq_file *m)
>> +{
>> +#ifdef CONFIG_MEMCG_SWAP
>> +	u64 kmem;
>> +	u64 memsw;
>> +
>> +	/*
>> +	 * kmem will also propagate here, so we are only interested in the
>> +	 * difference.  See comment in mem_cgroup_reparent_charges for details.
>> +	 *
>> +	 * We could save this value for later consumption by kmem reports, but
>> +	 * there is not a lot of problem if the figures differ slightly.
>> +	 */
>> +	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
>> +	memsw = res_counter_read_u64(&memcg->memsw, RES_USAGE) - kmem;
>> +	seq_printf(m, "\t%llu swap bytes\n", memsw);
>> +#endif
>> +}
>> +
>> +static void
>> +mem_cgroup_dangling_kmem(struct mem_cgroup *memcg, struct seq_file *m)
>> +{
>> +#ifdef CONFIG_MEMCG_KMEM
>> +	u64 kmem;
>> +	struct memcg_cache_params *params;
>> +
>> +#ifdef CONFIG_INET
>> +	struct tcp_memcontrol *tcp = &memcg->tcp_mem;
>> +	s64 tcp_socks;
>> +	u64 tcp_bytes;
>> +
>> +	tcp_socks = percpu_counter_sum_positive(&tcp->tcp_sockets_allocated);
>> +	tcp_bytes = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
>> +	seq_printf(m, "\t%llu tcp bytes, in %lld sockets\n",
>> +		   tcp_bytes, tcp_socks);
>> +
>> +#endif
> 
> Looks like this deserves its own function rather than this ifdef games
> inside functions.
> 
ok...

>> +
>> +	kmem = res_counter_read_u64(&memcg->kmem, RES_USAGE);
>> +	seq_printf(m, "\t%llu kmem bytes", kmem);
>> +
>> +	/* list below may not be initialized, so not even try */
>> +	if (!kmem)
>> +		return;
>> +
>> +	seq_printf(m, " in caches");
>> +	mutex_lock(&memcg->slab_caches_mutex);
>> +	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
>> +			struct kmem_cache *s = memcg_params_to_cache(params);
>> +
>> +		seq_printf(m, " %s", s->name);
>> +	}
>> +	mutex_unlock(&memcg->slab_caches_mutex);
>> +	seq_printf(m, "\n");
>> +#endif
>> +}
>> +
>> +/*
>> + * After a memcg is destroyed, it may still be kept around in memory.
>> + * Currently, the two main reasons for it are swap entries, and kernel memory.
>> + * Because they will be freed assynchronously, they will pin the memcg structure
>> + * and its resources until the last reference goes away.
>> + *
>> + * This root-only file will show information about which users
>> + */
>> +static int mem_cgroup_dangling_read(struct cgroup *cont, struct cftype *cft,
>> +					struct seq_file *m)
>> +{
>> +	struct mem_cgroup *memcg;
>> +
>> +	mutex_lock(&dangling_memcgs_mutex);
>> +
>> +	list_for_each_entry(memcg, &dangling_memcgs, dead) {
>> +		seq_printf(m, "%s:\n", memcg->memcg_name);
> 
> Hmm, we have lost the cgroup path so know there is something called A
> but we do not know whether it was A/A A/B/A A/......../A (aka we have
> lost the hierarchy information and a group with the same name might
> exist which can be really confusing).
> 
Since all I store in the now-dead cgroup structure is a pointer, I guess
I could store the whole path...

> That being said I would prefer if this was covered by a debugging
> option, off by default.

Ok.

> It would be better if we could preserve the whole group name (something
> like cgroup_path does) but I guess this would break caches names, right?

I can't see how it would influence the cache names either way. I mean -
the effect of that would be that patches 1 and 2 here would be totally
independent, since we would be using cgroup_path instead of cgroup_name
in this facility.


> And finally it would be really nice if you described what is the
> exported information good for. Can I somehow change the current state
> (e.g. force freeing those objects so that the memcg can finally pass out
> in piece)?
> 
I am open, but I would personally like to have this as a view-only
interface, just so we suspect a leak occurs, we can easily see what is
the dead memcg contribution to it. It shows you where the data come
from, and if you want to free it, you go search for subsystem-specific
ways to force a free should you want.

I really can't see anything good coming from being able to force changes
to the kernel from this interface.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-23  9:33     ` Glauber Costa
@ 2012-11-23 10:33       ` Michal Hocko
  2012-11-23 10:37         ` Glauber Costa
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2012-11-23 10:33 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, kamezawa.hiroyu,
	Tejun Heo

On Fri 23-11-12 13:33:36, Glauber Costa wrote:
> On 11/23/2012 01:20 PM, Michal Hocko wrote:
> > On Thu 22-11-12 14:29:50, Glauber Costa wrote:
> > [...]
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 05b87aa..46f7cfb 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> > [...]
> >> @@ -349,6 +366,33 @@ struct mem_cgroup {
> >>  #endif
> >>  };
> >>  
> >> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
> > 
> > Can we have a common config for this something like CONFIG_MEMCG_ASYNC_DESTROY
> > which would be selected if either of the two (or potentially others)
> > would be selected.
> > Also you are saying that the feature is only for debugging purposes so
> > it shouldn't be on by default probably.
> > 
> 
> I personally wouldn't mind. But the big value I see from it is basically
> being able to turn it off. For all the rest, we would have to wrap it
> under one of those config options anyway...

Sure you would need to habe mem_cgroup_dangling_FOO wrapped by the
correct one anyway but that still need to live inside a bigger ifdef and
naming all the FOO is awkward. Besides that one
CONFIG_MEMCG_ASYNC_DESTROY_DEBUG could have a Kconfig entry and so be
enabled separately.

> >> +static LIST_HEAD(dangling_memcgs);
> >> +static DEFINE_MUTEX(dangling_memcgs_mutex);
> >> +
> >> +static inline void memcg_dangling_free(struct mem_cgroup *memcg)
> >> +{
> >> +	mutex_lock(&dangling_memcgs_mutex);
> >> +	list_del(&memcg->dead);
> >> +	mutex_unlock(&dangling_memcgs_mutex);
> >> +	kfree(memcg->memcg_name);
> >> +}
> >> +
> >> +static inline void memcg_dangling_add(struct mem_cgroup *memcg)
> >> +{
> >> +
> >> +	memcg->memcg_name = kstrdup(cgroup_name(memcg->css.cgroup), GFP_KERNEL);
> > 
> > Who gets charged for this allocation? What if the allocation fails (not
> > that it would be probable but still...)?
> > 
> 
> Well, yeah, the lack of test is my bad - sorry.
> 
> As for charging, This will be automatically charged to whoever calls
> mem_cgroup_destroy().

Which can be anybody as it depends e.g. on css reference counting.

> It is certainly not in the cgroup being destroyed, otherwise it would
> have a task and destruction would not be possible.
> 
> But now that you mention, maybe it would be better to get it to the root
> cgroup every time? This way this can't itself hold anybody in memory.

yes, root cgroup sounds good.

[...]
> > It would be better if we could preserve the whole group name (something
> > like cgroup_path does) but I guess this would break caches names, right?
> 
> I can't see how it would influence the cache names either way. I mean -
> the effect of that would be that patches 1 and 2 here would be totally
> independent, since we would be using cgroup_path instead of cgroup_name
> in this facility.

Ohh, you are right you are using kmem_cache name for those. Sorry for
the confusion
 
> > And finally it would be really nice if you described what is the
> > exported information good for. Can I somehow change the current state
> > (e.g. force freeing those objects so that the memcg can finally pass out
> > in piece)?
> > 
> I am open, but I would personally like to have this as a view-only
> interface,

And I was not proposing to make it RW. I am just missing a description
that would explain: "Ohh well, the file says there are some dangling
memcgs. Should I report a bug or sue somebody or just have a coffee and
wait some more?"

> just so we suspect a leak occurs, we can easily see what is
> the dead memcg contribution to it. It shows you where the data come
> from, and if you want to free it, you go search for subsystem-specific
> ways to force a free should you want.

Yes, I can imagine its usefulness for developers but I do not see much
of an use for admins yet. So I am a bit hesitant for this being on by
default.

> I really can't see anything good coming from being able to force changes
> to the kernel from this interface.

Agreed. Definitely not from this interface.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] cgroup: helper do determine group name
  2012-11-23  8:55   ` Michal Hocko
@ 2012-11-23 10:36     ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2012-11-23 10:36 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, kamezawa.hiroyu,
	Tejun Heo

On Fri 23-11-12 09:55:32, Michal Hocko wrote:
[...]
> Besides that rcu_read_{un}lock are not necessary if you keep the
> reference, right? The last dput happens only after the last css_put.

Stupid me. And of course rcu_dereference_check would tell me the truth
 
> > +const char *cgroup_name(const struct cgroup *cgrp)
> > +{
> > +	struct dentry *dentry;
> > +	rcu_read_lock();
> > +	dentry = rcu_dereference_check(cgrp->dentry, cgroup_lock_is_held());
> > +	rcu_read_unlock();
> > +	return dentry->d_name.name;
> > +}
> > +
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-23 10:33       ` Michal Hocko
@ 2012-11-23 10:37         ` Glauber Costa
  2012-11-23 10:51           ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-11-23 10:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Johannes Weiner, kamezawa.hiroyu,
	Tejun Heo

On 11/23/2012 02:33 PM, Michal Hocko wrote:
> On Fri 23-11-12 13:33:36, Glauber Costa wrote:
>> On 11/23/2012 01:20 PM, Michal Hocko wrote:
>>> On Thu 22-11-12 14:29:50, Glauber Costa wrote:
>>> [...]
>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>> index 05b87aa..46f7cfb 100644
>>>> --- a/mm/memcontrol.c
>>>> +++ b/mm/memcontrol.c
>>> [...]
>>>> @@ -349,6 +366,33 @@ struct mem_cgroup {
>>>>  #endif
>>>>  };
>>>>  
>>>> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
>>>
>>> Can we have a common config for this something like CONFIG_MEMCG_ASYNC_DESTROY
>>> which would be selected if either of the two (or potentially others)
>>> would be selected.
>>> Also you are saying that the feature is only for debugging purposes so
>>> it shouldn't be on by default probably.
>>>
>>
>> I personally wouldn't mind. But the big value I see from it is basically
>> being able to turn it off. For all the rest, we would have to wrap it
>> under one of those config options anyway...
> 
> Sure you would need to habe mem_cgroup_dangling_FOO wrapped by the
> correct one anyway but that still need to live inside a bigger ifdef and
> naming all the FOO is awkward. Besides that one
> CONFIG_MEMCG_ASYNC_DESTROY_DEBUG could have a Kconfig entry and so be
> enabled separately.
> 

How about a more general memcg debug option like CONFIG_MEMCG_DEBUG?
Do you foresee more facilities we could enable under this?



> Ohh, you are right you are using kmem_cache name for those. Sorry for
> the confusion
>  
>>> And finally it would be really nice if you described what is the
>>> exported information good for. Can I somehow change the current state
>>> (e.g. force freeing those objects so that the memcg can finally pass out
>>> in piece)?
>>>
>> I am open, but I would personally like to have this as a view-only
>> interface,
> 
> And I was not proposing to make it RW. I am just missing a description
> that would explain: "Ohh well, the file says there are some dangling
> memcgs. Should I report a bug or sue somebody or just have a coffee and
> wait some more?"
> 

People should pay me beer if the number of pending caches is odd, and
pay you beer if the number is even. If the number is 0, we both get it.

>> just so we suspect a leak occurs, we can easily see what is
>> the dead memcg contribution to it. It shows you where the data come
>> from, and if you want to free it, you go search for subsystem-specific
>> ways to force a free should you want.
> 
> Yes, I can imagine its usefulness for developers but I do not see much
> of an use for admins yet. So I am a bit hesitant for this being on by
> default.
> 
Fully agreed. I am implementing this because Kame suggested. I promptly
agreed because I remembered how many times I asked myself "Who is
holding this?" and had to go put some printks all over...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-23 10:37         ` Glauber Costa
@ 2012-11-23 10:51           ` Michal Hocko
  2012-11-23 14:00             ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2012-11-23 10:51 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, Andrew Morton, Johannes Weiner, kamezawa.hiroyu,
	Tejun Heo

On Fri 23-11-12 14:37:05, Glauber Costa wrote:
> On 11/23/2012 02:33 PM, Michal Hocko wrote:
> > On Fri 23-11-12 13:33:36, Glauber Costa wrote:
> >> On 11/23/2012 01:20 PM, Michal Hocko wrote:
> >>> On Thu 22-11-12 14:29:50, Glauber Costa wrote:
> >>> [...]
> >>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>>> index 05b87aa..46f7cfb 100644
> >>>> --- a/mm/memcontrol.c
> >>>> +++ b/mm/memcontrol.c
> >>> [...]
> >>>> @@ -349,6 +366,33 @@ struct mem_cgroup {
> >>>>  #endif
> >>>>  };
> >>>>  
> >>>> +#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEMCG_SWAP)
> >>>
> >>> Can we have a common config for this something like CONFIG_MEMCG_ASYNC_DESTROY
> >>> which would be selected if either of the two (or potentially others)
> >>> would be selected.
> >>> Also you are saying that the feature is only for debugging purposes so
> >>> it shouldn't be on by default probably.
> >>>
> >>
> >> I personally wouldn't mind. But the big value I see from it is basically
> >> being able to turn it off. For all the rest, we would have to wrap it
> >> under one of those config options anyway...
> > 
> > Sure you would need to habe mem_cgroup_dangling_FOO wrapped by the
> > correct one anyway but that still need to live inside a bigger ifdef and
> > naming all the FOO is awkward. Besides that one
> > CONFIG_MEMCG_ASYNC_DESTROY_DEBUG could have a Kconfig entry and so be
> > enabled separately.
> > 
> 
> How about a more general memcg debug option like CONFIG_MEMCG_DEBUG?
> Do you foresee more facilities we could enable under this?

This sounds to generic and it doesn't say what kind of debugging you
get. CONFIG_MEMCG_ASYNC_DESTROY_DEBUG on the other hand can be pretty
explicit that it is aimed at debugging memory leaks caused by async
memcg destruction.

> > Ohh, you are right you are using kmem_cache name for those. Sorry for
> > the confusion
> >  
> >>> And finally it would be really nice if you described what is the
> >>> exported information good for. Can I somehow change the current state
> >>> (e.g. force freeing those objects so that the memcg can finally pass out
> >>> in piece)?
> >>>
> >> I am open, but I would personally like to have this as a view-only
> >> interface,
> > 
> > And I was not proposing to make it RW. I am just missing a description
> > that would explain: "Ohh well, the file says there are some dangling
> > memcgs. Should I report a bug or sue somebody or just have a coffee and
> > wait some more?"
> > 
> 
> People should pay me beer if the number of pending caches is odd, and
> pay you beer if the number is even. If the number is 0, we both get it.

Deal

> >> just so we suspect a leak occurs, we can easily see what is
> >> the dead memcg contribution to it. It shows you where the data come
> >> from, and if you want to free it, you go search for subsystem-specific
> >> ways to force a free should you want.
> > 
> > Yes, I can imagine its usefulness for developers but I do not see much
> > of an use for admins yet. So I am a bit hesitant for this being on by
> > default.
> > 
> Fully agreed. I am implementing this because Kame suggested. I promptly
> agreed because I remembered how many times I asked myself "Who is
> holding this?" and had to go put some printks all over...

So please make it configurable, off by default and be explicit about its
usefulness.

Thanks
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] memcg: debugging facility to access dangling memcgs.
  2012-11-23 10:51           ` Michal Hocko
@ 2012-11-23 14:00             ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2012-11-23 14:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, linux-mm, Andrew Morton, Johannes Weiner,
	kamezawa.hiroyu

On Fri, Nov 23, 2012 at 11:51:54AM +0100, Michal Hocko wrote:
> > Fully agreed. I am implementing this because Kame suggested. I promptly
> > agreed because I remembered how many times I asked myself "Who is
> > holding this?" and had to go put some printks all over...
> 
> So please make it configurable, off by default and be explicit about its
> usefulness.

And please make the file name explicitly indicate that it's a debug
thing, so that someone doesn't grow a messy dependency on it for
whatever reason.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-11-23 14:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-22 10:29 [PATCH 0/2] Show information about dangling memcgs Glauber Costa
2012-11-22 10:29 ` [PATCH 1/2] cgroup: helper do determine group name Glauber Costa
2012-11-22 14:32   ` Tejun Heo
2012-11-23  8:55   ` Michal Hocko
2012-11-23 10:36     ` Michal Hocko
2012-11-22 10:29 ` [PATCH 2/2] memcg: debugging facility to access dangling memcgs Glauber Costa
2012-11-22 10:36   ` Glauber Costa
2012-11-22 13:53     ` Glauber Costa
2012-11-22 14:02       ` Joe Perches
2012-11-22 15:02         ` Andy Whitcroft
2012-11-23  9:20   ` Michal Hocko
2012-11-23  9:33     ` Glauber Costa
2012-11-23 10:33       ` Michal Hocko
2012-11-23 10:37         ` Glauber Costa
2012-11-23 10:51           ` Michal Hocko
2012-11-23 14:00             ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).