public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes.
@ 2012-03-29  6:22 KAMEZAWA Hiroyuki
  2012-03-29  6:26 ` [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol KAMEZAWA Hiroyuki
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  6:22 UTC (permalink / raw)
  To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem


This series is 3 bugfixes for memcg's kmem.tcp memory controller.
Maybe this should go via network tree. (CC akpm for noticing an ugly change in res_counter.)

All patches are generated onto today linus's git tree.

Brief description:

Patch  1/3 .... tcp memcontrol doesn't see memcg's use_hierarchy value. Fix it.

Patch  2/3 and 3/3 .... 
                Because tcp memcontrol doesn't do any accounting when limit=RESOUCE_MAX,
	        there will be account leakage when limit is changed. This can trigger
                WARN_ON() in res_counter which checks usage >= 0.

Patch 2/3  .... don't call static_key_slow_dec(&memcg_socket_limit_enabled) until
                a cgroup under accounted is destroyed.

Patch 3/3  .... add res_counter_uncharge_nowarn() to ignore leakage.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol
  2012-03-29  6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
@ 2012-03-29  6:26 ` KAMEZAWA Hiroyuki
  2012-03-29  6:29 ` [BUGFIX][PATCH 2/3] memcg/tcp: fix static_branch handling KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  6:26 UTC (permalink / raw)
  To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem

Now, tcp memory control cgroup ignores memcg's use_hierarchy value
and act as use_hierarchy=1 always. After this patch, tcp memcontrol will
work as memcg is designed.

Note:
  I know there is a discussion to remove use_hierarchy but this
  is BUG, now.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |    3 +++
 mm/memcontrol.c            |    5 +++++
 net/ipv4/tcp_memcontrol.c  |    2 +-
 3 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f94efd2..e116b7c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -199,6 +199,9 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 bool mem_cgroup_bad_page_check(struct page *page);
 void mem_cgroup_print_bad_page(struct page *page);
 #endif
+
+bool mem_cgroup_use_hierarchy(struct mem_cgroup *memcg);
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7d698df..467881f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -867,6 +867,11 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
 	return memcg;
 }
 
+bool mem_cgroup_use_hierarchy(struct mem_cgroup *memcg)
+{
+	return memcg->use_hierarchy;
+}
+
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index e795272..32764a6 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -75,7 +75,7 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 	tcp->tcp_memory_pressure = 0;
 
 	parent_cg = tcp_prot.proto_cgroup(parent);
-	if (parent_cg)
+	if (parent_cg && mem_cgroup_use_hierarchy(parent))
 		res_parent = parent_cg->memory_allocated;
 
 	res_counter_init(&tcp->tcp_memory_allocated, res_parent);
-- 
1.7.4.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [BUGFIX][PATCH 2/3] memcg/tcp: fix static_branch handling
  2012-03-29  6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
  2012-03-29  6:26 ` [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol KAMEZAWA Hiroyuki
@ 2012-03-29  6:29 ` KAMEZAWA Hiroyuki
  2012-03-29  6:31 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
  2012-03-29  6:51 ` [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  6:29 UTC (permalink / raw)
  To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem

tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
But it's buggy now.

For example, do following
 # while sleep 1;do
   echo 9223372036854775807 > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
   echo 300M > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
   done

and run network application under A. tcp's usage is sometimes accounted
and sometimes not accounted because of frequent changes of static_branch.
Then, finally, you can see broken tcp.usage_in_bytes.
WARN_ON() is printed because res_counter->usage goes below 0.
==
kernel: ------------[ cut here ]----------
kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
 <snip>
kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
...
==

This patch removes static_branch_slow_dec() at changing res_counter's
limit to RESOUCE_MAX. By this, once accounting started, the accountting
will continue until the cgroup is destroyed.

I think this will not be problem in real use.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/net/tcp_memcontrol.h |    1 +
 net/ipv4/tcp_memcontrol.c    |   24 ++++++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 48410ff..f47e3c7 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -9,6 +9,7 @@ struct tcp_memcontrol {
 	/* those two are read-mostly, leave them at the end */
 	long tcp_prot_mem[3];
 	int tcp_memory_pressure;
+	bool accounting;
 };
 
 struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 32764a6..cd0b47d 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -49,6 +49,20 @@ static void memcg_tcp_enter_memory_pressure(struct sock *sk)
 }
 EXPORT_SYMBOL(memcg_tcp_enter_memory_pressure);
 
+static void tcp_start_accounting(struct tcp_memcontrol *tcp)
+{
+	if (tcp->accounting)
+		return;
+	tcp->accounting = true;
+	static_key_slow_inc(&memcg_socket_limit_enabled);
+}
+
+static void tcp_end_accounting(struct tcp_memcontrol *tcp)
+{
+	if (tcp->accounting)
+		static_key_slow_dec(&memcg_socket_limit_enabled);
+}
+
 int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 {
 	/*
@@ -73,6 +87,7 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 	tcp->tcp_prot_mem[1] = net->ipv4.sysctl_tcp_mem[1];
 	tcp->tcp_prot_mem[2] = net->ipv4.sysctl_tcp_mem[2];
 	tcp->tcp_memory_pressure = 0;
+	tcp->accounting = false;
 
 	parent_cg = tcp_prot.proto_cgroup(parent);
 	if (parent_cg && mem_cgroup_use_hierarchy(parent))
@@ -110,8 +125,7 @@ void tcp_destroy_cgroup(struct cgroup *cgrp)
 
 	val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
 
-	if (val != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
+	tcp_end_accounting(tcp);
 }
 EXPORT_SYMBOL(tcp_destroy_cgroup);
 
@@ -142,10 +156,8 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
 		tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT,
 					     net->ipv4.sysctl_tcp_mem[i]);
 
-	if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
-	else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
-		static_key_slow_inc(&memcg_socket_limit_enabled);
+	if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
+		tcp_start_accounting(tcp);
 
 	return 0;
 }
-- 
1.7.4.1




^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started.
  2012-03-29  6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
  2012-03-29  6:26 ` [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol KAMEZAWA Hiroyuki
  2012-03-29  6:29 ` [BUGFIX][PATCH 2/3] memcg/tcp: fix static_branch handling KAMEZAWA Hiroyuki
@ 2012-03-29  6:31 ` KAMEZAWA Hiroyuki
  2012-03-29  6:51 ` [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  6:31 UTC (permalink / raw)
  To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem

tcp memcontrol starts accouting after res->limit is set. So, if a sockets
starts before setting res->limit, there are already used resource.
After setting res->limit, the resource will be uncharged and make res_counter
below 0. This causes warning.

This patch fixes that by adding res_counter_uncharge_nowarn() and ignore the usage.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    2 ++
 include/net/sock.h          |    3 ++-
 kernel/res_counter.c        |   18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..e081948 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -134,6 +134,8 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
 
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+				unsigned long val);
 
 /**
  * res_counter_margin - calculate chargeable space of a counter
diff --git a/include/net/sock.h b/include/net/sock.h
index a6ba1f8..a1b3f4802 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1048,7 +1048,8 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
 static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
 					      unsigned long amt)
 {
-	res_counter_uncharge(prot->memory_allocated, amt << PAGE_SHIFT);
+	res_counter_uncharge_nowarn(prot->memory_allocated,
+				amt << PAGE_SHIFT);
 }
 
 static inline u64 memcg_memory_allocated_read(struct cg_proto *prot)
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..2bb01ac 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,24 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
 	local_irq_restore(flags);
 }
 
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+					unsigned long val)
+{
+	struct res_counter *c;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	for (c = counter; c != NULL; c = c->parent) {
+		spin_lock(&c->lock);
+		if (c->usage < val)
+			val = c->usage;
+		res_counter_uncharge_locked(c, val);
+		spin_unlock(&c->lock);
+	}
+	local_irq_restore(flags);
+}
+
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
-- 
1.7.4.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes.
  2012-03-29  6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2012-03-29  6:31 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
@ 2012-03-29  6:51 ` David Miller
  2012-03-29  6:53   ` KAMEZAWA Hiroyuki
  3 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2012-03-29  6:51 UTC (permalink / raw)
  To: kamezawa.hiroyu; +Cc: glommer, linux-kernel, akpm


Please post networking patches to netdev@vger.kernel.org, most of
the networking developers do not read linux-kernel and also
you need to post it to netdev to get it queued up in our networking
patchwork queue.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes.
  2012-03-29  6:51 ` [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes David Miller
@ 2012-03-29  6:53   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  6:53 UTC (permalink / raw)
  To: David Miller; +Cc: glommer, linux-kernel, akpm

(2012/03/29 15:51), David Miller wrote:

> 
> Please post networking patches to netdev@vger.kernel.org, most of
> the networking developers do not read linux-kernel and also
> you need to post it to netdev to get it queued up in our networking
> patchwork queue.
> 


Ah, thank you. I'll do again. 

Regards,
-Kame


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-03-29  6:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-29  6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
2012-03-29  6:26 ` [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol KAMEZAWA Hiroyuki
2012-03-29  6:29 ` [BUGFIX][PATCH 2/3] memcg/tcp: fix static_branch handling KAMEZAWA Hiroyuki
2012-03-29  6:31 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
2012-03-29  6:51 ` [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes David Miller
2012-03-29  6:53   ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox