* [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol
2012-03-29 6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
@ 2012-03-29 6:26 ` KAMEZAWA Hiroyuki
2012-03-29 6:29 ` [BUGFIX][PATCH 2/3] memcg/tcp: fix static_branch handling KAMEZAWA Hiroyuki
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29 6:26 UTC (permalink / raw)
To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem
Now, tcp memory control cgroup ignores memcg's use_hierarchy value
and act as use_hierarchy=1 always. After this patch, tcp memcontrol will
work as memcg is designed.
Note:
I know there is a discussion to remove use_hierarchy but this
is BUG, now.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/memcontrol.h | 3 +++
mm/memcontrol.c | 5 +++++
net/ipv4/tcp_memcontrol.c | 2 +-
3 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f94efd2..e116b7c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -199,6 +199,9 @@ void mem_cgroup_split_huge_fixup(struct page *head);
bool mem_cgroup_bad_page_check(struct page *page);
void mem_cgroup_print_bad_page(struct page *page);
#endif
+
+bool mem_cgroup_use_hierarchy(struct mem_cgroup *memcg);
+
#else /* CONFIG_CGROUP_MEM_RES_CTLR */
struct mem_cgroup;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7d698df..467881f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -867,6 +867,11 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
return memcg;
}
+bool mem_cgroup_use_hierarchy(struct mem_cgroup *memcg)
+{
+ return memcg->use_hierarchy;
+}
+
/**
* mem_cgroup_iter - iterate over memory cgroup hierarchy
* @root: hierarchy root
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index e795272..32764a6 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -75,7 +75,7 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
tcp->tcp_memory_pressure = 0;
parent_cg = tcp_prot.proto_cgroup(parent);
- if (parent_cg)
+ if (parent_cg && mem_cgroup_use_hierarchy(parent))
res_parent = parent_cg->memory_allocated;
res_counter_init(&tcp->tcp_memory_allocated, res_parent);
--
1.7.4.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [BUGFIX][PATCH 2/3] memcg/tcp: fix static_branch handling
2012-03-29 6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
2012-03-29 6:26 ` [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol KAMEZAWA Hiroyuki
@ 2012-03-29 6:29 ` KAMEZAWA Hiroyuki
2012-03-29 6:31 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
2012-03-29 6:51 ` [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes David Miller
3 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29 6:29 UTC (permalink / raw)
To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem
tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
But it's buggy now.
For example, do following
# while sleep 1;do
echo 9223372036854775807 > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
echo 300M > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
done
and run network application under A. tcp's usage is sometimes accounted
and sometimes not accounted because of frequent changes of static_branch.
Then, finally, you can see broken tcp.usage_in_bytes.
WARN_ON() is printed because res_counter->usage goes below 0.
==
kernel: ------------[ cut here ]----------
kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
<snip>
kernel: Pid: 17753, comm: bash Tainted: G W 3.3.0+ #99
kernel: Call Trace:
kernel: <IRQ> [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
...
==
This patch removes static_branch_slow_dec() at changing res_counter's
limit to RESOUCE_MAX. By this, once accounting started, the accountting
will continue until the cgroup is destroyed.
I think this will not be problem in real use.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/net/tcp_memcontrol.h | 1 +
net/ipv4/tcp_memcontrol.c | 24 ++++++++++++++++++------
2 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 48410ff..f47e3c7 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -9,6 +9,7 @@ struct tcp_memcontrol {
/* those two are read-mostly, leave them at the end */
long tcp_prot_mem[3];
int tcp_memory_pressure;
+ bool accounting;
};
struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 32764a6..cd0b47d 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -49,6 +49,20 @@ static void memcg_tcp_enter_memory_pressure(struct sock *sk)
}
EXPORT_SYMBOL(memcg_tcp_enter_memory_pressure);
+static void tcp_start_accounting(struct tcp_memcontrol *tcp)
+{
+ if (tcp->accounting)
+ return;
+ tcp->accounting = true;
+ static_key_slow_inc(&memcg_socket_limit_enabled);
+}
+
+static void tcp_end_accounting(struct tcp_memcontrol *tcp)
+{
+ if (tcp->accounting)
+ static_key_slow_dec(&memcg_socket_limit_enabled);
+}
+
int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
{
/*
@@ -73,6 +87,7 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
tcp->tcp_prot_mem[1] = net->ipv4.sysctl_tcp_mem[1];
tcp->tcp_prot_mem[2] = net->ipv4.sysctl_tcp_mem[2];
tcp->tcp_memory_pressure = 0;
+ tcp->accounting = false;
parent_cg = tcp_prot.proto_cgroup(parent);
if (parent_cg && mem_cgroup_use_hierarchy(parent))
@@ -110,8 +125,7 @@ void tcp_destroy_cgroup(struct cgroup *cgrp)
val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
- if (val != RESOURCE_MAX)
- static_key_slow_dec(&memcg_socket_limit_enabled);
+ tcp_end_accounting(tcp);
}
EXPORT_SYMBOL(tcp_destroy_cgroup);
@@ -142,10 +156,8 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT,
net->ipv4.sysctl_tcp_mem[i]);
- if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX)
- static_key_slow_dec(&memcg_socket_limit_enabled);
- else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
- static_key_slow_inc(&memcg_socket_limit_enabled);
+ if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
+ tcp_start_accounting(tcp);
return 0;
}
--
1.7.4.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started.
2012-03-29 6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
2012-03-29 6:26 ` [BUGFIX][PATCH 1/3] memcg/tcp : fix to see memcg's use_hierarchy in tcp memcontrol KAMEZAWA Hiroyuki
2012-03-29 6:29 ` [BUGFIX][PATCH 2/3] memcg/tcp: fix static_branch handling KAMEZAWA Hiroyuki
@ 2012-03-29 6:31 ` KAMEZAWA Hiroyuki
2012-03-29 6:51 ` [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes David Miller
3 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29 6:31 UTC (permalink / raw)
To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem
tcp memcontrol starts accouting after res->limit is set. So, if a sockets
starts before setting res->limit, there are already used resource.
After setting res->limit, the resource will be uncharged and make res_counter
below 0. This causes warning.
This patch fixes that by adding res_counter_uncharge_nowarn() and ignore the usage.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/res_counter.h | 2 ++
include/net/sock.h | 3 ++-
kernel/res_counter.c | 18 ++++++++++++++++++
3 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..e081948 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -134,6 +134,8 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
void res_counter_uncharge(struct res_counter *counter, unsigned long val);
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+ unsigned long val);
/**
* res_counter_margin - calculate chargeable space of a counter
diff --git a/include/net/sock.h b/include/net/sock.h
index a6ba1f8..a1b3f4802 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1048,7 +1048,8 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
unsigned long amt)
{
- res_counter_uncharge(prot->memory_allocated, amt << PAGE_SHIFT);
+ res_counter_uncharge_nowarn(prot->memory_allocated,
+ amt << PAGE_SHIFT);
}
static inline u64 memcg_memory_allocated_read(struct cg_proto *prot)
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..2bb01ac 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,24 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
local_irq_restore(flags);
}
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+ unsigned long val)
+{
+ struct res_counter *c;
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ for (c = counter; c != NULL; c = c->parent) {
+ spin_lock(&c->lock);
+ if (c->usage < val)
+ val = c->usage;
+ res_counter_uncharge_locked(c, val);
+ spin_unlock(&c->lock);
+ }
+ local_irq_restore(flags);
+}
+
static inline unsigned long long *
res_counter_member(struct res_counter *counter, int member)
--
1.7.4.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes.
2012-03-29 6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
` (2 preceding siblings ...)
2012-03-29 6:31 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
@ 2012-03-29 6:51 ` David Miller
2012-03-29 6:53 ` KAMEZAWA Hiroyuki
3 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2012-03-29 6:51 UTC (permalink / raw)
To: kamezawa.hiroyu; +Cc: glommer, linux-kernel, akpm
Please post networking patches to netdev@vger.kernel.org, most of
the networking developers do not read linux-kernel and also
you need to post it to netdev to get it queued up in our networking
patchwork queue.
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread