* [PATCH 0/2] Proposed fixes for tcp memory pressure
@ 2011-12-15 9:34 Glauber Costa
[not found] ` <1323941672-14324-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2011-12-15 9:34 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA
Hi,
I propose the following two fixes for the issues we're having in
linux-next.
The first one just moves the definition out of the CONFIG_INET,
and should fix the compile problems with allnoconfig.
The second one is a bit bigger, and takes away the socket
registration functions. As I've explained in the Changelog,
I believe this to be the saner way to do it. We need to do allocations
in both create and init time.
This was compile tested with 4 different randconfigs, CGROUPS off,
KMEM cgroup off, INET off, all them on, and boot tested with
everything on.
Thanks.
Glauber Costa (2):
Move limit definitions outside CONFIG_INET
Explicitly call tcp creation and init from memcontrol.c
include/linux/memcontrol.h | 2 +-
include/net/sock.h | 2 --
include/net/tcp_memcontrol.h | 19 ++++++++++++++++++-
mm/memcontrol.c | 13 ++++---------
net/core/sock.c | 37 -------------------------------------
net/ipv4/tcp_memcontrol.c | 11 ++++++-----
6 files changed, 29 insertions(+), 55 deletions(-)
--
1.7.6.4
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] Move limit definitions outside CONFIG_INET
[not found] ` <1323941672-14324-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-12-15 9:34 ` Glauber Costa
[not found] ` <1323941672-14324-2-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-12-15 9:34 ` [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c Glauber Costa
1 sibling, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2011-12-15 9:34 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
Glauber Costa, Hiroyouki Kamezawa, Eric Dumazet, Stephen Rothwell
They need to be available for other protocols as well, since
they are used in sock.c openly
Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
CC: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
CC: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
---
include/linux/memcontrol.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1513994..9b296ea 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -384,13 +384,13 @@ mem_cgroup_print_bad_page(struct page *page)
}
#endif
-#ifdef CONFIG_INET
enum {
UNDER_LIMIT,
SOFT_LIMIT,
OVER_LIMIT,
};
+#ifdef CONFIG_INET
struct sock;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
void sock_update_memcg(struct sock *sk);
--
1.7.6.4
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
[not found] ` <1323941672-14324-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-12-15 9:34 ` [PATCH 1/2] Move limit definitions outside CONFIG_INET Glauber Costa
@ 2011-12-15 9:34 ` Glauber Costa
[not found] ` <1323941672-14324-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-12-15 17:00 ` David Miller
1 sibling, 2 replies; 12+ messages in thread
From: Glauber Costa @ 2011-12-15 9:34 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
Glauber Costa, Hiroyouki Kamezawa, Eric Dumazet, Stephen Rothwell
Walking the proto_list holds a read_lock, which prevents us from doing
allocations. Splitting the tcp create function into create + init is
good, but it is not enough since create_files will do allocations as well
(dentry ones, mostly).
Since this does not involve any protocol state, I propose we call the tcp
functions explicitly from memcontrol.c
With this, we lose by now the ability of doing cgroup memcontrol for
protocols that are loaded as modules. But at least the ones I have in mind
won't really need it (tcp_ipv6 being the only one, but it uses the same data
structures as tcp_ipv4). So I believe this to be the simpler solution to this
problem.
Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
CC: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
CC: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
---
include/net/sock.h | 2 --
include/net/tcp_memcontrol.h | 19 ++++++++++++++++++-
mm/memcontrol.c | 13 ++++---------
net/core/sock.c | 37 -------------------------------------
net/ipv4/tcp_memcontrol.c | 11 ++++++-----
5 files changed, 28 insertions(+), 54 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 1df44e2..6cbee80 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -64,8 +64,6 @@
#include <net/dst.h>
#include <net/checksum.h>
-int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss);
-void mem_cgroup_sockets_destroy(struct cgroup *cgrp, struct cgroup_subsys *ss);
/*
* This structure really needs to be cleaned up.
* Most of it is for TCP, and not used by any of
diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 3512082..f1ff94a 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -11,9 +11,26 @@ struct tcp_memcontrol {
int tcp_memory_pressure;
};
-struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
+struct cgroup;
+struct cgroup_subsys;
+#if defined(CONFIG_INET) && defined(CONFIG_CGROUP_MEM_RES_CTLR_KMEM)
int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss);
void tcp_destroy_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss);
+void tcp_create_cgroup(struct mem_cgroup *memcg, struct mem_cgroup *parent);
+#else
+static inline int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
+{
+ return 0;
+}
+static inline void tcp_destroy_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
+{
+}
+static inline void
+tcp_create_cgroup(struct mem_cgroup *memcg, struct mem_cgroup *parent)
+{
+}
+#endif
+struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
unsigned long long tcp_max_memory(const struct mem_cgroup *memcg);
void tcp_prot_mem(struct mem_cgroup *memcg, long val, int idx);
#endif /* _TCP_MEMCG_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7266202..e3d8886 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4749,22 +4749,15 @@ static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss)
ret = cgroup_add_files(cont, ss, kmem_cgroup_files,
ARRAY_SIZE(kmem_cgroup_files));
- /*
- * Part of this would be better living in a separate allocation
- * function, leaving us with just the cgroup tree population work.
- * We, however, depend on state such as network's proto_list that
- * is only initialized after cgroup creation. I found the less
- * cumbersome way to deal with it to defer it all to populate time
- */
if (!ret)
- ret = mem_cgroup_sockets_init(cont, ss);
+ tcp_init_cgroup(cont, ss);
return ret;
};
static void kmem_cgroup_destroy(struct cgroup_subsys *ss,
struct cgroup *cont)
{
- mem_cgroup_sockets_destroy(cont, ss);
+ tcp_destroy_cgroup(cont, ss);
}
#else
static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss)
@@ -5093,6 +5086,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
res_counter_init(&memcg->res, &parent->res);
res_counter_init(&memcg->memsw, &parent->memsw);
res_counter_init(&memcg->kmem, &parent->kmem);
+ tcp_create_cgroup(memcg, parent);
/*
* We increment refcnt of the parent to ensure that we can
* safely access it on res_counter_charge/uncharge.
@@ -5104,6 +5098,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
res_counter_init(&memcg->res, NULL);
res_counter_init(&memcg->memsw, NULL);
res_counter_init(&memcg->kmem, NULL);
+ tcp_create_cgroup(memcg, NULL);
}
memcg->last_scanned_child = 0;
memcg->last_scanned_node = MAX_NUMNODES;
diff --git a/net/core/sock.c b/net/core/sock.c
index 5de62d3..103f246 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -138,43 +138,6 @@
static DEFINE_RWLOCK(proto_list_lock);
static LIST_HEAD(proto_list);
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
-int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss)
-{
- struct proto *proto;
- int ret = 0;
-
- read_lock(&proto_list_lock);
- list_for_each_entry(proto, &proto_list, node) {
- if (proto->init_cgroup) {
- ret = proto->init_cgroup(cgrp, ss);
- if (ret)
- goto out;
- }
- }
-
- read_unlock(&proto_list_lock);
- return ret;
-out:
- list_for_each_entry_continue_reverse(proto, &proto_list, node)
- if (proto->destroy_cgroup)
- proto->destroy_cgroup(cgrp, ss);
- read_unlock(&proto_list_lock);
- return ret;
-}
-
-void mem_cgroup_sockets_destroy(struct cgroup *cgrp, struct cgroup_subsys *ss)
-{
- struct proto *proto;
-
- read_lock(&proto_list_lock);
- list_for_each_entry_reverse(proto, &proto_list, node)
- if (proto->destroy_cgroup)
- proto->destroy_cgroup(cgrp, ss);
- read_unlock(&proto_list_lock);
-}
-#endif
-
/*
* Each address family might have different locking rules, so we have
* one slock key per address family:
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 171d7b6..1433505 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -49,7 +49,7 @@ static void memcg_tcp_enter_memory_pressure(struct sock *sk)
}
EXPORT_SYMBOL(memcg_tcp_enter_memory_pressure);
-int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
+void tcp_create_cgroup(struct mem_cgroup *memcg, struct mem_cgroup *parent)
{
/*
* The root cgroup does not use res_counters, but rather,
@@ -59,13 +59,11 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
struct res_counter *res_parent = NULL;
struct cg_proto *cg_proto, *parent_cg;
struct tcp_memcontrol *tcp;
- struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
- struct mem_cgroup *parent = parent_mem_cgroup(memcg);
struct net *net = current->nsproxy->net_ns;
cg_proto = tcp_prot.proto_cgroup(memcg);
if (!cg_proto)
- goto create_files;
+ return;
tcp = tcp_from_cgproto(cg_proto);
@@ -87,8 +85,11 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
cg_proto->memory_allocated = &tcp->tcp_memory_allocated;
cg_proto->sockets_allocated = &tcp->tcp_sockets_allocated;
cg_proto->memcg = memcg;
+}
+EXPORT_SYMBOL(tcp_create_cgroup);
-create_files:
+int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
+{
return cgroup_add_files(cgrp, ss, tcp_files,
ARRAY_SIZE(tcp_files));
}
--
1.7.6.4
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
[not found] ` <1323941672-14324-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-12-15 16:13 ` KAMEZAWA Hiroyuki
[not found] ` <20111216011316.8d58bc8f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15 16:13 UTC (permalink / raw)
To: Glauber Costa
Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
Eric Dumazet, Stephen Rothwell
On Thu, 15 Dec 2011 13:34:32 +0400
Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
> Walking the proto_list holds a read_lock, which prevents us from doing
> allocations. Splitting the tcp create function into create + init is
> good, but it is not enough since create_files will do allocations as well
> (dentry ones, mostly).
>
> Since this does not involve any protocol state, I propose we call the tcp
> functions explicitly from memcontrol.c
>
> With this, we lose by now the ability of doing cgroup memcontrol for
> protocols that are loaded as modules. But at least the ones I have in mind
> won't really need it (tcp_ipv6 being the only one, but it uses the same data
> structures as tcp_ipv4). So I believe this to be the simpler solution to this
> problem.
>
> Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> CC: Hiroyouki Kamezawa <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> CC: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> CC: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> CC: Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
Could you remake the patch onto the 'latest' linux-next ?
As Dave mentioned, some bandaids are already applied and this patch hunks.
Thanks,
-Kame
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
[not found] ` <20111216011316.8d58bc8f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2011-12-15 16:18 ` Glauber Costa
0 siblings, 0 replies; 12+ messages in thread
From: Glauber Costa @ 2011-12-15 16:18 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
Eric Dumazet, Stephen Rothwell
On 12/15/2011 08:13 PM, KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Dec 2011 13:34:32 +0400
> Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
>
>> Walking the proto_list holds a read_lock, which prevents us from doing
>> allocations. Splitting the tcp create function into create + init is
>> good, but it is not enough since create_files will do allocations as well
>> (dentry ones, mostly).
>>
>> Since this does not involve any protocol state, I propose we call the tcp
>> functions explicitly from memcontrol.c
>>
>> With this, we lose by now the ability of doing cgroup memcontrol for
>> protocols that are loaded as modules. But at least the ones I have in mind
>> won't really need it (tcp_ipv6 being the only one, but it uses the same data
>> structures as tcp_ipv4). So I believe this to be the simpler solution to this
>> problem.
>>
>> Signed-off-by: Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>> CC: Hiroyouki Kamezawa<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
>> CC: David S. Miller<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>> CC: Eric Dumazet<eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> CC: Stephen Rothwell<sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
>
> Could you remake the patch onto the 'latest' linux-next ?
> As Dave mentioned, some bandaids are already applied and this patch hunks.
Sure thing.
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
2011-12-15 16:13 ` KAMEZAWA Hiroyuki
[not found] ` <20111216011316.8d58bc8f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2011-12-15 16:57 ` David Miller
2011-12-15 22:20 ` KAMEZAWA Hiroyuki
2 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2011-12-15 16:57 UTC (permalink / raw)
To: kamezawa.hiroyu; +Cc: glommer, linux-kernel, netdev, cgroups, eric.dumazet, sfr
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Fri, 16 Dec 2011 01:13:16 +0900
> Could you remake the patch onto the 'latest' linux-next ?
> As Dave mentioned, some bandaids are already applied and this patch hunks.
He was correct to submit this against net-next.
The bandaid in linux-next is temporary and Stephen will remove it
once we fix things properly in net-next.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] Move limit definitions outside CONFIG_INET
[not found] ` <1323941672-14324-2-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-12-15 16:59 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2011-12-15 16:59 UTC (permalink / raw)
To: glommer-bzQdu9zFT3WakBO8gow8eQ
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w, sfr-3FnU+UHB4dNDw9hX6IcOSA
From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Date: Thu, 15 Dec 2011 13:34:31 +0400
> They need to be available for other protocols as well, since
> they are used in sock.c openly
>
> Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Applied.
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
2011-12-15 9:34 ` [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c Glauber Costa
[not found] ` <1323941672-14324-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-12-15 17:00 ` David Miller
[not found] ` <20111215.120028.532844419499092747.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2011-12-15 17:00 UTC (permalink / raw)
To: glommer; +Cc: linux-kernel, netdev, cgroups, kamezawa.hiroyu, eric.dumazet, sfr
From: Glauber Costa <glommer@parallels.com>
Date: Thu, 15 Dec 2011 13:34:32 +0400
> Walking the proto_list holds a read_lock, which prevents us from doing
> allocations. Splitting the tcp create function into create + init is
> good, but it is not enough since create_files will do allocations as well
> (dentry ones, mostly).
>
> Since this does not involve any protocol state, I propose we call the tcp
> functions explicitly from memcontrol.c
>
> With this, we lose by now the ability of doing cgroup memcontrol for
> protocols that are loaded as modules. But at least the ones I have in mind
> won't really need it (tcp_ipv6 being the only one, but it uses the same data
> structures as tcp_ipv4). So I believe this to be the simpler solution to this
> problem.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
This is an unnecessary limitation, please fix this properly otherwise
DCCP, SCTP, etc. won't be supportable with this stuff.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
[not found] ` <20111215.120028.532844419499092747.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2011-12-15 21:11 ` Glauber Costa
[not found] ` <4EEA6288.1080405-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2011-12-15 21:11 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w, sfr-3FnU+UHB4dNDw9hX6IcOSA
[-- Attachment #1: Type: text/plain, Size: 1344 bytes --]
On 12/15/2011 09:00 PM, David Miller wrote:
> From: Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> Date: Thu, 15 Dec 2011 13:34:32 +0400
>
>> Walking the proto_list holds a read_lock, which prevents us from doing
>> allocations. Splitting the tcp create function into create + init is
>> good, but it is not enough since create_files will do allocations as well
>> (dentry ones, mostly).
>>
>> Since this does not involve any protocol state, I propose we call the tcp
>> functions explicitly from memcontrol.c
>>
>> With this, we lose by now the ability of doing cgroup memcontrol for
>> protocols that are loaded as modules. But at least the ones I have in mind
>> won't really need it (tcp_ipv6 being the only one, but it uses the same data
>> structures as tcp_ipv4). So I believe this to be the simpler solution to this
>> problem.
>>
>> Signed-off-by: Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>
> This is an unnecessary limitation, please fix this properly otherwise
> DCCP, SCTP, etc. won't be supportable with this stuff.
How about the following patch then ? I am keeping protocols that has the
cgroup stuff on in a separate list, that can be protected by a mutex.
Therefore we can allocate without going into troubles.
Let me know if you still have objections, I'll be happy to address them.
[-- Attachment #2: 0001-fix-sleeping-while-atomic-problem-in-sock-mem_cgroup.patch --]
[-- Type: text/plain, Size: 6504 bytes --]
>From c8fe669a9cbc7c7cbd87ce48d7eb91ed6c96cbde Mon Sep 17 00:00:00 2001
From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Date: Thu, 15 Dec 2011 23:47:06 +0300
Subject: [PATCH] fix sleeping while atomic problem in sock mem_cgroup.
Since we can't scan the proto_list to initialize sock cgroups, as it
holds a rwlock, and we also want to keep the code generic enough to
avoid calling the initialization functions of protocols directly,
I propose we keep the interested parties in a separate list. This list
is protected by a mutex so we can sleep and do the necessary allocations.
Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
CC: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
CC: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
---
include/linux/memcontrol.h | 4 +++
include/net/sock.h | 5 +---
include/net/tcp_memcontrol.h | 2 +
mm/memcontrol.c | 52 ++++++++++++++++++++++++++++++++++++++++++
net/core/sock.c | 48 +++++++++-----------------------------
5 files changed, 70 insertions(+), 41 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9b296ea..1edd0e3 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -390,6 +390,10 @@ enum {
OVER_LIMIT,
};
+struct proto;
+void register_sock_cgroup(struct proto *prot);
+void unregister_sock_cgroup(struct proto *prot);
+
#ifdef CONFIG_INET
struct sock;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
diff --git a/include/net/sock.h b/include/net/sock.h
index 6fe0dae..f8237a3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -64,10 +64,6 @@
#include <net/dst.h>
#include <net/checksum.h>
-struct cgroup;
-struct cgroup_subsys;
-int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss);
-void mem_cgroup_sockets_destroy(struct cgroup *cgrp, struct cgroup_subsys *ss);
/*
* This structure really needs to be cleaned up.
* Most of it is for TCP, and not used by any of
@@ -858,6 +854,7 @@ struct proto {
void (*destroy_cgroup)(struct cgroup *cgrp,
struct cgroup_subsys *ss);
struct cg_proto *(*proto_cgroup)(struct mem_cgroup *memcg);
+ struct list_head cgroup_node;
#endif
};
diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 3512082..5aa7c4b 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -11,6 +11,8 @@ struct tcp_memcontrol {
int tcp_memory_pressure;
};
+struct cgroup;
+struct cgroup_subsys;
struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss);
void tcp_destroy_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7266202..6ee250d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4742,6 +4742,58 @@ static struct cftype kmem_cgroup_files[] = {
},
};
+static DEFINE_MUTEX(cgroup_proto_list_lock);
+static LIST_HEAD(cgroup_proto_list);
+
+static int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss)
+{
+ struct proto *proto;
+ int ret = 0;
+
+ mutex_lock(&cgroup_proto_list_lock);
+ list_for_each_entry(proto, &cgroup_proto_list, cgroup_node) {
+ if (proto->init_cgroup) {
+ ret = proto->init_cgroup(cgrp, ss);
+ if (ret)
+ goto out;
+ }
+ }
+
+ mutex_unlock(&cgroup_proto_list_lock);
+ return ret;
+out:
+ list_for_each_entry_continue_reverse(proto, &cgroup_proto_list, cgroup_node)
+ if (proto->destroy_cgroup)
+ proto->destroy_cgroup(cgrp, ss);
+ mutex_unlock(&cgroup_proto_list_lock);
+ return ret;
+}
+
+static void mem_cgroup_sockets_destroy(struct cgroup *cgrp, struct cgroup_subsys *ss)
+{
+ struct proto *proto;
+
+ mutex_lock(&cgroup_proto_list_lock);
+ list_for_each_entry_reverse(proto, &cgroup_proto_list, cgroup_node)
+ if (proto->destroy_cgroup)
+ proto->destroy_cgroup(cgrp, ss);
+ mutex_unlock(&cgroup_proto_list_lock);
+}
+
+void register_sock_cgroup(struct proto *prot)
+{
+ mutex_lock(&cgroup_proto_list_lock);
+ list_add(&prot->cgroup_node, &cgroup_proto_list);
+ mutex_unlock(&cgroup_proto_list_lock);
+}
+
+void unregister_sock_cgroup(struct proto *prot)
+{
+ mutex_lock(&cgroup_proto_list_lock);
+ list_del(&prot->cgroup_node);
+ mutex_unlock(&cgroup_proto_list_lock);
+}
+
static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss)
{
int ret = 0;
diff --git a/net/core/sock.c b/net/core/sock.c
index 5a6a906..3728b50 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -139,43 +139,6 @@
static DEFINE_RWLOCK(proto_list_lock);
static LIST_HEAD(proto_list);
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
-int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss)
-{
- struct proto *proto;
- int ret = 0;
-
- read_lock(&proto_list_lock);
- list_for_each_entry(proto, &proto_list, node) {
- if (proto->init_cgroup) {
- ret = proto->init_cgroup(cgrp, ss);
- if (ret)
- goto out;
- }
- }
-
- read_unlock(&proto_list_lock);
- return ret;
-out:
- list_for_each_entry_continue_reverse(proto, &proto_list, node)
- if (proto->destroy_cgroup)
- proto->destroy_cgroup(cgrp, ss);
- read_unlock(&proto_list_lock);
- return ret;
-}
-
-void mem_cgroup_sockets_destroy(struct cgroup *cgrp, struct cgroup_subsys *ss)
-{
- struct proto *proto;
-
- read_lock(&proto_list_lock);
- list_for_each_entry_reverse(proto, &proto_list, node)
- if (proto->destroy_cgroup)
- proto->destroy_cgroup(cgrp, ss);
- read_unlock(&proto_list_lock);
-}
-#endif
-
/*
* Each address family might have different locking rules, so we have
* one slock key per address family:
@@ -2483,6 +2446,12 @@ int proto_register(struct proto *prot, int alloc_slab)
list_add(&prot->node, &proto_list);
assign_proto_idx(prot);
write_unlock(&proto_list_lock);
+
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
+ if (prot->proto_cgroup)
+ register_sock_cgroup(prot);
+#endif
+
return 0;
out_free_timewait_sock_slab_name:
@@ -2510,6 +2479,11 @@ void proto_unregister(struct proto *prot)
list_del(&prot->node);
write_unlock(&proto_list_lock);
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
+ if (prot->proto_cgroup != NULL)
+ unregister_sock_cgroup(prot);
+#endif
+
if (prot->slab != NULL) {
kmem_cache_destroy(prot->slab);
prot->slab = NULL;
--
1.7.6.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
2011-12-15 16:13 ` KAMEZAWA Hiroyuki
[not found] ` <20111216011316.8d58bc8f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2011-12-15 16:57 ` David Miller
@ 2011-12-15 22:20 ` KAMEZAWA Hiroyuki
2011-12-16 2:06 ` Glauber Costa
2 siblings, 1 reply; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15 22:20 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Glauber Costa, davem, linux-kernel, netdev, cgroups, Eric Dumazet,
Stephen Rothwell
On Fri, 16 Dec 2011 01:13:16 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 15 Dec 2011 13:34:32 +0400
> Glauber Costa <glommer@parallels.com> wrote:
>
> > Walking the proto_list holds a read_lock, which prevents us from doing
> > allocations. Splitting the tcp create function into create + init is
> > good, but it is not enough since create_files will do allocations as well
> > (dentry ones, mostly).
> >
> > Since this does not involve any protocol state, I propose we call the tcp
> > functions explicitly from memcontrol.c
> >
> > With this, we lose by now the ability of doing cgroup memcontrol for
> > protocols that are loaded as modules. But at least the ones I have in mind
> > won't really need it (tcp_ipv6 being the only one, but it uses the same data
> > structures as tcp_ipv4). So I believe this to be the simpler solution to this
> > problem.
> >
> > Signed-off-by: Glauber Costa <glommer@parallels.com>
> > CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
> > CC: David S. Miller <davem@davemloft.net>
> > CC: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Stephen Rothwell <sfr@canb.auug.org.au>
>
> Could you remake the patch onto the 'latest' linux-next ?
> As Dave mentioned, some bandaids are already applied and this patch hunks.
Applied patches by hand and did small test for hours.
seems good.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
[not found] ` <4EEA6288.1080405-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-12-15 22:44 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2011-12-15 22:44 UTC (permalink / raw)
To: glommer-bzQdu9zFT3WakBO8gow8eQ
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w, sfr-3FnU+UHB4dNDw9hX6IcOSA
From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Date: Fri, 16 Dec 2011 01:11:36 +0400
> How about the following patch then ? I am keeping protocols that has
> the cgroup stuff on in a separate list, that can be protected by a
> mutex. Therefore we can allocate without going into troubles.
This is a significantly better approach, please test and submit this
patch formally.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c
2011-12-15 22:20 ` KAMEZAWA Hiroyuki
@ 2011-12-16 2:06 ` Glauber Costa
0 siblings, 0 replies; 12+ messages in thread
From: Glauber Costa @ 2011-12-16 2:06 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: davem, linux-kernel, netdev, cgroups, Eric Dumazet,
Stephen Rothwell
On 12/16/2011 02:20 AM, KAMEZAWA Hiroyuki wrote:
> On Fri, 16 Dec 2011 01:13:16 +0900
> KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>> On Thu, 15 Dec 2011 13:34:32 +0400
>> Glauber Costa<glommer@parallels.com> wrote:
>>
>>> Walking the proto_list holds a read_lock, which prevents us from doing
>>> allocations. Splitting the tcp create function into create + init is
>>> good, but it is not enough since create_files will do allocations as well
>>> (dentry ones, mostly).
>>>
>>> Since this does not involve any protocol state, I propose we call the tcp
>>> functions explicitly from memcontrol.c
>>>
>>> With this, we lose by now the ability of doing cgroup memcontrol for
>>> protocols that are loaded as modules. But at least the ones I have in mind
>>> won't really need it (tcp_ipv6 being the only one, but it uses the same data
>>> structures as tcp_ipv4). So I believe this to be the simpler solution to this
>>> problem.
>>>
>>> Signed-off-by: Glauber Costa<glommer@parallels.com>
>>> CC: Hiroyouki Kamezawa<kamezawa.hiroyu@jp.fujitsu.com>
>>> CC: David S. Miller<davem@davemloft.net>
>>> CC: Eric Dumazet<eric.dumazet@gmail.com>
>>> CC: Stephen Rothwell<sfr@canb.auug.org.au>
>>
>> Could you remake the patch onto the 'latest' linux-next ?
>> As Dave mentioned, some bandaids are already applied and this patch hunks.
>
> Applied patches by hand and did small test for hours.
> seems good.
>
> Reviewed-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
Kame,
Thanks. But see Dave's answer to this: He'd like me to follow a slightly
different approach (I've attached a patch earlier)
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-12-16 2:06 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-15 9:34 [PATCH 0/2] Proposed fixes for tcp memory pressure Glauber Costa
[not found] ` <1323941672-14324-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-12-15 9:34 ` [PATCH 1/2] Move limit definitions outside CONFIG_INET Glauber Costa
[not found] ` <1323941672-14324-2-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-12-15 16:59 ` David Miller
2011-12-15 9:34 ` [PATCH 2/2] Explicitly call tcp creation and init from memcontrol.c Glauber Costa
[not found] ` <1323941672-14324-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-12-15 16:13 ` KAMEZAWA Hiroyuki
[not found] ` <20111216011316.8d58bc8f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2011-12-15 16:18 ` Glauber Costa
2011-12-15 16:57 ` David Miller
2011-12-15 22:20 ` KAMEZAWA Hiroyuki
2011-12-16 2:06 ` Glauber Costa
2011-12-15 17:00 ` David Miller
[not found] ` <20111215.120028.532844419499092747.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2011-12-15 21:11 ` Glauber Costa
[not found] ` <4EEA6288.1080405-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-12-15 22:44 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).