From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative. Date: Wed, 18 Apr 2012 17:02:54 +0900 Message-ID: <4F8E752E.8040906@jp.fujitsu.com> References: <4F7408B7.9090706@jp.fujitsu.com> <4F740AEF.7090900@jp.fujitsu.com> <4F742983.1080402@parallels.com> <4F750FE8.2030800@jp.fujitsu.com> <4F7F1091.9040204@parallels.com> <4F839CF1.5050104@jp.fujitsu.com> <4F88637D.5020204@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org, David Miller , Andrew Morton To: Glauber Costa Return-path: Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:58239 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751389Ab2DRIEo (ORCPT ); Wed, 18 Apr 2012 04:04:44 -0400 Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id BB1103EE0AE for ; Wed, 18 Apr 2012 17:04:42 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id A182845DE51 for ; Wed, 18 Apr 2012 17:04:42 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 88C6E45DE4E for ; Wed, 18 Apr 2012 17:04:42 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 79B7EE08002 for ; Wed, 18 Apr 2012 17:04:42 +0900 (JST) Received: from m106.s.css.fujitsu.com (m106.s.css.fujitsu.com [10.240.81.146]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 299131DB803B for ; Wed, 18 Apr 2012 17:04:42 +0900 (JST) In-Reply-To: <4F88637D.5020204@parallels.com> Sender: netdev-owner@vger.kernel.org List-ID: (2012/04/14 2:33), Glauber Costa wrote: > On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote: >> (2012/04/07 0:49), Glauber Costa wrote: >> >>> On 03/30/2012 05:44 AM, KAMEZAWA Hiroyuki wrote: >>>> Maybe what we can do before lsf/mm summit will be this (avoid warning.) >>>> This patch is onto linus's git tree. Patch description is updated. >>>> >>>> Thanks. >>>> -Kame >>>> == >>>> From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001 >>>> From: KAMEZAWA Hiroyuki >>>> Date: Thu, 29 Mar 2012 14:59:04 +0900 >>>> Subject: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative. >>>> >>>> tcp memcontrol starts accouting after res->limit is set. So, if a sockets >>>> starts before setting res->limit, there are already used resource. >>>> At setting res->limit, accounting starts. The resource will be uncharged >>>> and make res_counter below 0 because they are not charged. >>>> This causes warning. >>>> >>> >>> Kame, >>> >>> Please test the following patch and see if it fixes your problems (I >>> tested locally, and it triggers me no warnings running the test script >>> you provided + an inbound scp -r copy of an iso directory from a remote >>> machine) >>> >>> When you are reviewing, keep in mind that we're likely to have the same >>> problems with slab jump labels - since the slab pages will outlive the >>> cgroup as well, and it might be worthy to keep this in mind, and provide >>> a central point for the jump labels to be set of on cgroup destruction. >>> >> >> >> Hm. What happens in following sequence ? >> >> 1. a memcg is created >> 2. put a task into the memcg, start tcp steam >> 3. set tcp memory limit >> >> The resource used between 2 and 3 will cause the problem finally. >> >> Then, Dave's request >> == >> You must either: >> >> 1) Integrate the socket's existing usage when the limit is set. >> >> 2) Avoid accounting completely for a socket that started before >> the limit was set. >> == >> are not satisfied. So, we need to have a state per sockets, it's accounted >> or not. I'll look into this problem again, today. >> > > Kame, > > Let me know what you think of the attached fix. > I still need to compile test it in other configs to be sure it doesn't > break, etc. But let's agree on it first. > Subject: [PATCH] decrement static keys on real destroy time > > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > [v2: changed a tcp limited flag for a generic proto limited flag ] > > Signed-off-by: Glauber Costa > --- > include/net/sock.h | 1 + > mm/memcontrol.c | 20 ++++++++++++++++++-- > net/ipv4/tcp_memcontrol.c | 12 ++++++------ > 3 files changed, 25 insertions(+), 8 deletions(-) > > diff --git a/include/net/sock.h b/include/net/sock.h > index b3ebe6b..f35ff7d 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -913,6 +913,7 @@ struct cg_proto { > struct percpu_counter *sockets_allocated; /* Current number of sockets. */ > int *memory_pressure; > long *sysctl_mem; > + bool limited; please add comment somewhere. Hmm, 'limited' is good name ? > /* > * memcg field is used to find which memcg we belong directly > * Each memcg struct can hold more than one cg_proto, so container_of > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 02b01d2..61f2d31 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -404,6 +404,7 @@ void sock_update_memcg(struct sock *sk) > { > if (mem_cgroup_sockets_enabled) { > struct mem_cgroup *memcg; > + struct cg_proto *cg_proto; > > BUG_ON(!sk->sk_prot->proto_cgroup); > > @@ -423,9 +424,10 @@ void sock_update_memcg(struct sock *sk) > > rcu_read_lock(); > memcg = mem_cgroup_from_task(current); > - if (!mem_cgroup_is_root(memcg)) { > + cg_proto = sk->sk_prot->proto_cgroup(memcg); > + if (!mem_cgroup_is_root(memcg) && cg_proto->limited) { > mem_cgroup_get(memcg); > - sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg); > + sk->sk_cgrp = cg_proto; > } Ok, then, sk->sk_cgroup is set only after jump_label is enabled and its memcg has limitation. > rcu_read_unlock(); > } > @@ -442,6 +444,14 @@ void sock_release_memcg(struct sock *sk) > } > } > > +static void disarm_static_keys(struct mem_cgroup *memcg) > +{ > +#ifdef CONFIG_INET > + if (memcg->tcp_mem.cg_proto.limited) > + static_key_slow_dec(&memcg_socket_limit_enabled); > +#endif > +} > + > #ifdef CONFIG_INET > struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) > { > @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) > } > EXPORT_SYMBOL(tcp_proto_cgroup); > #endif /* CONFIG_INET */ > +#else > +static inline void disarm_static_keys(struct mem_cgroup *memcg) > +{ > +} > + > #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ > > static void drain_all_stock_async(struct mem_cgroup *memcg); > @@ -4883,6 +4898,7 @@ static void __mem_cgroup_put(struct mem_cgroup *memcg, int count) > { > if (atomic_sub_and_test(count, &memcg->refcnt)) { > struct mem_cgroup *parent = parent_mem_cgroup(memcg); > + disarm_static_keys(memcg); > __mem_cgroup_free(memcg); > if (parent) > mem_cgroup_put(parent); > diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c > index 1517037..8005667 100644 > --- a/net/ipv4/tcp_memcontrol.c > +++ b/net/ipv4/tcp_memcontrol.c > @@ -54,6 +54,7 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss) > cg_proto->sysctl_mem = tcp->tcp_prot_mem; > cg_proto->memory_allocated = &tcp->tcp_memory_allocated; > cg_proto->sockets_allocated = &tcp->tcp_sockets_allocated; > + cg_proto->limited = false; > cg_proto->memcg = memcg; > > return 0; > @@ -74,9 +75,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg) > percpu_counter_destroy(&tcp->tcp_sockets_allocated); > > val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT); > - > - if (val != RESOURCE_MAX) > - static_key_slow_dec(&memcg_socket_limit_enabled); > } > EXPORT_SYMBOL(tcp_destroy_cgroup); > > @@ -107,10 +105,12 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) > tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, > net->ipv4.sysctl_tcp_mem[i]); > > - if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX) > - static_key_slow_dec(&memcg_socket_limit_enabled); > - else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX) > + if (val == RESOURCE_MAX) > + cg_proto->limited = false; Why we can reset this to be false ? disarm_static_keys() will not work at destroy().... > + else if (val != RESOURCE_MAX && !cg_proto->limited) { > static_key_slow_inc(&memcg_socket_limit_enabled); > + cg_proto->limited = true; > + } > Hmm. don't we need any mutex ? Thanks, -Kame