From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [PATCH v5 0/2] fix static_key disabling problem in memcg Date: Fri, 11 May 2012 17:11:15 -0300 Message-ID: <1336767077-25351-1-git-send-email-glommer@parallels.com> Return-path: Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: cgroups@vger.kernel.org Cc: linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan Hi, Tejun, Kame, This series is composed of the two patches of the last fix, with no changes (only exception is the removal of x = false assignments that Tejun requested, that is done now). Note also that patch 1 of this series was reused by me in the slab accounting patches for memcg. The first patch, that adds a mutex to memcg is dropped. I didn't posted it before so I could wait for Kame to get back from his vacations and properly review it. Kame: Steven Rostedt pointed out that our analysis of the static branch updates were wrong, so the mutex is really not needed. The key to understand that, is that atomic_inc_not_zero will only return right away if the value is not yet zero - as the name implies - but the update in the atomic variable only happens after the code is patched. Therefore, if two callers enters with a key value of zero, both will be held at the jump_label_lock() call, effectively guaranteeing the behavior we need. Glauber Costa (2): Always free struct memcg through schedule_work() decrement static keys on real destroy time include/net/sock.h | 9 ++++++++ mm/memcontrol.c | 50 +++++++++++++++++++++++++++++++++----------- net/ipv4/tcp_memcontrol.c | 32 ++++++++++++++++++++++------ 3 files changed, 71 insertions(+), 20 deletions(-) -- 1.7.7.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [PATCH v5 1/2] Always free struct memcg through schedule_work() Date: Fri, 11 May 2012 17:11:16 -0300 Message-ID: <1336767077-25351-2-git-send-email-glommer@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> Return-path: In-Reply-To: <1336767077-25351-1-git-send-email-glommer@parallels.com> Sender: netdev-owner@vger.kernel.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: cgroups@vger.kernel.org Cc: linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Glauber Costa , Johannes Weiner , Michal Hocko Right now we free struct memcg with kfree right after a rcu grace period, but defer it if we need to use vfree() to get rid of that memory area. We do that by need, because we need vfree to be called in a process context. This patch unifies this behavior, by ensuring that even kfree will happen in a separate thread. The goal is to have a stable place to call the upcoming jump label destruction function outside the realm of the complicated and quite far-reaching cgroup lock (that can't be held when calling neither the cpu_hotplug.lock nor the jump_label_mutex) Signed-off-by: Glauber Costa CC: Tejun Heo CC: Li Zefan CC: Kamezawa Hiroyuki CC: Johannes Weiner CC: Michal Hocko --- mm/memcontrol.c | 24 +++++++++++++----------- 1 files changed, 13 insertions(+), 11 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 932a734..0b4b4c8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -245,8 +245,8 @@ struct mem_cgroup { */ struct rcu_head rcu_freeing; /* - * But when using vfree(), that cannot be done at - * interrupt time, so we must then queue the work. + * We also need some space for a worker in deferred freeing. + * By the time we call it, rcu_freeing is not longer in use. */ struct work_struct work_freeing; }; @@ -4826,23 +4826,28 @@ out_free: } /* - * Helpers for freeing a vzalloc()ed mem_cgroup by RCU, + * Helpers for freeing a kmalloc()ed/vzalloc()ed mem_cgroup by RCU, * but in process context. The work_freeing structure is overlaid * on the rcu_freeing structure, which itself is overlaid on memsw. */ -static void vfree_work(struct work_struct *work) +static void free_work(struct work_struct *work) { struct mem_cgroup *memcg; + int size = sizeof(struct mem_cgroup); memcg = container_of(work, struct mem_cgroup, work_freeing); - vfree(memcg); + if (size < PAGE_SIZE) + kfree(memcg); + else + vfree(memcg); } -static void vfree_rcu(struct rcu_head *rcu_head) + +static void free_rcu(struct rcu_head *rcu_head) { struct mem_cgroup *memcg; memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing); - INIT_WORK(&memcg->work_freeing, vfree_work); + INIT_WORK(&memcg->work_freeing, free_work); schedule_work(&memcg->work_freeing); } @@ -4868,10 +4873,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg) free_mem_cgroup_per_zone_info(memcg, node); free_percpu(memcg->stat); - if (sizeof(struct mem_cgroup) < PAGE_SIZE) - kfree_rcu(memcg, rcu_freeing); - else - call_rcu(&memcg->rcu_freeing, vfree_rcu); + call_rcu(&memcg->rcu_freeing, free_rcu); } static void mem_cgroup_get(struct mem_cgroup *memcg) -- 1.7.7.6 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [PATCH v5 2/2] decrement static keys on real destroy time Date: Fri, 11 May 2012 17:11:17 -0300 Message-ID: <1336767077-25351-3-git-send-email-glommer@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> Return-path: In-Reply-To: <1336767077-25351-1-git-send-email-glommer@parallels.com> Sender: netdev-owner@vger.kernel.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: cgroups@vger.kernel.org Cc: linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Glauber Costa , Johannes Weiner , Michal Hocko We call the destroy function when a cgroup starts to be removed, such as by a rmdir event. However, because of our reference counters, some objects are still inflight. Right now, we are decrementing the static_keys at destroy() time, meaning that if we get rid of the last static_key reference, some objects will still have charges, but the code to properly uncharge them won't be run. This becomes a problem specially if it is ever enabled again, because now new charges will be added to the staled charges making keeping it pretty much impossible. We just need to be careful with the static branch activation: since there is no particular preferred order of their activation, we need to make sure that we only start using it after all call sites are active. This is achieved by having a per-memcg flag that is only updated after static_key_slow_inc() returns. At this time, we are sure all sites are active. This is made per-memcg, not global, for a reason: it also has the effect of making socket accounting more consistent. The first memcg to be limited will trigger static_key() activation, therefore, accounting. But all the others will then be accounted no matter what. After this patch, only limited memcgs will have its sockets accounted. [v2: changed a tcp limited flag for a generic proto limited flag ] [v3: update the current active flag only after the static_key update ] [v4: disarm_static_keys() inside free_work ] [v5: got rid of tcp_limit_mutex, now in the static_key interface ] Signed-off-by: Glauber Costa CC: Tejun Heo CC: Li Zefan CC: Kamezawa Hiroyuki CC: Johannes Weiner CC: Michal Hocko --- include/net/sock.h | 9 +++++++++ mm/memcontrol.c | 26 ++++++++++++++++++++++++-- net/ipv4/tcp_memcontrol.c | 32 +++++++++++++++++++++++++------- 3 files changed, 58 insertions(+), 9 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index b3ebe6b..5c620bd 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -914,6 +914,15 @@ struct cg_proto { int *memory_pressure; long *sysctl_mem; /* + * active means it is currently active, and new sockets should + * be assigned to cgroups. + * + * activated means it was ever activated, and we need to + * disarm the static keys on destruction + */ + bool activated; + bool active; + /* * memcg field is used to find which memcg we belong directly * Each memcg struct can hold more than one cg_proto, so container_of * won't really cut. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0b4b4c8..d1b0849 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -404,6 +404,7 @@ void sock_update_memcg(struct sock *sk) { if (mem_cgroup_sockets_enabled) { struct mem_cgroup *memcg; + struct cg_proto *cg_proto; BUG_ON(!sk->sk_prot->proto_cgroup); @@ -423,9 +424,10 @@ void sock_update_memcg(struct sock *sk) rcu_read_lock(); memcg = mem_cgroup_from_task(current); - if (!mem_cgroup_is_root(memcg)) { + cg_proto = sk->sk_prot->proto_cgroup(memcg); + if (!mem_cgroup_is_root(memcg) && cg_proto->active) { mem_cgroup_get(memcg); - sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg); + sk->sk_cgrp = cg_proto; } rcu_read_unlock(); } @@ -442,6 +444,14 @@ void sock_release_memcg(struct sock *sk) } } +static void disarm_static_keys(struct mem_cgroup *memcg) +{ +#ifdef CONFIG_INET + if (memcg->tcp_mem.cg_proto.activated) + static_key_slow_dec(&memcg_socket_limit_enabled); +#endif +} + #ifdef CONFIG_INET struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) { @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) } EXPORT_SYMBOL(tcp_proto_cgroup); #endif /* CONFIG_INET */ +#else +static inline void disarm_static_keys(struct mem_cgroup *memcg) +{ +} + #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ static void drain_all_stock_async(struct mem_cgroup *memcg); @@ -4836,6 +4851,13 @@ static void free_work(struct work_struct *work) int size = sizeof(struct mem_cgroup); memcg = container_of(work, struct mem_cgroup, work_freeing); + /* + * We need to make sure that (at least for now), the jump label + * destruction code runs outside of the cgroup lock. schedule_work() + * will guarantee this happens. Be careful if you need to move this + * disarm_static_keys around + */ + disarm_static_keys(memcg); if (size < PAGE_SIZE) kfree(memcg); else diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c index 1517037..7ea4f79 100644 --- a/net/ipv4/tcp_memcontrol.c +++ b/net/ipv4/tcp_memcontrol.c @@ -74,9 +74,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg) percpu_counter_destroy(&tcp->tcp_sockets_allocated); val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT); - - if (val != RESOURCE_MAX) - static_key_slow_dec(&memcg_socket_limit_enabled); } EXPORT_SYMBOL(tcp_destroy_cgroup); @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, net->ipv4.sysctl_tcp_mem[i]); - if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX) - static_key_slow_dec(&memcg_socket_limit_enabled); - else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX) - static_key_slow_inc(&memcg_socket_limit_enabled); + if (val == RESOURCE_MAX) + cg_proto->active = false; + else if (val != RESOURCE_MAX) { + /* + * ->activated needs to be written after the static_key update. + * This is what guarantees that the socket activation function + * is the last one to run. See sock_update_memcg() for details, + * and note that we don't mark any socket as belonging to this + * memcg until that flag is up. + * + * We need to do this, because static_keys will span multiple + * sites, but we can't control their order. If we mark a socket + * as accounted, but the accounting functions are not patched in + * yet, we'll lose accounting. + * + * We never race with the readers in sock_update_memcg(), because + * when this value change, the code to process it is not patched in + * yet. + */ + if (!cg_proto->activated) { + static_key_slow_inc(&memcg_socket_limit_enabled); + cg_proto->activated = true; + } + cg_proto->active = true; + } return 0; } -- 1.7.7.6 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v5 1/2] Always free struct memcg through schedule_work() Date: Mon, 14 May 2012 09:56:20 +0900 Message-ID: <4FB05834.1020200@jp.fujitsu.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-2-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1336767077-25351-2-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/12 5:11), Glauber Costa wrote: > Right now we free struct memcg with kfree right after a > rcu grace period, but defer it if we need to use vfree() to get > rid of that memory area. We do that by need, because we need vfree > to be called in a process context. > > This patch unifies this behavior, by ensuring that even kfree will > happen in a separate thread. The goal is to have a stable place to > call the upcoming jump label destruction function outside the realm > of the complicated and quite far-reaching cgroup lock (that can't be > held when calling neither the cpu_hotplug.lock nor the jump_label_mutex) > > Signed-off-by: Glauber Costa > CC: Tejun Heo > CC: Li Zefan > CC: Kamezawa Hiroyuki > CC: Johannes Weiner > CC: Michal Hocko I think we'll need to revisit this, again. for now, Acked-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Mon, 14 May 2012 09:59:04 +0900 Message-ID: <4FB058D8.6060707@jp.fujitsu.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1336767077-25351-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/12 5:11), Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > [v2: changed a tcp limited flag for a generic proto limited flag ] > [v3: update the current active flag only after the static_key update ] > [v4: disarm_static_keys() inside free_work ] > [v5: got rid of tcp_limit_mutex, now in the static_key interface ] > > Signed-off-by: Glauber Costa > CC: Tejun Heo > CC: Li Zefan > CC: Kamezawa Hiroyuki > CC: Johannes Weiner > CC: Michal Hocko Thank you for your patient works. Acked-by: KAMEZAWA Hiroyuki BTW, what is the relationship between 1/2 and 2/2 ? Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: Li Zefan Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Mon, 14 May 2012 09:38:36 +0800 Message-ID: <4FB0621C.3010604@huawei.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7BIT Return-path: In-reply-to: <1336767077-25351-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Johannes Weiner , Michal Hocko > +static void disarm_static_keys(struct mem_cgroup *memcg) > +{ > +#ifdef CONFIG_INET > + if (memcg->tcp_mem.cg_proto.activated) > + static_key_slow_dec(&memcg_socket_limit_enabled); > +#endif > +} Move this inside the ifdef/endif below ? Otherwise I think you'll get compile error if !CONFIG_INET... > + > #ifdef CONFIG_INET > struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) > { > @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) > } > EXPORT_SYMBOL(tcp_proto_cgroup); > #endif /* CONFIG_INET */ > +#else > +static inline void disarm_static_keys(struct mem_cgroup *memcg) > +{ > +} > + > #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Mon, 14 May 2012 11:12:50 -0700 Message-ID: <20120514181250.GD2366@google.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=fkQggLsunezBTwf71vCXa8P3KBzjeVCt3a0OV4ZwrNU=; b=C1dVIgl0/wC+WhfESyr7R01Dw5Ltq7XUJhg4751iCX4Vv5h8DCAe6sySBFXaBCUZ/q 93JgWf6ApgG6C8TgAtq51s4HBP058F7GWFAaHundoDCuvdhZaw3AiPCBXGId71JpnDrY ZIOhVALw656SZEtK/p4VpwIMULB8HTN8J30oBBq2S7JKNBWDdwue7hHmli5mvwasAUqX 3T2gs90z9PLpyHjDF5XZQHpqzaA8NbUFQjCGXASigF+KQ7JG5dD/muHoh8w4EJ4vRSaL 3yaltkbbE+7zOSYT4doltdWpDbl3pc+fCgmAqkGhWi0xalRiud9zPaxIQD0veutO7Jky xq6w== Content-Disposition: inline In-Reply-To: <1336767077-25351-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Li Zefan , Johannes Weiner , Michal Hocko On Fri, May 11, 2012 at 05:11:17PM -0300, Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > [v2: changed a tcp limited flag for a generic proto limited flag ] > [v3: update the current active flag only after the static_key update ] > [v4: disarm_static_keys() inside free_work ] > [v5: got rid of tcp_limit_mutex, now in the static_key interface ] > > Signed-off-by: Glauber Costa > CC: Tejun Heo > CC: Li Zefan > CC: Kamezawa Hiroyuki > CC: Johannes Weiner > CC: Michal Hocko Generally looks sane to me. Please feel free to addmy Reviewed-by. > + if (val == RESOURCE_MAX) > + cg_proto->active = false; > + else if (val != RESOURCE_MAX) { Minor nitpick: CodingStyle says not to omit { } if other branches need them. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 10:03:08 +0400 Message-ID: <4FB3431C.3050402@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB058D8.6060707-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/14/2012 04:59 AM, KAMEZAWA Hiroyuki wrote: > (2012/05/12 5:11), Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. >> >> [v2: changed a tcp limited flag for a generic proto limited flag ] >> [v3: update the current active flag only after the static_key update ] >> [v4: disarm_static_keys() inside free_work ] >> [v5: got rid of tcp_limit_mutex, now in the static_key interface ] >> >> Signed-off-by: Glauber Costa >> CC: Tejun Heo >> CC: Li Zefan >> CC: Kamezawa Hiroyuki >> CC: Johannes Weiner >> CC: Michal Hocko > > > Thank you for your patient works. > > Acked-by: KAMEZAWA Hiroyuki > > BTW, what is the relationship between 1/2 and 2/2 ? Can't do jump label patching inside an interrupt handler. They need to happen when we free the structure, and I was about to add a worker myself when I found out we already have one: just we don't always use it. Before we merge it, let me just make sure the issue with config Li pointed out don't exist. I did test it, but since I've reposted this many times with multiple tiny changes - the type that will usually get us killed, I'd be more comfortable with an extra round of testing if someone spotted a possibility. Who is merging this fix, btw ? I find it to be entirely memcg related, even though it touches a file in net (but a file with only memcg code in it) From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 11:03:47 +0400 Message-ID: <4FB35153.3080309@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB0621C.3010604@huawei.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB0621C.3010604-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Li Zefan Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Johannes Weiner , Michal Hocko On 05/14/2012 05:38 AM, Li Zefan wrote: >> +static void disarm_static_keys(struct mem_cgroup *memcg) > >> +{ >> +#ifdef CONFIG_INET >> + if (memcg->tcp_mem.cg_proto.activated) >> + static_key_slow_dec(&memcg_socket_limit_enabled); >> +#endif >> +} > > > Move this inside the ifdef/endif below ? > > Otherwise I think you'll get compile error if !CONFIG_INET... I don't fully get it. We are supposed to provide a version of it for CONFIG_CGROUP_MEM_RES_CTLR_KMEM and an empty version for !CONFIG_CGROUP_MEM_RES_CTLR_KMEM Inside the first, we take an action for CONFIG_INET, and no action for !CONFIG_INET. Bear in mind that the slab patches will add another test to that place, and that's why I am doing it this way from the beginning. Well, that said, I not only can be wrong, I very frequently am. But I just compiled this one with and without CONFIG_INET, and it seems to be going alright. >> + >> #ifdef CONFIG_INET >> struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) >> { >> @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) >> } >> EXPORT_SYMBOL(tcp_proto_cgroup); >> #endif /* CONFIG_INET */ >> +#else >> +static inline void disarm_static_keys(struct mem_cgroup *memcg) >> +{ >> +} >> + >> #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ > > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 11:04:43 +0400 Message-ID: <4FB3518B.3090205@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB3431C.3050402-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/16/2012 10:03 AM, Glauber Costa wrote: >> BTW, what is the relationship between 1/2 and 2/2 ? > Can't do jump label patching inside an interrupt handler. They need to > happen when we free the structure, and I was about to add a worker > myself when I found out we already have one: just we don't always use it. > > Before we merge it, let me just make sure the issue with config Li > pointed out don't exist. I did test it, but since I've reposted this > many times with multiple tiny changes - the type that will usually get > us killed, I'd be more comfortable with an extra round of testing if > someone spotted a possibility. > > Who is merging this fix, btw ? > I find it to be entirely memcg related, even though it touches a file in > net (but a file with only memcg code in it) > For the record, I compiled test it many times, and the problem that Li wondered about seems not to exist. From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 17:28:29 +0900 Message-ID: <4FB3652D.2040909@jp.fujitsu.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB3518B.3090205-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton (2012/05/16 16:04), Glauber Costa wrote: > On 05/16/2012 10:03 AM, Glauber Costa wrote: >>> BTW, what is the relationship between 1/2 and 2/2 ? >> Can't do jump label patching inside an interrupt handler. They need to >> happen when we free the structure, and I was about to add a worker >> myself when I found out we already have one: just we don't always use it. >> >> Before we merge it, let me just make sure the issue with config Li >> pointed out don't exist. I did test it, but since I've reposted this >> many times with multiple tiny changes - the type that will usually get >> us killed, I'd be more comfortable with an extra round of testing if >> someone spotted a possibility. >> >> Who is merging this fix, btw ? >> I find it to be entirely memcg related, even though it touches a file in >> net (but a file with only memcg code in it) >> > > For the record, I compiled test it many times, and the problem that Li > wondered about seems not to exist. > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than netdev... David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? CC'ed David Miller and Andrew Morton. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 12:30:25 +0400 Message-ID: <4FB365A1.6040101@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> <4FB3652D.2040909@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB3652D.2040909-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton On 05/16/2012 12:28 PM, KAMEZAWA Hiroyuki wrote: >> For the record, I compiled test it many times, and the problem that Li >> > wondered about seems not to exist. >> > > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than > netdev... Yes. As I said, this only touches stuff in core memcg and the memcg specific file. Any conflicts should come from other memcg fixes that may have got into the tree... > David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? > > CC'ed David Miller and Andrew Morton. > > Thanks, > -Kame > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 12:37:32 +0400 Message-ID: <4FB3674C.2030604@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> <4FB3652D.2040909@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB3652D.2040909-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton On 05/16/2012 12:28 PM, KAMEZAWA Hiroyuki wrote: > (2012/05/16 16:04), Glauber Costa wrote: > >> On 05/16/2012 10:03 AM, Glauber Costa wrote: >>>> BTW, what is the relationship between 1/2 and 2/2 ? >>> Can't do jump label patching inside an interrupt handler. They need to >>> happen when we free the structure, and I was about to add a worker >>> myself when I found out we already have one: just we don't always use it. >>> >>> Before we merge it, let me just make sure the issue with config Li >>> pointed out don't exist. I did test it, but since I've reposted this >>> many times with multiple tiny changes - the type that will usually get >>> us killed, I'd be more comfortable with an extra round of testing if >>> someone spotted a possibility. >>> >>> Who is merging this fix, btw ? >>> I find it to be entirely memcg related, even though it touches a file in >>> net (but a file with only memcg code in it) >>> >> >> For the record, I compiled test it many times, and the problem that Li >> wondered about seems not to exist. >> > > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than > netdev... > > David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? > Another thing: Patch 2 in this series is of course dependent on patch 1 - which lives 100 % in memcg core. Without that, lockdep will scream while disabling the static key. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 13:57:55 -0700 Message-ID: <20120516135755.cdcdf9ba.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB0621C.3010604@huawei.com> <4FB35153.3080309@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB35153.3080309@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: Li Zefan , cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Johannes Weiner , Michal Hocko On Wed, 16 May 2012 11:03:47 +0400 Glauber Costa wrote: > On 05/14/2012 05:38 AM, Li Zefan wrote: > >> +static void disarm_static_keys(struct mem_cgroup *memcg) > > > >> +{ > >> +#ifdef CONFIG_INET > >> + if (memcg->tcp_mem.cg_proto.activated) > >> + static_key_slow_dec(&memcg_socket_limit_enabled); > >> +#endif > >> +} > > > > > > Move this inside the ifdef/endif below ? > > > > Otherwise I think you'll get compile error if !CONFIG_INET... > > I don't fully get it. > > We are supposed to provide a version of it for > CONFIG_CGROUP_MEM_RES_CTLR_KMEM and an empty version for > !CONFIG_CGROUP_MEM_RES_CTLR_KMEM > > Inside the first, we take an action for CONFIG_INET, and no action for > !CONFIG_INET. > > Bear in mind that the slab patches will add another test to that place, > and that's why I am doing it this way from the beginning. > > Well, that said, I not only can be wrong, I very frequently am. > > But I just compiled this one with and without CONFIG_INET, and it seems > to be going alright. > Yes, the ifdeffings in that area are rather nasty. I wonder if it would be simpler to do away with the ifdef nesting. At the top-level, just do #if defined(CONFIG_CGROUP_MEM_RES_CTLR_KMEM) && defined(CONFIG_INET) static void disarm_static_keys(struct mem_cgroup *memcg) { if (memcg->tcp_mem.cg_proto.activated) static_key_slow_dec(&memcg_socket_limit_enabled); } #else static inline void disarm_static_keys(struct mem_cgroup *memcg) { } #endif The tcp_proto_cgroup() definition could go inside that ifdef as well. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 14:06:37 -0700 Message-ID: <20120516140637.17741df6.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1336767077-25351-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On Fri, 11 May 2012 17:11:17 -0300 Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > ... > > @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) > tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, > net->ipv4.sysctl_tcp_mem[i]); > > - if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX) > - static_key_slow_dec(&memcg_socket_limit_enabled); > - else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX) > - static_key_slow_inc(&memcg_socket_limit_enabled); > + if (val == RESOURCE_MAX) > + cg_proto->active = false; > + else if (val != RESOURCE_MAX) { > + /* > + * ->activated needs to be written after the static_key update. > + * This is what guarantees that the socket activation function > + * is the last one to run. See sock_update_memcg() for details, > + * and note that we don't mark any socket as belonging to this > + * memcg until that flag is up. > + * > + * We need to do this, because static_keys will span multiple > + * sites, but we can't control their order. If we mark a socket > + * as accounted, but the accounting functions are not patched in > + * yet, we'll lose accounting. > + * > + * We never race with the readers in sock_update_memcg(), because > + * when this value change, the code to process it is not patched in > + * yet. > + */ > + if (!cg_proto->activated) { > + static_key_slow_inc(&memcg_socket_limit_enabled); > + cg_proto->activated = true; > + } If two threads run this code concurrently, they can both see cg_proto->activated==false and they will both run static_key_slow_inc(). Hopefully there's some locking somewhere which prevents this, but it is unobvious. We should comment this, probably at the cg_proto.activated definition site. Or we should fix the bug ;) > + cg_proto->active = true; > + } > > return 0; > } > > ... > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 14:13:42 -0700 Message-ID: <20120516141342.911931e7.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1336767077-25351-3-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On Fri, 11 May 2012 17:11:17 -0300 Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. So I'm scratching my head over what the actual bug is, and how important it is. AFAICT it will cause charging stats to exhibit some inaccuracy when memcg's are being torn down? I don't know how serious this in in the real world and so can't decide which kernel version(s) we should fix. When fixing bugs, please always fully describe the bug's end-user impact, so that I and others can make these sorts of decisions. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 09:07:11 +0900 Message-ID: <4FB4412F.3050909@jp.fujitsu.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516141342.911931e7.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120516141342.911931e7.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Andrew Morton Cc: Glauber Costa , cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/17 6:13), Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. > > So I'm scratching my head over what the actual bug is, and how > important it is. AFAICT it will cause charging stats to exhibit some > inaccuracy when memcg's are being torn down? > > I don't know how serious this in in the real world and so can't decide > which kernel version(s) we should fix. > > When fixing bugs, please always fully describe the bug's end-user > impact, so that I and others can make these sorts of decisions. > Ah, this was a bug report from me. tcp accounting can be easily broken. Costa, could you include this ? == tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case. If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted. But it's buggy now. For example, do following # while sleep 1;do echo 9223372036854775807 > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes; echo 300M > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes; done and run network application under A. tcp's usage is sometimes accounted and sometimes not accounted because of frequent changes of static_branch. Then, you can see broken tcp.usage_in_bytes. WARN_ON() is printed because res_counter->usage goes below 0. == kernel: ------------[ cut here ]---------- kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40() kernel: Pid: 17753, comm: bash Tainted: G W 3.3.0+ #99 kernel: Call Trace: kernel: [] warn_slowpath_common+0x7f/0xc0 kernel: [] ? rb_reserve__next_event+0x68/0x470 kernel: [] warn_slowpath_null+0x1a/0x20 kernel: [] res_counter_uncharge_locked+0x37/0x40 ... == From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 07:06:52 +0400 Message-ID: <4FB46B4C.3000307@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120516140637.17741df6.akpm@linux-foundation.org> Sender: netdev-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Andrew Morton Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 01:06 AM, Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. >> >> ... >> >> @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) >> tcp->tcp_prot_mem[i] = min_t(long, val>> PAGE_SHIFT, >> net->ipv4.sysctl_tcp_mem[i]); >> >> - if (val == RESOURCE_MAX&& old_lim != RESOURCE_MAX) >> - static_key_slow_dec(&memcg_socket_limit_enabled); >> - else if (old_lim == RESOURCE_MAX&& val != RESOURCE_MAX) >> - static_key_slow_inc(&memcg_socket_limit_enabled); >> + if (val == RESOURCE_MAX) >> + cg_proto->active = false; >> + else if (val != RESOURCE_MAX) { >> + /* >> + * ->activated needs to be written after the static_key update. >> + * This is what guarantees that the socket activation function >> + * is the last one to run. See sock_update_memcg() for details, >> + * and note that we don't mark any socket as belonging to this >> + * memcg until that flag is up. >> + * >> + * We need to do this, because static_keys will span multiple >> + * sites, but we can't control their order. If we mark a socket >> + * as accounted, but the accounting functions are not patched in >> + * yet, we'll lose accounting. >> + * >> + * We never race with the readers in sock_update_memcg(), because >> + * when this value change, the code to process it is not patched in >> + * yet. >> + */ >> + if (!cg_proto->activated) { >> + static_key_slow_inc(&memcg_socket_limit_enabled); >> + cg_proto->activated = true; >> + } > > If two threads run this code concurrently, they can both see > cg_proto->activated==false and they will both run > static_key_slow_inc(). > > Hopefully there's some locking somewhere which prevents this, but it is > unobvious. We should comment this, probably at the cg_proto.activated > definition site. Or we should fix the bug ;) > If that happens, locking in static_key_slow_inc will prevent any damage. My previous version had explicit code to prevent that, but we were pointed out that this is already part of the static_key expectations, so that was dropped. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 07:09:29 +0400 Message-ID: <4FB46BE9.6080503@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516141342.911931e7.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120516141342.911931e7.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Andrew Morton Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 01:13 AM, Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. > > So I'm scratching my head over what the actual bug is, and how > important it is. AFAICT it will cause charging stats to exhibit some > inaccuracy when memcg's are being torn down? > > I don't know how serious this in in the real world and so can't decide > which kernel version(s) we should fix. > > When fixing bugs, please always fully describe the bug's end-user > impact, so that I and others can make these sorts of decisions. Hi Andrew. I believe that was described in patch 0/2 ? In any case, this is something we need fixed, but it is not -stable material or anything. The bug leads to misaccounting when we quickly enable and disable limit in a loop. We have a synthetic script to demonstrate that. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 22:37:15 -0700 Message-ID: <20120516223715.5d1b4385.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB46B4C.3000307-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On Thu, 17 May 2012 07:06:52 +0400 Glauber Costa wrote: > ... > >> + else if (val != RESOURCE_MAX) { > >> + /* > >> + * ->activated needs to be written after the static_key update. > >> + * This is what guarantees that the socket activation function > >> + * is the last one to run. See sock_update_memcg() for details, > >> + * and note that we don't mark any socket as belonging to this > >> + * memcg until that flag is up. > >> + * > >> + * We need to do this, because static_keys will span multiple > >> + * sites, but we can't control their order. If we mark a socket > >> + * as accounted, but the accounting functions are not patched in > >> + * yet, we'll lose accounting. > >> + * > >> + * We never race with the readers in sock_update_memcg(), because > >> + * when this value change, the code to process it is not patched in > >> + * yet. > >> + */ > >> + if (!cg_proto->activated) { > >> + static_key_slow_inc(&memcg_socket_limit_enabled); > >> + cg_proto->activated = true; > >> + } > > > > If two threads run this code concurrently, they can both see > > cg_proto->activated==false and they will both run > > static_key_slow_inc(). > > > > Hopefully there's some locking somewhere which prevents this, but it is > > unobvious. We should comment this, probably at the cg_proto.activated > > definition site. Or we should fix the bug ;) > > > If that happens, locking in static_key_slow_inc will prevent any damage. > My previous version had explicit code to prevent that, but we were > pointed out that this is already part of the static_key expectations, so > that was dropped. This makes no sense. If two threads run that code concurrently, key->enabled gets incremented twice. Nobody anywhere has a record that this happened so it cannot be undone. key->enabled is now in an unknown state. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 13:52:13 +0400 Message-ID: <4FB4CA4D.50608@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120516223715.5d1b4385.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Andrew Morton Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 09:37 AM, Andrew Morton wrote: >> > If that happens, locking in static_key_slow_inc will prevent any damage. >> > My previous version had explicit code to prevent that, but we were >> > pointed out that this is already part of the static_key expectations, so >> > that was dropped. > This makes no sense. If two threads run that code concurrently, > key->enabled gets incremented twice. Nobody anywhere has a record that > this happened so it cannot be undone. key->enabled is now in an > unknown state. Kame, Tejun, Andrew is right. It seems we will need that mutex after all. Just this is not a race, and neither something that should belong in the static_branch interface. We want to make sure that enabled is not updated before the jump label update, because we need a specific ordering guarantee at the patched sites. And *that*, the interface guarantees, and we were wrong to believe it did not. That is a correction issue for the accounting, and that part is right. But when we disarm it, we'll need to make sure that happened only once, otherwise we may never unpatch it. That, or we'd need that to be a counter. The jump label interface does not - and should not - keep track of how many updates happened to a key. That's the role of whoever is using it. If you agree with the above, I'll send this patch again with the correction. Andrew, thank you very much. Do you spot anything else here? From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 19:18:09 +0900 Message-ID: <4FB4D061.10406@jp.fujitsu.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB4CA4D.50608@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/17 18:52), Glauber Costa wrote: > On 05/17/2012 09:37 AM, Andrew Morton wrote: >>>> If that happens, locking in static_key_slow_inc will prevent any damage. >>>> My previous version had explicit code to prevent that, but we were >>>> pointed out that this is already part of the static_key expectations, so >>>> that was dropped. >> This makes no sense. If two threads run that code concurrently, >> key->enabled gets incremented twice. Nobody anywhere has a record that >> this happened so it cannot be undone. key->enabled is now in an >> unknown state. > > Kame, Tejun, > > Andrew is right. It seems we will need that mutex after all. Just this > is not a race, and neither something that should belong in the > static_branch interface. > Hmm....how about having res_counter_xchg_limit(res, &old_limit, new_limit); if (!cg_proto->updated && old_limit == RESOURCE_MAX) ....update labels... Then, no mutex overhead maybe and activated will be updated only once. Ah, but please fix in a way you like. Above is an example. Thanks, -Kame (*) I'm sorry I won't be able to read e-mails, tomorrow. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 14:22:05 +0400 Message-ID: <4FB4D14D.4020303@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> <4FB4D061.10406@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB4D061.10406-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: KAMEZAWA Hiroyuki Cc: Andrew Morton , cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 02:18 PM, KAMEZAWA Hiroyuki wrote: > (2012/05/17 18:52), Glauber Costa wrote: > >> On 05/17/2012 09:37 AM, Andrew Morton wrote: >>>>> If that happens, locking in static_key_slow_inc will prevent any damage. >>>>> My previous version had explicit code to prevent that, but we were >>>>> pointed out that this is already part of the static_key expectations, so >>>>> that was dropped. >>> This makes no sense. If two threads run that code concurrently, >>> key->enabled gets incremented twice. Nobody anywhere has a record that >>> this happened so it cannot be undone. key->enabled is now in an >>> unknown state. >> >> Kame, Tejun, >> >> Andrew is right. It seems we will need that mutex after all. Just this >> is not a race, and neither something that should belong in the >> static_branch interface. >> > > > Hmm....how about having > > res_counter_xchg_limit(res,&old_limit, new_limit); > > if (!cg_proto->updated&& old_limit == RESOURCE_MAX) > ....update labels... > > Then, no mutex overhead maybe and activated will be updated only once. > Ah, but please fix in a way you like. Above is an example. I think a mutex is a lot cleaner than adding a new function to the res_counter interface. We could do a counter, and then later decrement the key until the counter reaches zero, but between those two, I still think a mutex here is preferable. Only that, instead of coming up with a mutex of ours, we could export and reuse set_limit_mutex from memcontrol.c > Thanks, > -Kame > (*) I'm sorry I won't be able to read e-mails, tomorrow. > Ok Kame. I am not in a terrible hurry to fix this, it doesn't seem to be hurting any real workload. From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 19:27:09 +0900 Message-ID: <4FB4D27D.7020009@jp.fujitsu.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> <4FB4D061.10406@jp.fujitsu.com> <4FB4D14D.4020303@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB4D14D.4020303-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: Andrew Morton , cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/17 19:22), Glauber Costa wrote: > On 05/17/2012 02:18 PM, KAMEZAWA Hiroyuki wrote: >> (2012/05/17 18:52), Glauber Costa wrote: >> >>> On 05/17/2012 09:37 AM, Andrew Morton wrote: >>>>>> If that happens, locking in static_key_slow_inc will prevent any damage. >>>>>> My previous version had explicit code to prevent that, but we were >>>>>> pointed out that this is already part of the static_key expectations, so >>>>>> that was dropped. >>>> This makes no sense. If two threads run that code concurrently, >>>> key->enabled gets incremented twice. Nobody anywhere has a record that >>>> this happened so it cannot be undone. key->enabled is now in an >>>> unknown state. >>> >>> Kame, Tejun, >>> >>> Andrew is right. It seems we will need that mutex after all. Just this >>> is not a race, and neither something that should belong in the >>> static_branch interface. >>> >> >> >> Hmm....how about having >> >> res_counter_xchg_limit(res,&old_limit, new_limit); >> >> if (!cg_proto->updated&& old_limit == RESOURCE_MAX) >> ....update labels... >> >> Then, no mutex overhead maybe and activated will be updated only once. >> Ah, but please fix in a way you like. Above is an example. > > I think a mutex is a lot cleaner than adding a new function to the > res_counter interface. > > We could do a counter, and then later decrement the key until the > counter reaches zero, but between those two, I still think a mutex here > is preferable. > > Only that, instead of coming up with a mutex of ours, we could export > and reuse set_limit_mutex from memcontrol.c > ok, please. thx, -Kame > >> Thanks, >> -Kame >> (*) I'm sorry I won't be able to read e-mails, tomorrow. >> > Ok Kame. I am not in a terrible hurry to fix this, it doesn't seem to be > hurting any real workload. > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 08:19:47 -0700 Message-ID: <20120517151947.GI21275@google.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=VpxeSh5Sog8MOVaa0E6Hj8mUA/BItUiXDmUI/OkYYJg=; b=T5r6v1w99UevrNNs7IYhLe/Hd3k2CbVfQxd6xdONk/U5Unl5ySryLcCy3x+1a9/o/H DGXCKYcEsySExMeEp+48AZF4UJKsJmbIiMVGguAbmDQZ9qt5QzIahbJk33+6c8VXO9iu zNmarlBM9OWgMEu8y72YwfJLuHkQNrs8AY8iK0tjs248hPcEcpDC+/j0zpinKkNVSZ7F u37K991NfZZ3POv1oHYPgjDU3PcrwRQeEpQhoXMx+jQazoQ3pOoeA26aL21T1P8HnwMl J7IIgFxXTiBkCuRR6L/imLme4J6XK2qag+pZzJbuLXJ09oLiywdSzMFvh953REM8ytAe eNPA== Content-Disposition: inline In-Reply-To: <4FB4CA4D.50608@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Li Zefan , Johannes Weiner , Michal Hocko On Thu, May 17, 2012 at 01:52:13PM +0400, Glauber Costa wrote: > Andrew is right. It seems we will need that mutex after all. Just > this is not a race, and neither something that should belong in the > static_branch interface. Yeah, with a completely different comment. It just needs to wrap ->activated alteration and static key inc/dec, right? Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 10:02:53 -0700 Message-ID: <20120517100253.3b4a1a20.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FB4CA4D.50608@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On Thu, 17 May 2012 13:52:13 +0400 Glauber Costa wrote: > Andrew is right. It seems we will need that mutex after all. Just this > is not a race, and neither something that should belong in the > static_branch interface. Well, a mutex is one way. Or you could do something like if (!test_and_set_bit(CGPROTO_ACTIVATED, &cg_proto->flags)) static_key_slow_inc(&memcg_socket_limit_enabled); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx157.postini.com [74.125.245.157]) by kanga.kvack.org (Postfix) with SMTP id DFCF68D0001 for ; Fri, 11 May 2012 16:13:36 -0400 (EDT) From: Glauber Costa Subject: [PATCH v5 1/2] Always free struct memcg through schedule_work() Date: Fri, 11 May 2012 17:11:16 -0300 Message-Id: <1336767077-25351-2-git-send-email-glommer@parallels.com> In-Reply-To: <1336767077-25351-1-git-send-email-glommer@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: To: cgroups@vger.kernel.org Cc: linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Glauber Costa , Johannes Weiner , Michal Hocko Right now we free struct memcg with kfree right after a rcu grace period, but defer it if we need to use vfree() to get rid of that memory area. We do that by need, because we need vfree to be called in a process context. This patch unifies this behavior, by ensuring that even kfree will happen in a separate thread. The goal is to have a stable place to call the upcoming jump label destruction function outside the realm of the complicated and quite far-reaching cgroup lock (that can't be held when calling neither the cpu_hotplug.lock nor the jump_label_mutex) Signed-off-by: Glauber Costa CC: Tejun Heo CC: Li Zefan CC: Kamezawa Hiroyuki CC: Johannes Weiner CC: Michal Hocko --- mm/memcontrol.c | 24 +++++++++++++----------- 1 files changed, 13 insertions(+), 11 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 932a734..0b4b4c8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -245,8 +245,8 @@ struct mem_cgroup { */ struct rcu_head rcu_freeing; /* - * But when using vfree(), that cannot be done at - * interrupt time, so we must then queue the work. + * We also need some space for a worker in deferred freeing. + * By the time we call it, rcu_freeing is not longer in use. */ struct work_struct work_freeing; }; @@ -4826,23 +4826,28 @@ out_free: } /* - * Helpers for freeing a vzalloc()ed mem_cgroup by RCU, + * Helpers for freeing a kmalloc()ed/vzalloc()ed mem_cgroup by RCU, * but in process context. The work_freeing structure is overlaid * on the rcu_freeing structure, which itself is overlaid on memsw. */ -static void vfree_work(struct work_struct *work) +static void free_work(struct work_struct *work) { struct mem_cgroup *memcg; + int size = sizeof(struct mem_cgroup); memcg = container_of(work, struct mem_cgroup, work_freeing); - vfree(memcg); + if (size < PAGE_SIZE) + kfree(memcg); + else + vfree(memcg); } -static void vfree_rcu(struct rcu_head *rcu_head) + +static void free_rcu(struct rcu_head *rcu_head) { struct mem_cgroup *memcg; memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing); - INIT_WORK(&memcg->work_freeing, vfree_work); + INIT_WORK(&memcg->work_freeing, free_work); schedule_work(&memcg->work_freeing); } @@ -4868,10 +4873,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg) free_mem_cgroup_per_zone_info(memcg, node); free_percpu(memcg->stat); - if (sizeof(struct mem_cgroup) < PAGE_SIZE) - kfree_rcu(memcg, rcu_freeing); - else - call_rcu(&memcg->rcu_freeing, vfree_rcu); + call_rcu(&memcg->rcu_freeing, free_rcu); } static void mem_cgroup_get(struct mem_cgroup *memcg) -- 1.7.7.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx204.postini.com [74.125.245.204]) by kanga.kvack.org (Postfix) with SMTP id 8A1FC8D0009 for ; Fri, 11 May 2012 16:13:41 -0400 (EDT) From: Glauber Costa Subject: [PATCH v5 2/2] decrement static keys on real destroy time Date: Fri, 11 May 2012 17:11:17 -0300 Message-Id: <1336767077-25351-3-git-send-email-glommer@parallels.com> In-Reply-To: <1336767077-25351-1-git-send-email-glommer@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: To: cgroups@vger.kernel.org Cc: linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Glauber Costa , Johannes Weiner , Michal Hocko We call the destroy function when a cgroup starts to be removed, such as by a rmdir event. However, because of our reference counters, some objects are still inflight. Right now, we are decrementing the static_keys at destroy() time, meaning that if we get rid of the last static_key reference, some objects will still have charges, but the code to properly uncharge them won't be run. This becomes a problem specially if it is ever enabled again, because now new charges will be added to the staled charges making keeping it pretty much impossible. We just need to be careful with the static branch activation: since there is no particular preferred order of their activation, we need to make sure that we only start using it after all call sites are active. This is achieved by having a per-memcg flag that is only updated after static_key_slow_inc() returns. At this time, we are sure all sites are active. This is made per-memcg, not global, for a reason: it also has the effect of making socket accounting more consistent. The first memcg to be limited will trigger static_key() activation, therefore, accounting. But all the others will then be accounted no matter what. After this patch, only limited memcgs will have its sockets accounted. [v2: changed a tcp limited flag for a generic proto limited flag ] [v3: update the current active flag only after the static_key update ] [v4: disarm_static_keys() inside free_work ] [v5: got rid of tcp_limit_mutex, now in the static_key interface ] Signed-off-by: Glauber Costa CC: Tejun Heo CC: Li Zefan CC: Kamezawa Hiroyuki CC: Johannes Weiner CC: Michal Hocko --- include/net/sock.h | 9 +++++++++ mm/memcontrol.c | 26 ++++++++++++++++++++++++-- net/ipv4/tcp_memcontrol.c | 32 +++++++++++++++++++++++++------- 3 files changed, 58 insertions(+), 9 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index b3ebe6b..5c620bd 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -914,6 +914,15 @@ struct cg_proto { int *memory_pressure; long *sysctl_mem; /* + * active means it is currently active, and new sockets should + * be assigned to cgroups. + * + * activated means it was ever activated, and we need to + * disarm the static keys on destruction + */ + bool activated; + bool active; + /* * memcg field is used to find which memcg we belong directly * Each memcg struct can hold more than one cg_proto, so container_of * won't really cut. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0b4b4c8..d1b0849 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -404,6 +404,7 @@ void sock_update_memcg(struct sock *sk) { if (mem_cgroup_sockets_enabled) { struct mem_cgroup *memcg; + struct cg_proto *cg_proto; BUG_ON(!sk->sk_prot->proto_cgroup); @@ -423,9 +424,10 @@ void sock_update_memcg(struct sock *sk) rcu_read_lock(); memcg = mem_cgroup_from_task(current); - if (!mem_cgroup_is_root(memcg)) { + cg_proto = sk->sk_prot->proto_cgroup(memcg); + if (!mem_cgroup_is_root(memcg) && cg_proto->active) { mem_cgroup_get(memcg); - sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg); + sk->sk_cgrp = cg_proto; } rcu_read_unlock(); } @@ -442,6 +444,14 @@ void sock_release_memcg(struct sock *sk) } } +static void disarm_static_keys(struct mem_cgroup *memcg) +{ +#ifdef CONFIG_INET + if (memcg->tcp_mem.cg_proto.activated) + static_key_slow_dec(&memcg_socket_limit_enabled); +#endif +} + #ifdef CONFIG_INET struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) { @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) } EXPORT_SYMBOL(tcp_proto_cgroup); #endif /* CONFIG_INET */ +#else +static inline void disarm_static_keys(struct mem_cgroup *memcg) +{ +} + #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ static void drain_all_stock_async(struct mem_cgroup *memcg); @@ -4836,6 +4851,13 @@ static void free_work(struct work_struct *work) int size = sizeof(struct mem_cgroup); memcg = container_of(work, struct mem_cgroup, work_freeing); + /* + * We need to make sure that (at least for now), the jump label + * destruction code runs outside of the cgroup lock. schedule_work() + * will guarantee this happens. Be careful if you need to move this + * disarm_static_keys around + */ + disarm_static_keys(memcg); if (size < PAGE_SIZE) kfree(memcg); else diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c index 1517037..7ea4f79 100644 --- a/net/ipv4/tcp_memcontrol.c +++ b/net/ipv4/tcp_memcontrol.c @@ -74,9 +74,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg) percpu_counter_destroy(&tcp->tcp_sockets_allocated); val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT); - - if (val != RESOURCE_MAX) - static_key_slow_dec(&memcg_socket_limit_enabled); } EXPORT_SYMBOL(tcp_destroy_cgroup); @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, net->ipv4.sysctl_tcp_mem[i]); - if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX) - static_key_slow_dec(&memcg_socket_limit_enabled); - else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX) - static_key_slow_inc(&memcg_socket_limit_enabled); + if (val == RESOURCE_MAX) + cg_proto->active = false; + else if (val != RESOURCE_MAX) { + /* + * ->activated needs to be written after the static_key update. + * This is what guarantees that the socket activation function + * is the last one to run. See sock_update_memcg() for details, + * and note that we don't mark any socket as belonging to this + * memcg until that flag is up. + * + * We need to do this, because static_keys will span multiple + * sites, but we can't control their order. If we mark a socket + * as accounted, but the accounting functions are not patched in + * yet, we'll lose accounting. + * + * We never race with the readers in sock_update_memcg(), because + * when this value change, the code to process it is not patched in + * yet. + */ + if (!cg_proto->activated) { + static_key_slow_inc(&memcg_socket_limit_enabled); + cg_proto->activated = true; + } + cg_proto->active = true; + } return 0; } -- 1.7.7.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx103.postini.com [74.125.245.103]) by kanga.kvack.org (Postfix) with SMTP id 713116B0083 for ; Sun, 13 May 2012 20:58:34 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 9159F3EE0AE for ; Mon, 14 May 2012 09:58:32 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 7909745DD74 for ; Mon, 14 May 2012 09:58:32 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 4F19445DD78 for ; Mon, 14 May 2012 09:58:32 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 3FAFB1DB803E for ; Mon, 14 May 2012 09:58:32 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.240.81.133]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id E9FE21DB8038 for ; Mon, 14 May 2012 09:58:31 +0900 (JST) Message-ID: <4FB05834.1020200@jp.fujitsu.com> Date: Mon, 14 May 2012 09:56:20 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH v5 1/2] Always free struct memcg through schedule_work() References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-2-git-send-email-glommer@parallels.com> In-Reply-To: <1336767077-25351-2-git-send-email-glommer@parallels.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/12 5:11), Glauber Costa wrote: > Right now we free struct memcg with kfree right after a > rcu grace period, but defer it if we need to use vfree() to get > rid of that memory area. We do that by need, because we need vfree > to be called in a process context. > > This patch unifies this behavior, by ensuring that even kfree will > happen in a separate thread. The goal is to have a stable place to > call the upcoming jump label destruction function outside the realm > of the complicated and quite far-reaching cgroup lock (that can't be > held when calling neither the cpu_hotplug.lock nor the jump_label_mutex) > > Signed-off-by: Glauber Costa > CC: Tejun Heo > CC: Li Zefan > CC: Kamezawa Hiroyuki > CC: Johannes Weiner > CC: Michal Hocko I think we'll need to revisit this, again. for now, Acked-by: KAMEZAWA Hiroyuki -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx110.postini.com [74.125.245.110]) by kanga.kvack.org (Postfix) with SMTP id B22456B0083 for ; Sun, 13 May 2012 21:01:05 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id CE1F13EE0C5 for ; Mon, 14 May 2012 10:01:03 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id B38D645DEB2 for ; Mon, 14 May 2012 10:01:03 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 9CFDE45DEA6 for ; Mon, 14 May 2012 10:01:03 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 8F0D51DB803C for ; Mon, 14 May 2012 10:01:03 +0900 (JST) Received: from m105.s.css.fujitsu.com (m105.s.css.fujitsu.com [10.240.81.145]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 42FA31DB803B for ; Mon, 14 May 2012 10:01:03 +0900 (JST) Message-ID: <4FB058D8.6060707@jp.fujitsu.com> Date: Mon, 14 May 2012 09:59:04 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> In-Reply-To: <1336767077-25351-3-git-send-email-glommer@parallels.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/12 5:11), Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > [v2: changed a tcp limited flag for a generic proto limited flag ] > [v3: update the current active flag only after the static_key update ] > [v4: disarm_static_keys() inside free_work ] > [v5: got rid of tcp_limit_mutex, now in the static_key interface ] > > Signed-off-by: Glauber Costa > CC: Tejun Heo > CC: Li Zefan > CC: Kamezawa Hiroyuki > CC: Johannes Weiner > CC: Michal Hocko Thank you for your patient works. Acked-by: KAMEZAWA Hiroyuki BTW, what is the relationship between 1/2 and 2/2 ? Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx132.postini.com [74.125.245.132]) by kanga.kvack.org (Postfix) with SMTP id F3EFF6B004D for ; Sun, 13 May 2012 21:39:22 -0400 (EDT) Received: from huawei.com (szxga04-in [172.24.2.12]) by szxga04-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 2.14 (built Aug 8 2006)) with ESMTP id <0M3Z001HHOKRPJ@szxga04-in.huawei.com> for linux-mm@kvack.org; Mon, 14 May 2012 09:38:51 +0800 (CST) Received: from szxrg01-dlp.huawei.com ([172.24.2.119]) by szxga04-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 2.14 (built Aug 8 2006)) with ESMTP id <0M3Z003MOOKRY2@szxga04-in.huawei.com> for linux-mm@kvack.org; Mon, 14 May 2012 09:38:51 +0800 (CST) Date: Mon, 14 May 2012 09:38:36 +0800 From: Li Zefan Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time In-reply-to: <1336767077-25351-3-git-send-email-glommer@parallels.com> Message-id: <4FB0621C.3010604@huawei.com> MIME-version: 1.0 Content-type: text/plain; charset=GB2312 Content-transfer-encoding: 7BIT References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Johannes Weiner , Michal Hocko > +static void disarm_static_keys(struct mem_cgroup *memcg) > +{ > +#ifdef CONFIG_INET > + if (memcg->tcp_mem.cg_proto.activated) > + static_key_slow_dec(&memcg_socket_limit_enabled); > +#endif > +} Move this inside the ifdef/endif below ? Otherwise I think you'll get compile error if !CONFIG_INET... > + > #ifdef CONFIG_INET > struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) > { > @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) > } > EXPORT_SYMBOL(tcp_proto_cgroup); > #endif /* CONFIG_INET */ > +#else > +static inline void disarm_static_keys(struct mem_cgroup *memcg) > +{ > +} > + > #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx203.postini.com [74.125.245.203]) by kanga.kvack.org (Postfix) with SMTP id F0AF36B00F5 for ; Mon, 14 May 2012 14:12:55 -0400 (EDT) Received: by dakp5 with SMTP id p5so9024570dak.14 for ; Mon, 14 May 2012 11:12:55 -0700 (PDT) Date: Mon, 14 May 2012 11:12:50 -0700 From: Tejun Heo Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Message-ID: <20120514181250.GD2366@google.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1336767077-25351-3-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Li Zefan , Johannes Weiner , Michal Hocko On Fri, May 11, 2012 at 05:11:17PM -0300, Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > [v2: changed a tcp limited flag for a generic proto limited flag ] > [v3: update the current active flag only after the static_key update ] > [v4: disarm_static_keys() inside free_work ] > [v5: got rid of tcp_limit_mutex, now in the static_key interface ] > > Signed-off-by: Glauber Costa > CC: Tejun Heo > CC: Li Zefan > CC: Kamezawa Hiroyuki > CC: Johannes Weiner > CC: Michal Hocko Generally looks sane to me. Please feel free to addmy Reviewed-by. > + if (val == RESOURCE_MAX) > + cg_proto->active = false; > + else if (val != RESOURCE_MAX) { Minor nitpick: CodingStyle says not to omit { } if other branches need them. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx156.postini.com [74.125.245.156]) by kanga.kvack.org (Postfix) with SMTP id D6EA06B004D for ; Wed, 16 May 2012 02:05:22 -0400 (EDT) Message-ID: <4FB3431C.3050402@parallels.com> Date: Wed, 16 May 2012 10:03:08 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> In-Reply-To: <4FB058D8.6060707@jp.fujitsu.com> Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/14/2012 04:59 AM, KAMEZAWA Hiroyuki wrote: > (2012/05/12 5:11), Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. >> >> [v2: changed a tcp limited flag for a generic proto limited flag ] >> [v3: update the current active flag only after the static_key update ] >> [v4: disarm_static_keys() inside free_work ] >> [v5: got rid of tcp_limit_mutex, now in the static_key interface ] >> >> Signed-off-by: Glauber Costa >> CC: Tejun Heo >> CC: Li Zefan >> CC: Kamezawa Hiroyuki >> CC: Johannes Weiner >> CC: Michal Hocko > > > Thank you for your patient works. > > Acked-by: KAMEZAWA Hiroyuki > > BTW, what is the relationship between 1/2 and 2/2 ? Can't do jump label patching inside an interrupt handler. They need to happen when we free the structure, and I was about to add a worker myself when I found out we already have one: just we don't always use it. Before we merge it, let me just make sure the issue with config Li pointed out don't exist. I did test it, but since I've reposted this many times with multiple tiny changes - the type that will usually get us killed, I'd be more comfortable with an extra round of testing if someone spotted a possibility. Who is merging this fix, btw ? I find it to be entirely memcg related, even though it touches a file in net (but a file with only memcg code in it) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx202.postini.com [74.125.245.202]) by kanga.kvack.org (Postfix) with SMTP id 8A9CA6B004D for ; Wed, 16 May 2012 03:05:59 -0400 (EDT) Message-ID: <4FB35153.3080309@parallels.com> Date: Wed, 16 May 2012 11:03:47 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB0621C.3010604@huawei.com> In-Reply-To: <4FB0621C.3010604@huawei.com> Content-Type: text/plain; charset="GB2312" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Li Zefan Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Johannes Weiner , Michal Hocko On 05/14/2012 05:38 AM, Li Zefan wrote: >> +static void disarm_static_keys(struct mem_cgroup *memcg) > >> +{ >> +#ifdef CONFIG_INET >> + if (memcg->tcp_mem.cg_proto.activated) >> + static_key_slow_dec(&memcg_socket_limit_enabled); >> +#endif >> +} > > > Move this inside the ifdef/endif below ? > > Otherwise I think you'll get compile error if !CONFIG_INET... I don't fully get it. We are supposed to provide a version of it for CONFIG_CGROUP_MEM_RES_CTLR_KMEM and an empty version for !CONFIG_CGROUP_MEM_RES_CTLR_KMEM Inside the first, we take an action for CONFIG_INET, and no action for !CONFIG_INET. Bear in mind that the slab patches will add another test to that place, and that's why I am doing it this way from the beginning. Well, that said, I not only can be wrong, I very frequently am. But I just compiled this one with and without CONFIG_INET, and it seems to be going alright. >> + >> #ifdef CONFIG_INET >> struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) >> { >> @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) >> } >> EXPORT_SYMBOL(tcp_proto_cgroup); >> #endif /* CONFIG_INET */ >> +#else >> +static inline void disarm_static_keys(struct mem_cgroup *memcg) >> +{ >> +} >> + >> #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ > > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx176.postini.com [74.125.245.176]) by kanga.kvack.org (Postfix) with SMTP id 44F266B004D for ; Wed, 16 May 2012 03:06:59 -0400 (EDT) Message-ID: <4FB3518B.3090205@parallels.com> Date: Wed, 16 May 2012 11:04:43 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> In-Reply-To: <4FB3431C.3050402@parallels.com> Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/16/2012 10:03 AM, Glauber Costa wrote: >> BTW, what is the relationship between 1/2 and 2/2 ? > Can't do jump label patching inside an interrupt handler. They need to > happen when we free the structure, and I was about to add a worker > myself when I found out we already have one: just we don't always use it. > > Before we merge it, let me just make sure the issue with config Li > pointed out don't exist. I did test it, but since I've reposted this > many times with multiple tiny changes - the type that will usually get > us killed, I'd be more comfortable with an extra round of testing if > someone spotted a possibility. > > Who is merging this fix, btw ? > I find it to be entirely memcg related, even though it touches a file in > net (but a file with only memcg code in it) > For the record, I compiled test it many times, and the problem that Li wondered about seems not to exist. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx125.postini.com [74.125.245.125]) by kanga.kvack.org (Postfix) with SMTP id 6A4366B004D for ; Wed, 16 May 2012 04:30:29 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id EDB753EE0C7 for ; Wed, 16 May 2012 17:30:27 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id CF85445DE4D for ; Wed, 16 May 2012 17:30:27 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id AC66C45DE61 for ; Wed, 16 May 2012 17:30:27 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 9F3FE1DB802F for ; Wed, 16 May 2012 17:30:27 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 1C7161DB803F for ; Wed, 16 May 2012 17:30:27 +0900 (JST) Message-ID: <4FB3652D.2040909@jp.fujitsu.com> Date: Wed, 16 May 2012 17:28:29 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> In-Reply-To: <4FB3518B.3090205@parallels.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton (2012/05/16 16:04), Glauber Costa wrote: > On 05/16/2012 10:03 AM, Glauber Costa wrote: >>> BTW, what is the relationship between 1/2 and 2/2 ? >> Can't do jump label patching inside an interrupt handler. They need to >> happen when we free the structure, and I was about to add a worker >> myself when I found out we already have one: just we don't always use it. >> >> Before we merge it, let me just make sure the issue with config Li >> pointed out don't exist. I did test it, but since I've reposted this >> many times with multiple tiny changes - the type that will usually get >> us killed, I'd be more comfortable with an extra round of testing if >> someone spotted a possibility. >> >> Who is merging this fix, btw ? >> I find it to be entirely memcg related, even though it touches a file in >> net (but a file with only memcg code in it) >> > > For the record, I compiled test it many times, and the problem that Li > wondered about seems not to exist. > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than netdev... David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? CC'ed David Miller and Andrew Morton. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx178.postini.com [74.125.245.178]) by kanga.kvack.org (Postfix) with SMTP id D66B46B004D for ; Wed, 16 May 2012 04:32:36 -0400 (EDT) Message-ID: <4FB365A1.6040101@parallels.com> Date: Wed, 16 May 2012 12:30:25 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> <4FB3652D.2040909@jp.fujitsu.com> In-Reply-To: <4FB3652D.2040909@jp.fujitsu.com> Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton On 05/16/2012 12:28 PM, KAMEZAWA Hiroyuki wrote: >> For the record, I compiled test it many times, and the problem that Li >> > wondered about seems not to exist. >> > > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than > netdev... Yes. As I said, this only touches stuff in core memcg and the memcg specific file. Any conflicts should come from other memcg fixes that may have got into the tree... > David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? > > CC'ed David Miller and Andrew Morton. > > Thanks, > -Kame > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx117.postini.com [74.125.245.117]) by kanga.kvack.org (Postfix) with SMTP id 8CC1E6B004D for ; Wed, 16 May 2012 04:39:40 -0400 (EDT) Message-ID: <4FB3674C.2030604@parallels.com> Date: Wed, 16 May 2012 12:37:32 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> <4FB3652D.2040909@jp.fujitsu.com> In-Reply-To: <4FB3652D.2040909@jp.fujitsu.com> Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton On 05/16/2012 12:28 PM, KAMEZAWA Hiroyuki wrote: > (2012/05/16 16:04), Glauber Costa wrote: > >> On 05/16/2012 10:03 AM, Glauber Costa wrote: >>>> BTW, what is the relationship between 1/2 and 2/2 ? >>> Can't do jump label patching inside an interrupt handler. They need to >>> happen when we free the structure, and I was about to add a worker >>> myself when I found out we already have one: just we don't always use it. >>> >>> Before we merge it, let me just make sure the issue with config Li >>> pointed out don't exist. I did test it, but since I've reposted this >>> many times with multiple tiny changes - the type that will usually get >>> us killed, I'd be more comfortable with an extra round of testing if >>> someone spotted a possibility. >>> >>> Who is merging this fix, btw ? >>> I find it to be entirely memcg related, even though it touches a file in >>> net (but a file with only memcg code in it) >>> >> >> For the record, I compiled test it many times, and the problem that Li >> wondered about seems not to exist. >> > > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than > netdev... > > David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? > Another thing: Patch 2 in this series is of course dependent on patch 1 - which lives 100 % in memcg core. Without that, lockdep will scream while disabling the static key. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx199.postini.com [74.125.245.199]) by kanga.kvack.org (Postfix) with SMTP id 760046B0082 for ; Wed, 16 May 2012 17:06:40 -0400 (EDT) Date: Wed, 16 May 2012 14:06:37 -0700 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Message-Id: <20120516140637.17741df6.akpm@linux-foundation.org> In-Reply-To: <1336767077-25351-3-git-send-email-glommer@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On Fri, 11 May 2012 17:11:17 -0300 Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > ... > > @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) > tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, > net->ipv4.sysctl_tcp_mem[i]); > > - if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX) > - static_key_slow_dec(&memcg_socket_limit_enabled); > - else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX) > - static_key_slow_inc(&memcg_socket_limit_enabled); > + if (val == RESOURCE_MAX) > + cg_proto->active = false; > + else if (val != RESOURCE_MAX) { > + /* > + * ->activated needs to be written after the static_key update. > + * This is what guarantees that the socket activation function > + * is the last one to run. See sock_update_memcg() for details, > + * and note that we don't mark any socket as belonging to this > + * memcg until that flag is up. > + * > + * We need to do this, because static_keys will span multiple > + * sites, but we can't control their order. If we mark a socket > + * as accounted, but the accounting functions are not patched in > + * yet, we'll lose accounting. > + * > + * We never race with the readers in sock_update_memcg(), because > + * when this value change, the code to process it is not patched in > + * yet. > + */ > + if (!cg_proto->activated) { > + static_key_slow_inc(&memcg_socket_limit_enabled); > + cg_proto->activated = true; > + } If two threads run this code concurrently, they can both see cg_proto->activated==false and they will both run static_key_slow_inc(). Hopefully there's some locking somewhere which prevents this, but it is unobvious. We should comment this, probably at the cg_proto.activated definition site. Or we should fix the bug ;) > + cg_proto->active = true; > + } > > return 0; > } > > ... > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115]) by kanga.kvack.org (Postfix) with SMTP id 8362B6B0082 for ; Wed, 16 May 2012 20:09:11 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 908D73EE0C1 for ; Thu, 17 May 2012 09:09:09 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 7477045DE5F for ; Thu, 17 May 2012 09:09:09 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 5116645DE4D for ; Thu, 17 May 2012 09:09:09 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 41A8B1DB802C for ; Thu, 17 May 2012 09:09:09 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.240.81.147]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id E0C6C1DB803A for ; Thu, 17 May 2012 09:09:08 +0900 (JST) Message-ID: <4FB4412F.3050909@jp.fujitsu.com> Date: Thu, 17 May 2012 09:07:11 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516141342.911931e7.akpm@linux-foundation.org> In-Reply-To: <20120516141342.911931e7.akpm@linux-foundation.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Glauber Costa , cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/17 6:13), Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. > > So I'm scratching my head over what the actual bug is, and how > important it is. AFAICT it will cause charging stats to exhibit some > inaccuracy when memcg's are being torn down? > > I don't know how serious this in in the real world and so can't decide > which kernel version(s) we should fix. > > When fixing bugs, please always fully describe the bug's end-user > impact, so that I and others can make these sorts of decisions. > Ah, this was a bug report from me. tcp accounting can be easily broken. Costa, could you include this ? == tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case. If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted. But it's buggy now. For example, do following # while sleep 1;do echo 9223372036854775807 > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes; echo 300M > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes; done and run network application under A. tcp's usage is sometimes accounted and sometimes not accounted because of frequent changes of static_branch. Then, you can see broken tcp.usage_in_bytes. WARN_ON() is printed because res_counter->usage goes below 0. == kernel: ------------[ cut here ]---------- kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40() kernel: Pid: 17753, comm: bash Tainted: G W 3.3.0+ #99 kernel: Call Trace: kernel: [] warn_slowpath_common+0x7f/0xc0 kernel: [] ? rb_reserve__next_event+0x68/0x470 kernel: [] warn_slowpath_null+0x1a/0x20 kernel: [] res_counter_uncharge_locked+0x37/0x40 ... == -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id 8EDCF6B0082 for ; Wed, 16 May 2012 23:09:03 -0400 (EDT) Message-ID: <4FB46B4C.3000307@parallels.com> Date: Thu, 17 May 2012 07:06:52 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> In-Reply-To: <20120516140637.17741df6.akpm@linux-foundation.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 01:06 AM, Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. >> >> ... >> >> @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) >> tcp->tcp_prot_mem[i] = min_t(long, val>> PAGE_SHIFT, >> net->ipv4.sysctl_tcp_mem[i]); >> >> - if (val == RESOURCE_MAX&& old_lim != RESOURCE_MAX) >> - static_key_slow_dec(&memcg_socket_limit_enabled); >> - else if (old_lim == RESOURCE_MAX&& val != RESOURCE_MAX) >> - static_key_slow_inc(&memcg_socket_limit_enabled); >> + if (val == RESOURCE_MAX) >> + cg_proto->active = false; >> + else if (val != RESOURCE_MAX) { >> + /* >> + * ->activated needs to be written after the static_key update. >> + * This is what guarantees that the socket activation function >> + * is the last one to run. See sock_update_memcg() for details, >> + * and note that we don't mark any socket as belonging to this >> + * memcg until that flag is up. >> + * >> + * We need to do this, because static_keys will span multiple >> + * sites, but we can't control their order. If we mark a socket >> + * as accounted, but the accounting functions are not patched in >> + * yet, we'll lose accounting. >> + * >> + * We never race with the readers in sock_update_memcg(), because >> + * when this value change, the code to process it is not patched in >> + * yet. >> + */ >> + if (!cg_proto->activated) { >> + static_key_slow_inc(&memcg_socket_limit_enabled); >> + cg_proto->activated = true; >> + } > > If two threads run this code concurrently, they can both see > cg_proto->activated==false and they will both run > static_key_slow_inc(). > > Hopefully there's some locking somewhere which prevents this, but it is > unobvious. We should comment this, probably at the cg_proto.activated > definition site. Or we should fix the bug ;) > If that happens, locking in static_key_slow_inc will prevent any damage. My previous version had explicit code to prevent that, but we were pointed out that this is already part of the static_key expectations, so that was dropped. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx206.postini.com [74.125.245.206]) by kanga.kvack.org (Postfix) with SMTP id 3F1FA6B0082 for ; Wed, 16 May 2012 23:11:35 -0400 (EDT) Message-ID: <4FB46BE9.6080503@parallels.com> Date: Thu, 17 May 2012 07:09:29 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516141342.911931e7.akpm@linux-foundation.org> In-Reply-To: <20120516141342.911931e7.akpm@linux-foundation.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 01:13 AM, Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. > > So I'm scratching my head over what the actual bug is, and how > important it is. AFAICT it will cause charging stats to exhibit some > inaccuracy when memcg's are being torn down? > > I don't know how serious this in in the real world and so can't decide > which kernel version(s) we should fix. > > When fixing bugs, please always fully describe the bug's end-user > impact, so that I and others can make these sorts of decisions. Hi Andrew. I believe that was described in patch 0/2 ? In any case, this is something we need fixed, but it is not -stable material or anything. The bug leads to misaccounting when we quickly enable and disable limit in a loop. We have a synthetic script to demonstrate that. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx131.postini.com [74.125.245.131]) by kanga.kvack.org (Postfix) with SMTP id 826406B0082 for ; Thu, 17 May 2012 01:36:29 -0400 (EDT) Date: Wed, 16 May 2012 22:37:15 -0700 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Message-Id: <20120516223715.5d1b4385.akpm@linux-foundation.org> In-Reply-To: <4FB46B4C.3000307@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On Thu, 17 May 2012 07:06:52 +0400 Glauber Costa wrote: > ... > >> + else if (val != RESOURCE_MAX) { > >> + /* > >> + * ->activated needs to be written after the static_key update. > >> + * This is what guarantees that the socket activation function > >> + * is the last one to run. See sock_update_memcg() for details, > >> + * and note that we don't mark any socket as belonging to this > >> + * memcg until that flag is up. > >> + * > >> + * We need to do this, because static_keys will span multiple > >> + * sites, but we can't control their order. If we mark a socket > >> + * as accounted, but the accounting functions are not patched in > >> + * yet, we'll lose accounting. > >> + * > >> + * We never race with the readers in sock_update_memcg(), because > >> + * when this value change, the code to process it is not patched in > >> + * yet. > >> + */ > >> + if (!cg_proto->activated) { > >> + static_key_slow_inc(&memcg_socket_limit_enabled); > >> + cg_proto->activated = true; > >> + } > > > > If two threads run this code concurrently, they can both see > > cg_proto->activated==false and they will both run > > static_key_slow_inc(). > > > > Hopefully there's some locking somewhere which prevents this, but it is > > unobvious. We should comment this, probably at the cg_proto.activated > > definition site. Or we should fix the bug ;) > > > If that happens, locking in static_key_slow_inc will prevent any damage. > My previous version had explicit code to prevent that, but we were > pointed out that this is already part of the static_key expectations, so > that was dropped. This makes no sense. If two threads run that code concurrently, key->enabled gets incremented twice. Nobody anywhere has a record that this happened so it cannot be undone. key->enabled is now in an unknown state. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx130.postini.com [74.125.245.130]) by kanga.kvack.org (Postfix) with SMTP id 2884D6B0044 for ; Thu, 17 May 2012 05:54:22 -0400 (EDT) Message-ID: <4FB4CA4D.50608@parallels.com> Date: Thu, 17 May 2012 13:52:13 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> In-Reply-To: <20120516223715.5d1b4385.akpm@linux-foundation.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 09:37 AM, Andrew Morton wrote: >> > If that happens, locking in static_key_slow_inc will prevent any damage. >> > My previous version had explicit code to prevent that, but we were >> > pointed out that this is already part of the static_key expectations, so >> > that was dropped. > This makes no sense. If two threads run that code concurrently, > key->enabled gets incremented twice. Nobody anywhere has a record that > this happened so it cannot be undone. key->enabled is now in an > unknown state. Kame, Tejun, Andrew is right. It seems we will need that mutex after all. Just this is not a race, and neither something that should belong in the static_branch interface. We want to make sure that enabled is not updated before the jump label update, because we need a specific ordering guarantee at the patched sites. And *that*, the interface guarantees, and we were wrong to believe it did not. That is a correction issue for the accounting, and that part is right. But when we disarm it, we'll need to make sure that happened only once, otherwise we may never unpatch it. That, or we'd need that to be a counter. The jump label interface does not - and should not - keep track of how many updates happened to a key. That's the role of whoever is using it. If you agree with the above, I'll send this patch again with the correction. Andrew, thank you very much. Do you spot anything else here? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx138.postini.com [74.125.245.138]) by kanga.kvack.org (Postfix) with SMTP id 5D3A26B0083 for ; Thu, 17 May 2012 06:24:14 -0400 (EDT) Message-ID: <4FB4D14D.4020303@parallels.com> Date: Thu, 17 May 2012 14:22:05 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> <4FB4D061.10406@jp.fujitsu.com> In-Reply-To: <4FB4D061.10406@jp.fujitsu.com> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko On 05/17/2012 02:18 PM, KAMEZAWA Hiroyuki wrote: > (2012/05/17 18:52), Glauber Costa wrote: > >> On 05/17/2012 09:37 AM, Andrew Morton wrote: >>>>> If that happens, locking in static_key_slow_inc will prevent any damage. >>>>> My previous version had explicit code to prevent that, but we were >>>>> pointed out that this is already part of the static_key expectations, so >>>>> that was dropped. >>> This makes no sense. If two threads run that code concurrently, >>> key->enabled gets incremented twice. Nobody anywhere has a record that >>> this happened so it cannot be undone. key->enabled is now in an >>> unknown state. >> >> Kame, Tejun, >> >> Andrew is right. It seems we will need that mutex after all. Just this >> is not a race, and neither something that should belong in the >> static_branch interface. >> > > > Hmm....how about having > > res_counter_xchg_limit(res,&old_limit, new_limit); > > if (!cg_proto->updated&& old_limit == RESOURCE_MAX) > ....update labels... > > Then, no mutex overhead maybe and activated will be updated only once. > Ah, but please fix in a way you like. Above is an example. I think a mutex is a lot cleaner than adding a new function to the res_counter interface. We could do a counter, and then later decrement the key until the counter reaches zero, but between those two, I still think a mutex here is preferable. Only that, instead of coming up with a mutex of ours, we could export and reuse set_limit_mutex from memcontrol.c > Thanks, > -Kame > (*) I'm sorry I won't be able to read e-mails, tomorrow. > Ok Kame. I am not in a terrible hurry to fix this, it doesn't seem to be hurting any real workload. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx122.postini.com [74.125.245.122]) by kanga.kvack.org (Postfix) with SMTP id C02086B0082 for ; Thu, 17 May 2012 06:29:09 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id DE5983EE0BC for ; Thu, 17 May 2012 19:29:07 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id C70AD45DE71 for ; Thu, 17 May 2012 19:29:07 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id A9F4945DE73 for ; Thu, 17 May 2012 19:29:07 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 708D81DB8042 for ; Thu, 17 May 2012 19:29:07 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.240.81.133]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 274301DB8040 for ; Thu, 17 May 2012 19:29:07 +0900 (JST) Message-ID: <4FB4D27D.7020009@jp.fujitsu.com> Date: Thu, 17 May 2012 19:27:09 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> <4FB4D061.10406@jp.fujitsu.com> <4FB4D14D.4020303@parallels.com> In-Reply-To: <4FB4D14D.4020303@parallels.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, netdev@vger.kernel.org, Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko (2012/05/17 19:22), Glauber Costa wrote: > On 05/17/2012 02:18 PM, KAMEZAWA Hiroyuki wrote: >> (2012/05/17 18:52), Glauber Costa wrote: >> >>> On 05/17/2012 09:37 AM, Andrew Morton wrote: >>>>>> If that happens, locking in static_key_slow_inc will prevent any damage. >>>>>> My previous version had explicit code to prevent that, but we were >>>>>> pointed out that this is already part of the static_key expectations, so >>>>>> that was dropped. >>>> This makes no sense. If two threads run that code concurrently, >>>> key->enabled gets incremented twice. Nobody anywhere has a record that >>>> this happened so it cannot be undone. key->enabled is now in an >>>> unknown state. >>> >>> Kame, Tejun, >>> >>> Andrew is right. It seems we will need that mutex after all. Just this >>> is not a race, and neither something that should belong in the >>> static_branch interface. >>> >> >> >> Hmm....how about having >> >> res_counter_xchg_limit(res,&old_limit, new_limit); >> >> if (!cg_proto->updated&& old_limit == RESOURCE_MAX) >> ....update labels... >> >> Then, no mutex overhead maybe and activated will be updated only once. >> Ah, but please fix in a way you like. Above is an example. > > I think a mutex is a lot cleaner than adding a new function to the > res_counter interface. > > We could do a counter, and then later decrement the key until the > counter reaches zero, but between those two, I still think a mutex here > is preferable. > > Only that, instead of coming up with a mutex of ours, we could export > and reuse set_limit_mutex from memcontrol.c > ok, please. thx, -Kame > >> Thanks, >> -Kame >> (*) I'm sorry I won't be able to read e-mails, tomorrow. >> > Ok Kame. I am not in a terrible hurry to fix this, it doesn't seem to be > hurting any real workload. > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [PATCH v5 0/2] fix static_key disabling problem in memcg Date: Fri, 11 May 2012 17:11:15 -0300 Message-ID: <1336767077-25351-1-git-send-email-glommer@parallels.com> Cc: , , , , Tejun Heo , Li Zefan To: Return-path: Sender: owner-linux-mm@kvack.org List-Id: netdev.vger.kernel.org Hi, Tejun, Kame, This series is composed of the two patches of the last fix, with no changes (only exception is the removal of x = false assignments that Tejun requested, that is done now). Note also that patch 1 of this series was reused by me in the slab accounting patches for memcg. The first patch, that adds a mutex to memcg is dropped. I didn't posted it before so I could wait for Kame to get back from his vacations and properly review it. Kame: Steven Rostedt pointed out that our analysis of the static branch updates were wrong, so the mutex is really not needed. The key to understand that, is that atomic_inc_not_zero will only return right away if the value is not yet zero - as the name implies - but the update in the atomic variable only happens after the code is patched. Therefore, if two callers enters with a key value of zero, both will be held at the jump_label_lock() call, effectively guaranteeing the behavior we need. Glauber Costa (2): Always free struct memcg through schedule_work() decrement static keys on real destroy time include/net/sock.h | 9 ++++++++ mm/memcontrol.c | 50 +++++++++++++++++++++++++++++++++----------- net/ipv4/tcp_memcontrol.c | 32 ++++++++++++++++++++++------ 3 files changed, 71 insertions(+), 20 deletions(-) -- 1.7.7.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [PATCH v5 1/2] Always free struct memcg through schedule_work() Date: Fri, 11 May 2012 17:11:16 -0300 Message-ID: <1336767077-25351-2-git-send-email-glommer@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> Cc: , , , , Tejun Heo , Li Zefan , Glauber Costa , Johannes Weiner , Michal Hocko To: Return-path: Received: from mailhub.sw.ru ([195.214.232.25]:41429 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754380Ab2EKUNV (ORCPT ); Fri, 11 May 2012 16:13:21 -0400 In-Reply-To: <1336767077-25351-1-git-send-email-glommer@parallels.com> Sender: netdev-owner@vger.kernel.org List-ID: Right now we free struct memcg with kfree right after a rcu grace period, but defer it if we need to use vfree() to get rid of that memory area. We do that by need, because we need vfree to be called in a process context. This patch unifies this behavior, by ensuring that even kfree will happen in a separate thread. The goal is to have a stable place to call the upcoming jump label destruction function outside the realm of the complicated and quite far-reaching cgroup lock (that can't be held when calling neither the cpu_hotplug.lock nor the jump_label_mutex) Signed-off-by: Glauber Costa CC: Tejun Heo CC: Li Zefan CC: Kamezawa Hiroyuki CC: Johannes Weiner CC: Michal Hocko --- mm/memcontrol.c | 24 +++++++++++++----------- 1 files changed, 13 insertions(+), 11 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 932a734..0b4b4c8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -245,8 +245,8 @@ struct mem_cgroup { */ struct rcu_head rcu_freeing; /* - * But when using vfree(), that cannot be done at - * interrupt time, so we must then queue the work. + * We also need some space for a worker in deferred freeing. + * By the time we call it, rcu_freeing is not longer in use. */ struct work_struct work_freeing; }; @@ -4826,23 +4826,28 @@ out_free: } /* - * Helpers for freeing a vzalloc()ed mem_cgroup by RCU, + * Helpers for freeing a kmalloc()ed/vzalloc()ed mem_cgroup by RCU, * but in process context. The work_freeing structure is overlaid * on the rcu_freeing structure, which itself is overlaid on memsw. */ -static void vfree_work(struct work_struct *work) +static void free_work(struct work_struct *work) { struct mem_cgroup *memcg; + int size = sizeof(struct mem_cgroup); memcg = container_of(work, struct mem_cgroup, work_freeing); - vfree(memcg); + if (size < PAGE_SIZE) + kfree(memcg); + else + vfree(memcg); } -static void vfree_rcu(struct rcu_head *rcu_head) + +static void free_rcu(struct rcu_head *rcu_head) { struct mem_cgroup *memcg; memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing); - INIT_WORK(&memcg->work_freeing, vfree_work); + INIT_WORK(&memcg->work_freeing, free_work); schedule_work(&memcg->work_freeing); } @@ -4868,10 +4873,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg) free_mem_cgroup_per_zone_info(memcg, node); free_percpu(memcg->stat); - if (sizeof(struct mem_cgroup) < PAGE_SIZE) - kfree_rcu(memcg, rcu_freeing); - else - call_rcu(&memcg->rcu_freeing, vfree_rcu); + call_rcu(&memcg->rcu_freeing, free_rcu); } static void mem_cgroup_get(struct mem_cgroup *memcg) -- 1.7.7.6 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [PATCH v5 2/2] decrement static keys on real destroy time Date: Fri, 11 May 2012 17:11:17 -0300 Message-ID: <1336767077-25351-3-git-send-email-glommer@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> Cc: , , , , Tejun Heo , Li Zefan , Glauber Costa , Johannes Weiner , Michal Hocko To: Return-path: Received: from mailhub.sw.ru ([195.214.232.25]:28658 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754380Ab2EKUN0 (ORCPT ); Fri, 11 May 2012 16:13:26 -0400 In-Reply-To: <1336767077-25351-1-git-send-email-glommer@parallels.com> Sender: netdev-owner@vger.kernel.org List-ID: We call the destroy function when a cgroup starts to be removed, such as by a rmdir event. However, because of our reference counters, some objects are still inflight. Right now, we are decrementing the static_keys at destroy() time, meaning that if we get rid of the last static_key reference, some objects will still have charges, but the code to properly uncharge them won't be run. This becomes a problem specially if it is ever enabled again, because now new charges will be added to the staled charges making keeping it pretty much impossible. We just need to be careful with the static branch activation: since there is no particular preferred order of their activation, we need to make sure that we only start using it after all call sites are active. This is achieved by having a per-memcg flag that is only updated after static_key_slow_inc() returns. At this time, we are sure all sites are active. This is made per-memcg, not global, for a reason: it also has the effect of making socket accounting more consistent. The first memcg to be limited will trigger static_key() activation, therefore, accounting. But all the others will then be accounted no matter what. After this patch, only limited memcgs will have its sockets accounted. [v2: changed a tcp limited flag for a generic proto limited flag ] [v3: update the current active flag only after the static_key update ] [v4: disarm_static_keys() inside free_work ] [v5: got rid of tcp_limit_mutex, now in the static_key interface ] Signed-off-by: Glauber Costa CC: Tejun Heo CC: Li Zefan CC: Kamezawa Hiroyuki CC: Johannes Weiner CC: Michal Hocko --- include/net/sock.h | 9 +++++++++ mm/memcontrol.c | 26 ++++++++++++++++++++++++-- net/ipv4/tcp_memcontrol.c | 32 +++++++++++++++++++++++++------- 3 files changed, 58 insertions(+), 9 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index b3ebe6b..5c620bd 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -914,6 +914,15 @@ struct cg_proto { int *memory_pressure; long *sysctl_mem; /* + * active means it is currently active, and new sockets should + * be assigned to cgroups. + * + * activated means it was ever activated, and we need to + * disarm the static keys on destruction + */ + bool activated; + bool active; + /* * memcg field is used to find which memcg we belong directly * Each memcg struct can hold more than one cg_proto, so container_of * won't really cut. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0b4b4c8..d1b0849 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -404,6 +404,7 @@ void sock_update_memcg(struct sock *sk) { if (mem_cgroup_sockets_enabled) { struct mem_cgroup *memcg; + struct cg_proto *cg_proto; BUG_ON(!sk->sk_prot->proto_cgroup); @@ -423,9 +424,10 @@ void sock_update_memcg(struct sock *sk) rcu_read_lock(); memcg = mem_cgroup_from_task(current); - if (!mem_cgroup_is_root(memcg)) { + cg_proto = sk->sk_prot->proto_cgroup(memcg); + if (!mem_cgroup_is_root(memcg) && cg_proto->active) { mem_cgroup_get(memcg); - sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg); + sk->sk_cgrp = cg_proto; } rcu_read_unlock(); } @@ -442,6 +444,14 @@ void sock_release_memcg(struct sock *sk) } } +static void disarm_static_keys(struct mem_cgroup *memcg) +{ +#ifdef CONFIG_INET + if (memcg->tcp_mem.cg_proto.activated) + static_key_slow_dec(&memcg_socket_limit_enabled); +#endif +} + #ifdef CONFIG_INET struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) { @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) } EXPORT_SYMBOL(tcp_proto_cgroup); #endif /* CONFIG_INET */ +#else +static inline void disarm_static_keys(struct mem_cgroup *memcg) +{ +} + #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ static void drain_all_stock_async(struct mem_cgroup *memcg); @@ -4836,6 +4851,13 @@ static void free_work(struct work_struct *work) int size = sizeof(struct mem_cgroup); memcg = container_of(work, struct mem_cgroup, work_freeing); + /* + * We need to make sure that (at least for now), the jump label + * destruction code runs outside of the cgroup lock. schedule_work() + * will guarantee this happens. Be careful if you need to move this + * disarm_static_keys around + */ + disarm_static_keys(memcg); if (size < PAGE_SIZE) kfree(memcg); else diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c index 1517037..7ea4f79 100644 --- a/net/ipv4/tcp_memcontrol.c +++ b/net/ipv4/tcp_memcontrol.c @@ -74,9 +74,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg) percpu_counter_destroy(&tcp->tcp_sockets_allocated); val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT); - - if (val != RESOURCE_MAX) - static_key_slow_dec(&memcg_socket_limit_enabled); } EXPORT_SYMBOL(tcp_destroy_cgroup); @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, net->ipv4.sysctl_tcp_mem[i]); - if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX) - static_key_slow_dec(&memcg_socket_limit_enabled); - else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX) - static_key_slow_inc(&memcg_socket_limit_enabled); + if (val == RESOURCE_MAX) + cg_proto->active = false; + else if (val != RESOURCE_MAX) { + /* + * ->activated needs to be written after the static_key update. + * This is what guarantees that the socket activation function + * is the last one to run. See sock_update_memcg() for details, + * and note that we don't mark any socket as belonging to this + * memcg until that flag is up. + * + * We need to do this, because static_keys will span multiple + * sites, but we can't control their order. If we mark a socket + * as accounted, but the accounting functions are not patched in + * yet, we'll lose accounting. + * + * We never race with the readers in sock_update_memcg(), because + * when this value change, the code to process it is not patched in + * yet. + */ + if (!cg_proto->activated) { + static_key_slow_inc(&memcg_socket_limit_enabled); + cg_proto->activated = true; + } + cg_proto->active = true; + } return 0; } -- 1.7.7.6 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 10:03:08 +0400 Message-ID: <4FB3431C.3050402@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Cc: , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: KAMEZAWA Hiroyuki Return-path: In-Reply-To: <4FB058D8.6060707-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/14/2012 04:59 AM, KAMEZAWA Hiroyuki wrote: > (2012/05/12 5:11), Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. >> >> [v2: changed a tcp limited flag for a generic proto limited flag ] >> [v3: update the current active flag only after the static_key update ] >> [v4: disarm_static_keys() inside free_work ] >> [v5: got rid of tcp_limit_mutex, now in the static_key interface ] >> >> Signed-off-by: Glauber Costa >> CC: Tejun Heo >> CC: Li Zefan >> CC: Kamezawa Hiroyuki >> CC: Johannes Weiner >> CC: Michal Hocko > > > Thank you for your patient works. > > Acked-by: KAMEZAWA Hiroyuki > > BTW, what is the relationship between 1/2 and 2/2 ? Can't do jump label patching inside an interrupt handler. They need to happen when we free the structure, and I was about to add a worker myself when I found out we already have one: just we don't always use it. Before we merge it, let me just make sure the issue with config Li pointed out don't exist. I did test it, but since I've reposted this many times with multiple tiny changes - the type that will usually get us killed, I'd be more comfortable with an extra round of testing if someone spotted a possibility. Who is merging this fix, btw ? I find it to be entirely memcg related, even though it touches a file in net (but a file with only memcg code in it) From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 11:03:47 +0400 Message-ID: <4FB35153.3080309@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB0621C.3010604@huawei.com> Mime-Version: 1.0 Content-Type: text/plain; charset="GB2312" Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Johannes Weiner , Michal Hocko To: Li Zefan Return-path: In-Reply-To: <4FB0621C.3010604-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/14/2012 05:38 AM, Li Zefan wrote: >> +static void disarm_static_keys(struct mem_cgroup *memcg) > >> +{ >> +#ifdef CONFIG_INET >> + if (memcg->tcp_mem.cg_proto.activated) >> + static_key_slow_dec(&memcg_socket_limit_enabled); >> +#endif >> +} > > > Move this inside the ifdef/endif below ? > > Otherwise I think you'll get compile error if !CONFIG_INET... I don't fully get it. We are supposed to provide a version of it for CONFIG_CGROUP_MEM_RES_CTLR_KMEM and an empty version for !CONFIG_CGROUP_MEM_RES_CTLR_KMEM Inside the first, we take an action for CONFIG_INET, and no action for !CONFIG_INET. Bear in mind that the slab patches will add another test to that place, and that's why I am doing it this way from the beginning. Well, that said, I not only can be wrong, I very frequently am. But I just compiled this one with and without CONFIG_INET, and it seems to be going alright. >> + >> #ifdef CONFIG_INET >> struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) >> { >> @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg) >> } >> EXPORT_SYMBOL(tcp_proto_cgroup); >> #endif /* CONFIG_INET */ >> +#else >> +static inline void disarm_static_keys(struct mem_cgroup *memcg) >> +{ >> +} >> + >> #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ > > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 11:04:43 +0400 Message-ID: <4FB3518B.3090205@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Cc: , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: KAMEZAWA Hiroyuki Return-path: In-Reply-To: <4FB3431C.3050402-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/16/2012 10:03 AM, Glauber Costa wrote: >> BTW, what is the relationship between 1/2 and 2/2 ? > Can't do jump label patching inside an interrupt handler. They need to > happen when we free the structure, and I was about to add a worker > myself when I found out we already have one: just we don't always use it. > > Before we merge it, let me just make sure the issue with config Li > pointed out don't exist. I did test it, but since I've reposted this > many times with multiple tiny changes - the type that will usually get > us killed, I'd be more comfortable with an extra round of testing if > someone spotted a possibility. > > Who is merging this fix, btw ? > I find it to be entirely memcg related, even though it touches a file in > net (but a file with only memcg code in it) > For the record, I compiled test it many times, and the problem that Li wondered about seems not to exist. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 12:30:25 +0400 Message-ID: <4FB365A1.6040101@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> <4FB3652D.2040909@jp.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Cc: , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton To: KAMEZAWA Hiroyuki Return-path: In-Reply-To: <4FB3652D.2040909-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/16/2012 12:28 PM, KAMEZAWA Hiroyuki wrote: >> For the record, I compiled test it many times, and the problem that Li >> > wondered about seems not to exist. >> > > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than > netdev... Yes. As I said, this only touches stuff in core memcg and the memcg specific file. Any conflicts should come from other memcg fixes that may have got into the tree... > David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? > > CC'ed David Miller and Andrew Morton. > > Thanks, > -Kame > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 12:37:32 +0400 Message-ID: <4FB3674C.2030604@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB058D8.6060707@jp.fujitsu.com> <4FB3431C.3050402@parallels.com> <4FB3518B.3090205@parallels.com> <4FB3652D.2040909@jp.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Cc: , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko , David Miller , Andrew Morton To: KAMEZAWA Hiroyuki Return-path: In-Reply-To: <4FB3652D.2040909-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/16/2012 12:28 PM, KAMEZAWA Hiroyuki wrote: > (2012/05/16 16:04), Glauber Costa wrote: > >> On 05/16/2012 10:03 AM, Glauber Costa wrote: >>>> BTW, what is the relationship between 1/2 and 2/2 ? >>> Can't do jump label patching inside an interrupt handler. They need to >>> happen when we free the structure, and I was about to add a worker >>> myself when I found out we already have one: just we don't always use it. >>> >>> Before we merge it, let me just make sure the issue with config Li >>> pointed out don't exist. I did test it, but since I've reposted this >>> many times with multiple tiny changes - the type that will usually get >>> us killed, I'd be more comfortable with an extra round of testing if >>> someone spotted a possibility. >>> >>> Who is merging this fix, btw ? >>> I find it to be entirely memcg related, even though it touches a file in >>> net (but a file with only memcg code in it) >>> >> >> For the record, I compiled test it many times, and the problem that Li >> wondered about seems not to exist. >> > > Ah...Hmm.....I guess dependency problem will be found in -mm if any rather than > netdev... > > David, can this bug-fix patch goes via -mm tree ? Or will you pick up ? > Another thing: Patch 2 in this series is of course dependent on patch 1 - which lives 100 % in memcg core. Without that, lockdep will scream while disabling the static key. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 13:57:55 -0700 Message-ID: <20120516135755.cdcdf9ba.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <4FB0621C.3010604@huawei.com> <4FB35153.3080309@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Li Zefan , , , , , , Tejun Heo , Johannes Weiner , Michal Hocko To: Glauber Costa Return-path: In-Reply-To: <4FB35153.3080309@parallels.com> Sender: owner-linux-mm@kvack.org List-Id: netdev.vger.kernel.org On Wed, 16 May 2012 11:03:47 +0400 Glauber Costa wrote: > On 05/14/2012 05:38 AM, Li Zefan wrote: > >> +static void disarm_static_keys(struct mem_cgroup *memcg) > > > >> +{ > >> +#ifdef CONFIG_INET > >> + if (memcg->tcp_mem.cg_proto.activated) > >> + static_key_slow_dec(&memcg_socket_limit_enabled); > >> +#endif > >> +} > > > > > > Move this inside the ifdef/endif below ? > > > > Otherwise I think you'll get compile error if !CONFIG_INET... > > I don't fully get it. > > We are supposed to provide a version of it for > CONFIG_CGROUP_MEM_RES_CTLR_KMEM and an empty version for > !CONFIG_CGROUP_MEM_RES_CTLR_KMEM > > Inside the first, we take an action for CONFIG_INET, and no action for > !CONFIG_INET. > > Bear in mind that the slab patches will add another test to that place, > and that's why I am doing it this way from the beginning. > > Well, that said, I not only can be wrong, I very frequently am. > > But I just compiled this one with and without CONFIG_INET, and it seems > to be going alright. > Yes, the ifdeffings in that area are rather nasty. I wonder if it would be simpler to do away with the ifdef nesting. At the top-level, just do #if defined(CONFIG_CGROUP_MEM_RES_CTLR_KMEM) && defined(CONFIG_INET) static void disarm_static_keys(struct mem_cgroup *memcg) { if (memcg->tcp_mem.cg_proto.activated) static_key_slow_dec(&memcg_socket_limit_enabled); } #else static inline void disarm_static_keys(struct mem_cgroup *memcg) { } #endif The tcp_proto_cgroup() definition could go inside that ifdef as well. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 14:13:42 -0700 Message-ID: <20120516141342.911931e7.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: Glauber Costa Return-path: In-Reply-To: <1336767077-25351-3-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-Id: netdev.vger.kernel.org On Fri, 11 May 2012 17:11:17 -0300 Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. So I'm scratching my head over what the actual bug is, and how important it is. AFAICT it will cause charging stats to exhibit some inaccuracy when memcg's are being torn down? I don't know how serious this in in the real world and so can't decide which kernel version(s) we should fix. When fixing bugs, please always fully describe the bug's end-user impact, so that I and others can make these sorts of decisions. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 14:06:37 -0700 Message-ID: <20120516140637.17741df6.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: Glauber Costa Return-path: In-Reply-To: <1336767077-25351-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On Fri, 11 May 2012 17:11:17 -0300 Glauber Costa wrote: > We call the destroy function when a cgroup starts to be removed, > such as by a rmdir event. > > However, because of our reference counters, some objects are still > inflight. Right now, we are decrementing the static_keys at destroy() > time, meaning that if we get rid of the last static_key reference, > some objects will still have charges, but the code to properly > uncharge them won't be run. > > This becomes a problem specially if it is ever enabled again, because > now new charges will be added to the staled charges making keeping > it pretty much impossible. > > We just need to be careful with the static branch activation: > since there is no particular preferred order of their activation, > we need to make sure that we only start using it after all > call sites are active. This is achieved by having a per-memcg > flag that is only updated after static_key_slow_inc() returns. > At this time, we are sure all sites are active. > > This is made per-memcg, not global, for a reason: > it also has the effect of making socket accounting more > consistent. The first memcg to be limited will trigger static_key() > activation, therefore, accounting. But all the others will then be > accounted no matter what. After this patch, only limited memcgs > will have its sockets accounted. > > ... > > @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) > tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, > net->ipv4.sysctl_tcp_mem[i]); > > - if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX) > - static_key_slow_dec(&memcg_socket_limit_enabled); > - else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX) > - static_key_slow_inc(&memcg_socket_limit_enabled); > + if (val == RESOURCE_MAX) > + cg_proto->active = false; > + else if (val != RESOURCE_MAX) { > + /* > + * ->activated needs to be written after the static_key update. > + * This is what guarantees that the socket activation function > + * is the last one to run. See sock_update_memcg() for details, > + * and note that we don't mark any socket as belonging to this > + * memcg until that flag is up. > + * > + * We need to do this, because static_keys will span multiple > + * sites, but we can't control their order. If we mark a socket > + * as accounted, but the accounting functions are not patched in > + * yet, we'll lose accounting. > + * > + * We never race with the readers in sock_update_memcg(), because > + * when this value change, the code to process it is not patched in > + * yet. > + */ > + if (!cg_proto->activated) { > + static_key_slow_inc(&memcg_socket_limit_enabled); > + cg_proto->activated = true; > + } If two threads run this code concurrently, they can both see cg_proto->activated==false and they will both run static_key_slow_inc(). Hopefully there's some locking somewhere which prevents this, but it is unobvious. We should comment this, probably at the cg_proto.activated definition site. Or we should fix the bug ;) > + cg_proto->active = true; > + } > > return 0; > } > > ... > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 07:06:52 +0400 Message-ID: <4FB46B4C.3000307@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: Andrew Morton Return-path: Received: from mx2.parallels.com ([64.131.90.16]:41893 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760477Ab2EQDIz (ORCPT ); Wed, 16 May 2012 23:08:55 -0400 In-Reply-To: <20120516140637.17741df6.akpm@linux-foundation.org> Sender: netdev-owner@vger.kernel.org List-ID: On 05/17/2012 01:06 AM, Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. >> >> ... >> >> @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) >> tcp->tcp_prot_mem[i] = min_t(long, val>> PAGE_SHIFT, >> net->ipv4.sysctl_tcp_mem[i]); >> >> - if (val == RESOURCE_MAX&& old_lim != RESOURCE_MAX) >> - static_key_slow_dec(&memcg_socket_limit_enabled); >> - else if (old_lim == RESOURCE_MAX&& val != RESOURCE_MAX) >> - static_key_slow_inc(&memcg_socket_limit_enabled); >> + if (val == RESOURCE_MAX) >> + cg_proto->active = false; >> + else if (val != RESOURCE_MAX) { >> + /* >> + * ->activated needs to be written after the static_key update. >> + * This is what guarantees that the socket activation function >> + * is the last one to run. See sock_update_memcg() for details, >> + * and note that we don't mark any socket as belonging to this >> + * memcg until that flag is up. >> + * >> + * We need to do this, because static_keys will span multiple >> + * sites, but we can't control their order. If we mark a socket >> + * as accounted, but the accounting functions are not patched in >> + * yet, we'll lose accounting. >> + * >> + * We never race with the readers in sock_update_memcg(), because >> + * when this value change, the code to process it is not patched in >> + * yet. >> + */ >> + if (!cg_proto->activated) { >> + static_key_slow_inc(&memcg_socket_limit_enabled); >> + cg_proto->activated = true; >> + } > > If two threads run this code concurrently, they can both see > cg_proto->activated==false and they will both run > static_key_slow_inc(). > > Hopefully there's some locking somewhere which prevents this, but it is > unobvious. We should comment this, probably at the cg_proto.activated > definition site. Or we should fix the bug ;) > If that happens, locking in static_key_slow_inc will prevent any damage. My previous version had explicit code to prevent that, but we were pointed out that this is already part of the static_key expectations, so that was dropped. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 07:09:29 +0400 Message-ID: <4FB46BE9.6080503@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516141342.911931e7.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: Andrew Morton Return-path: In-Reply-To: <20120516141342.911931e7.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/17/2012 01:13 AM, Andrew Morton wrote: > On Fri, 11 May 2012 17:11:17 -0300 > Glauber Costa wrote: > >> We call the destroy function when a cgroup starts to be removed, >> such as by a rmdir event. >> >> However, because of our reference counters, some objects are still >> inflight. Right now, we are decrementing the static_keys at destroy() >> time, meaning that if we get rid of the last static_key reference, >> some objects will still have charges, but the code to properly >> uncharge them won't be run. >> >> This becomes a problem specially if it is ever enabled again, because >> now new charges will be added to the staled charges making keeping >> it pretty much impossible. >> >> We just need to be careful with the static branch activation: >> since there is no particular preferred order of their activation, >> we need to make sure that we only start using it after all >> call sites are active. This is achieved by having a per-memcg >> flag that is only updated after static_key_slow_inc() returns. >> At this time, we are sure all sites are active. >> >> This is made per-memcg, not global, for a reason: >> it also has the effect of making socket accounting more >> consistent. The first memcg to be limited will trigger static_key() >> activation, therefore, accounting. But all the others will then be >> accounted no matter what. After this patch, only limited memcgs >> will have its sockets accounted. > > So I'm scratching my head over what the actual bug is, and how > important it is. AFAICT it will cause charging stats to exhibit some > inaccuracy when memcg's are being torn down? > > I don't know how serious this in in the real world and so can't decide > which kernel version(s) we should fix. > > When fixing bugs, please always fully describe the bug's end-user > impact, so that I and others can make these sorts of decisions. Hi Andrew. I believe that was described in patch 0/2 ? In any case, this is something we need fixed, but it is not -stable material or anything. The bug leads to misaccounting when we quickly enable and disable limit in a loop. We have a synthetic script to demonstrate that. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Wed, 16 May 2012 22:37:15 -0700 Message-ID: <20120516223715.5d1b4385.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: Glauber Costa Return-path: In-Reply-To: <4FB46B4C.3000307-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On Thu, 17 May 2012 07:06:52 +0400 Glauber Costa wrote: > ... > >> + else if (val != RESOURCE_MAX) { > >> + /* > >> + * ->activated needs to be written after the static_key update. > >> + * This is what guarantees that the socket activation function > >> + * is the last one to run. See sock_update_memcg() for details, > >> + * and note that we don't mark any socket as belonging to this > >> + * memcg until that flag is up. > >> + * > >> + * We need to do this, because static_keys will span multiple > >> + * sites, but we can't control their order. If we mark a socket > >> + * as accounted, but the accounting functions are not patched in > >> + * yet, we'll lose accounting. > >> + * > >> + * We never race with the readers in sock_update_memcg(), because > >> + * when this value change, the code to process it is not patched in > >> + * yet. > >> + */ > >> + if (!cg_proto->activated) { > >> + static_key_slow_inc(&memcg_socket_limit_enabled); > >> + cg_proto->activated = true; > >> + } > > > > If two threads run this code concurrently, they can both see > > cg_proto->activated==false and they will both run > > static_key_slow_inc(). > > > > Hopefully there's some locking somewhere which prevents this, but it is > > unobvious. We should comment this, probably at the cg_proto.activated > > definition site. Or we should fix the bug ;) > > > If that happens, locking in static_key_slow_inc will prevent any damage. > My previous version had explicit code to prevent that, but we were > pointed out that this is already part of the static_key expectations, so > that was dropped. This makes no sense. If two threads run that code concurrently, key->enabled gets incremented twice. Nobody anywhere has a record that this happened so it cannot be undone. key->enabled is now in an unknown state. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 13:52:13 +0400 Message-ID: <4FB4CA4D.50608@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: Andrew Morton Return-path: In-Reply-To: <20120516223715.5d1b4385.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/17/2012 09:37 AM, Andrew Morton wrote: >> > If that happens, locking in static_key_slow_inc will prevent any damage. >> > My previous version had explicit code to prevent that, but we were >> > pointed out that this is already part of the static_key expectations, so >> > that was dropped. > This makes no sense. If two threads run that code concurrently, > key->enabled gets incremented twice. Nobody anywhere has a record that > this happened so it cannot be undone. key->enabled is now in an > unknown state. Kame, Tejun, Andrew is right. It seems we will need that mutex after all. Just this is not a race, and neither something that should belong in the static_branch interface. We want to make sure that enabled is not updated before the jump label update, because we need a specific ordering guarantee at the patched sites. And *that*, the interface guarantees, and we were wrong to believe it did not. That is a correction issue for the accounting, and that part is right. But when we disarm it, we'll need to make sure that happened only once, otherwise we may never unpatch it. That, or we'd need that to be a counter. The jump label interface does not - and should not - keep track of how many updates happened to a key. That's the role of whoever is using it. If you agree with the above, I'll send this patch again with the correction. Andrew, thank you very much. Do you spot anything else here? From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 14:22:05 +0400 Message-ID: <4FB4D14D.4020303@parallels.com> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> <4FB4D061.10406@jp.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: Andrew Morton , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: KAMEZAWA Hiroyuki Return-path: In-Reply-To: <4FB4D061.10406-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: netdev.vger.kernel.org On 05/17/2012 02:18 PM, KAMEZAWA Hiroyuki wrote: > (2012/05/17 18:52), Glauber Costa wrote: > >> On 05/17/2012 09:37 AM, Andrew Morton wrote: >>>>> If that happens, locking in static_key_slow_inc will prevent any damage. >>>>> My previous version had explicit code to prevent that, but we were >>>>> pointed out that this is already part of the static_key expectations, so >>>>> that was dropped. >>> This makes no sense. If two threads run that code concurrently, >>> key->enabled gets incremented twice. Nobody anywhere has a record that >>> this happened so it cannot be undone. key->enabled is now in an >>> unknown state. >> >> Kame, Tejun, >> >> Andrew is right. It seems we will need that mutex after all. Just this >> is not a race, and neither something that should belong in the >> static_branch interface. >> > > > Hmm....how about having > > res_counter_xchg_limit(res,&old_limit, new_limit); > > if (!cg_proto->updated&& old_limit == RESOURCE_MAX) > ....update labels... > > Then, no mutex overhead maybe and activated will be updated only once. > Ah, but please fix in a way you like. Above is an example. I think a mutex is a lot cleaner than adding a new function to the res_counter interface. We could do a counter, and then later decrement the key until the counter reaches zero, but between those two, I still think a mutex here is preferable. Only that, instead of coming up with a mutex of ours, we could export and reuse set_limit_mutex from memcontrol.c > Thanks, > -Kame > (*) I'm sorry I won't be able to read e-mails, tomorrow. > Ok Kame. I am not in a terrible hurry to fix this, it doesn't seem to be hurting any real workload. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH v5 2/2] decrement static keys on real destroy time Date: Thu, 17 May 2012 10:02:53 -0700 Message-ID: <20120517100253.3b4a1a20.akpm@linux-foundation.org> References: <1336767077-25351-1-git-send-email-glommer@parallels.com> <1336767077-25351-3-git-send-email-glommer@parallels.com> <20120516140637.17741df6.akpm@linux-foundation.org> <4FB46B4C.3000307@parallels.com> <20120516223715.5d1b4385.akpm@linux-foundation.org> <4FB4CA4D.50608@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: , , , , , Tejun Heo , Li Zefan , Johannes Weiner , Michal Hocko To: Glauber Costa Return-path: In-Reply-To: <4FB4CA4D.50608@parallels.com> Sender: owner-linux-mm@kvack.org List-Id: netdev.vger.kernel.org On Thu, 17 May 2012 13:52:13 +0400 Glauber Costa wrote: > Andrew is right. It seems we will need that mutex after all. Just this > is not a race, and neither something that should belong in the > static_branch interface. Well, a mutex is one way. Or you could do something like if (!test_and_set_bit(CGPROTO_ACTIVATED, &cg_proto->flags)) static_key_slow_inc(&memcg_socket_limit_enabled); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org