From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 18C886B0088 for ; Wed, 25 Nov 2009 18:08:09 -0500 (EST) Received: from spaceape10.eur.corp.google.com (spaceape10.eur.corp.google.com [172.28.16.144]) by smtp-out.google.com with ESMTP id nAPN86pK005581 for ; Wed, 25 Nov 2009 15:08:07 -0800 Received: from pxi11 (pxi11.prod.google.com [10.243.27.11]) by spaceape10.eur.corp.google.com with ESMTP id nAPN82Uj005363 for ; Wed, 25 Nov 2009 15:08:03 -0800 Received: by pxi11 with SMTP id 11so137209pxi.9 for ; Wed, 25 Nov 2009 15:08:02 -0800 (PST) Date: Wed, 25 Nov 2009 15:08:00 -0800 (PST) From: David Rientjes Subject: memcg: slab control Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org To: Balbir Singh , Pavel Emelyanov , KAMEZAWA Hiroyuki Cc: Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID:

Hi,

I wanted to see what the current ideas are concerning kernel memory accounting as it relates to the memory controller. Eventually we'll want the ability to restrict cgroups to a hard slab limit. That'll require accounting to map slab allocations back to user tasks so that we can enforce a policy based on the cgroup's aggregated slab usage similar to how the memory controller currently does for user memory.

Is this currently being thought about within the memcg community? We'd like to start a discussion and get everybody's requirements and interests on the table and then become actively involved in the development of such a feature.

Thanks.

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with SMTP id AA06A6B0099 for ; Wed, 25 Nov 2009 20:17:36 -0500 (EST) Received: from m6.gw.fujitsu.co.jp ([10.0.50.76]) by fgwmail5.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id nAQ1HXWR018142 for (envelope-from kamezawa.hiroyu@jp.fujitsu.com); Thu, 26 Nov 2009 10:17:33 +0900 Received: from smail (m6 [127.0.0.1]) by outgoing.m6.gw.fujitsu.co.jp (Postfix) with ESMTP id 8BDDC45DE56 for ; Thu, 26 Nov 2009 10:17:33 +0900 (JST) Received: from s6.gw.fujitsu.co.jp (s6.gw.fujitsu.co.jp [10.0.50.96]) by m6.gw.fujitsu.co.jp (Postfix) with ESMTP id 691A945DE4F for ; Thu, 26 Nov 2009 10:17:33 +0900 (JST) Received: from s6.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s6.gw.fujitsu.co.jp (Postfix) with ESMTP id 3C1DD1DB8038 for ; Thu, 26 Nov 2009 10:17:33 +0900 (JST) Received: from m108.s.css.fujitsu.com (m108.s.css.fujitsu.com [10.249.87.108]) by s6.gw.fujitsu.co.jp (Postfix) with ESMTP id E676F1DB8037 for ; Thu, 26 Nov 2009 10:17:32 +0900 (JST) Date: Thu, 26 Nov 2009 10:14:14 +0900 From: KAMEZAWA Hiroyuki Subject: Re: memcg: slab control Message-Id: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: References: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: David Rientjes Cc: Balbir Singh , Pavel Emelyanov , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: On Wed, 25 Nov 2009 15:08:00 -0800 (PST) David Rientjes wrote: > Hi, > > I wanted to see what the current ideas are concerning kernel memory > accounting as it relates to the memory controller. Eventually we'll want > the ability to restrict cgroups to a hard slab limit. 
That'll require > accounting to map slab allocations back to user tasks so that we can > enforce a policy based on the cgroup's aggregated slab usage similar to > how the memory controller currently does for user memory. > > Is this currently being thought about within the memcg community?

Not yet. But I always recommend that people implement another memcg (slabcg) for kernel memory, because:

- It must have much lower cost than memcg, with good performance and scalability. A system-wide shared counter is nonsense.
- Slab is not based on LRU, so another used-memory maintenance scheme should be used.
- You can reuse page_cgroup even if slabcg is independent from memcg.

But, considering the user side, users will not welcome dividing memcg and slabcg. So, tying it to the current memcg is OK for me, like...

==
struct mem_cgroup {
	....
	....
	struct slab_cgroup slabcg; (or struct slab_cgroup *slabcg)
}
==

But we have to use another counter and another scheme, a different implementation from memcg, which has good scalability and more fuzzy/lazy controls. (For example, trigger slab-shrink when usage exceeds hiwatermark, not limit.)

Scalable accounting is the first wall in front of us. The second one will be how-to-shrink. For recording information, we can reuse page_cgroup, and we'll not have much difficulty.

I hope that, in implementing slabcg, we'll not meet very complicated racy cases like those we met in memcg.

Thanks,
-Kame

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
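[Editorial sketch, not part of the original message.] The "fuzzy/lazy" accounting idea in the reply above can be illustrated with a small userspace model: charges accumulate in a per-CPU delta and are folded into the shared counter only once a batch fills up, and crossing the hiwatermark merely flags that a shrink is wanted instead of failing the allocation. Only the names `mem_cgroup`, `slab_cgroup`, `slabcg`, and `hiwatermark` come from the thread; every field, constant, and function below is a hypothetical stand-in, and a real kernel version would need per-CPU variables and atomic operations.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS      4   /* hypothetical CPU count for this model */
#define CHARGE_BATCH 32  /* hypothetical batch size, in objects */

/* Hypothetical layout; the thread only names the structure. */
struct slab_cgroup {
	long usage;               /* shared counter, updated once per batch */
	long hiwatermark;         /* soft threshold that requests a shrink */
	long cpu_delta[NR_CPUS];  /* per-CPU charges not yet folded in */
	bool shrink_pending;      /* set once usage exceeds hiwatermark */
};

/* Embedding as suggested in the message above. */
struct mem_cgroup {
	struct slab_cgroup slabcg;
};

/*
 * Charge (nr > 0) or uncharge (nr < 0) objects on one CPU.
 * The shared counter is touched only when the local delta reaches
 * the batch size, so usage may lag the true value by up to
 * NR_CPUS * (CHARGE_BATCH - 1) objects. That lag is the "fuzzy" part.
 * The charge never fails; it only marks that shrinking is wanted.
 */
static void slabcg_charge(struct slab_cgroup *s, int cpu, long nr)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	s->cpu_delta[cpu] += nr;
	if (s->cpu_delta[cpu] >= CHARGE_BATCH ||
	    s->cpu_delta[cpu] <= -CHARGE_BATCH) {
		s->usage += s->cpu_delta[cpu];
		s->cpu_delta[cpu] = 0;
		if (s->usage > s->hiwatermark)
			s->shrink_pending = true;
	}
}
```

The trade-off in this model is accuracy against contention: a larger `CHARGE_BATCH` means fewer updates to the shared counter but a looser bound on how far real usage can overshoot the watermark before a shrink is requested.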
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 77CC36B009B for ; Wed, 25 Nov 2009 20:20:00 -0500 (EST) Received: from m4.gw.fujitsu.co.jp ([10.0.50.74]) by fgwmail5.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id nAQ1Jq4Q019093 for (envelope-from kamezawa.hiroyu@jp.fujitsu.com); Thu, 26 Nov 2009 10:19:52 +0900 Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 8241745DE6F for ; Thu, 26 Nov 2009 10:19:52 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 602AF45DE60 for ; Thu, 26 Nov 2009 10:19:52 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 3BFD51DB803B for ; Thu, 26 Nov 2009 10:19:52 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.249.87.104]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id E2D271DB8037 for ; Thu, 26 Nov 2009 10:19:51 +0900 (JST) Date: Thu, 26 Nov 2009 10:17:04 +0900 From: KAMEZAWA Hiroyuki Subject: Re: memcg: slab control Message-Id: <20091126101704.879a1b15.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: References: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: David Rientjes Cc: Balbir Singh , Pavel Emelyanov , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: On Wed, 25 Nov 2009 15:08:00 -0800 (PST) David Rientjes wrote: > Hi, > > I wanted to see what the current ideas are concerning kernel memory > accounting as it relates to the memory controller. Eventually we'll want > the ability to restrict cgroups to a hard slab limit. 
That'll require > accounting to map slab allocations back to user tasks so that we can > enforce a policy based on the cgroup's aggregated slab usage similiar to > how the memory controller currently does for user memory. > > Is this currently being thought about within the memcg community? We'd > like to start a discussion and get everybody's requirements and interests > on the table and then become actively involved in the development of such > a feature. > BTW, how much percent of pages are used for slab in Google system ? Because memory size is going bigger and bigger, ratio of slab usage is going smaller, I think. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with SMTP id D43056B00A0 for ; Wed, 25 Nov 2009 21:35:52 -0500 (EST) Received: from m2.gw.fujitsu.co.jp ([10.0.50.72]) by fgwmail7.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id nAQ2Zd0M006175 for (envelope-from kosaki.motohiro@jp.fujitsu.com); Thu, 26 Nov 2009 11:35:39 +0900 Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 4425C45DE64 for ; Thu, 26 Nov 2009 11:35:39 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 0C5CF45DE55 for ; Thu, 26 Nov 2009 11:35:39 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id D4185E78001 for ; Thu, 26 Nov 2009 11:35:38 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.249.87.103]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 67A711DB803E for ; Thu, 26 Nov 2009 11:35:37 +0900 (JST) From: KOSAKI Motohiro Subject: Re: memcg: slab control 
In-Reply-To: References: Message-Id: <20091126113209.5A68.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Date: Thu, 26 Nov 2009 11:35:36 +0900 (JST) Sender: owner-linux-mm@kvack.org To: David Rientjes Cc: kosaki.motohiro@jp.fujitsu.com, Balbir Singh , Pavel Emelyanov , KAMEZAWA Hiroyuki , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID:

Hi

> Hi, > > I wanted to see what the current ideas are concerning kernel memory > accounting as it relates to the memory controller. Eventually we'll want > the ability to restrict cgroups to a hard slab limit. That'll require > accounting to map slab allocations back to user tasks so that we can > enforce a policy based on the cgroup's aggregated slab usage similar to > how the memory controller currently does for user memory. > > Is this currently being thought about within the memcg community? We'd > like to start a discussion and get everybody's requirements and interests > on the table and then become actively involved in the development of such > a feature.

I don't think hard memory isolation is a bad idea. However, restricting only slab is too strange. Some devices use slab frequently, while others use get_free_pages() directly. A slab-only restriction will not produce the result an admin expects.

Probably, we need to implement a generic memory reservation framework. It might help implement rt-task memory reservation and a userland OOM manager.

It is only my personal opinion...

Thanks.

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id CF0446B0044 for ; Thu, 26 Nov 2009 03:50:40 -0500 (EST) Received: from d28relay01.in.ibm.com (d28relay01.in.ibm.com [9.184.220.58]) by e28smtp01.in.ibm.com (8.14.3/8.13.1) with ESMTP id nAQ8oYgd017726 for ; Thu, 26 Nov 2009 14:20:34 +0530 Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66]) by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id nAQ8oXZW2695286 for ; Thu, 26 Nov 2009 14:20:34 +0530 Received: from d28av04.in.ibm.com (loopback [127.0.0.1]) by d28av04.in.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id nAQ8oWv2026930 for ; Thu, 26 Nov 2009 19:50:33 +1100 Date: Thu, 26 Nov 2009 14:20:31 +0530 From: Balbir Singh Subject: Re: memcg: slab control Message-ID: <20091126085031.GG2970@balbir.in.ibm.com> Reply-To: balbir@linux.vnet.ibm.com References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org To: KAMEZAWA Hiroyuki Cc: David Rientjes , Pavel Emelyanov , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: * KAMEZAWA Hiroyuki [2009-11-26 10:14:14]: > On Wed, 25 Nov 2009 15:08:00 -0800 (PST) > David Rientjes wrote: > > > Hi, > > > > I wanted to see what the current ideas are concerning kernel memory > > accounting as it relates to the memory controller. Eventually we'll want > > the ability to restrict cgroups to a hard slab limit. That'll require > > accounting to map slab allocations back to user tasks so that we can > > enforce a policy based on the cgroup's aggregated slab usage similiar to > > how the memory controller currently does for user memory. 
> > > > Is this currently being thought about within the memcg community? > > Not yet. But I always recommend people to implement another memcg (slabcg) for > kernel memory. Because > > - It must have much lower cost than memcg, good perfomance and scalability. > system-wide shared counter is nonsense. > We've solved those issues mostly! Anyway, I agree that we need another slabcg, Pavel did some work in that area and posted patches, but they were mostly based and limited to SLUB (IIRC). > - slab is not base on LRU. So, another used-memory maintainance scheme should > be used. > > - You can reuse page_cgroup even if slabcg is independent from memcg. > > > But, considering user-side, all people will not welcome dividing memcg and slabcg. > So, tieing it to current memcg is ok for me. > like... > == > struct mem_cgroup { > .... > .... > struct slab_cgroup slabcg; (or struct slab_cgroup *slabcg) > } > == > > But we have to use another counter and another scheme, another implemenation > than memcg, which has good scalability and more fuzzy/lazy controls. > (For example, trigger slab-shrink when usage exceeds hiwatermark, not limit.) > That depends on requirements, hiwatermark is more like a soft limit than a hard limit and there might be need for hard limits. > Scalable accounting is the first wall in front of us. Second one will be > how-to-shrink. About information recording, we can reuse page_cgroup and > we'll not have much difficulty. > > I hope, at implementing slabcg, we'll not meet very complicated > racy cases as what we met in memcg. > I think it will be because there is no swapping involved, OOM and rare race conditions. There is limited slab reclaim possible, but otherwise I think it is easier to write a slab controller IMHO. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id 0C5A46B0062 for ; Thu, 26 Nov 2009 03:58:58 -0500 (EST) Received: from m4.gw.fujitsu.co.jp ([10.0.50.74]) by fgwmail5.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id nAQ8wuC4021578 for (envelope-from kamezawa.hiroyu@jp.fujitsu.com); Thu, 26 Nov 2009 17:58:56 +0900 Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 1262A45DE4D for ; Thu, 26 Nov 2009 17:58:56 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id D3D9445DE70 for ; Thu, 26 Nov 2009 17:58:55 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 9CD7BE1800B for ; Thu, 26 Nov 2009 17:58:55 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.249.87.107]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 4571DE18008 for ; Thu, 26 Nov 2009 17:58:55 +0900 (JST) Date: Thu, 26 Nov 2009 17:56:06 +0900 From: KAMEZAWA Hiroyuki Subject: Re: memcg: slab control Message-Id: <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <20091126085031.GG2970@balbir.in.ibm.com> References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: balbir@linux.vnet.ibm.com Cc: David Rientjes , Pavel Emelyanov , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: On Thu, 26 Nov 2009 14:20:31 +0530 Balbir Singh wrote: > * KAMEZAWA Hiroyuki [2009-11-26 10:14:14]: > > > On Wed, 25 Nov 2009 15:08:00 -0800 (PST) > > David Rientjes wrote: > > > > > Hi, > > > > > > I wanted to see what the current ideas are concerning kernel memory > > 
> accounting as it relates to the memory controller. Eventually we'll want > > > the ability to restrict cgroups to a hard slab limit. That'll require > > > accounting to map slab allocations back to user tasks so that we can > > > enforce a policy based on the cgroup's aggregated slab usage similiar to > > > how the memory controller currently does for user memory. > > > > > > Is this currently being thought about within the memcg community? > > > > Not yet. But I always recommend people to implement another memcg (slabcg) for > > kernel memory. Because > > > > - It must have much lower cost than memcg, good perfomance and scalability. > > system-wide shared counter is nonsense. > > > > We've solved those issues mostly! yes. but our solution is for page faults. resolution of slab allocation is much more fine grained and often. > Anyway, I agree that we need another > slabcg, Pavel did some work in that area and posted patches, but they > were mostly based and limited to SLUB (IIRC). > > > - slab is not base on LRU. So, another used-memory maintainance scheme should > > be used. > > > > - You can reuse page_cgroup even if slabcg is independent from memcg. > > > > > > But, considering user-side, all people will not welcome dividing memcg and slabcg. > > So, tieing it to current memcg is ok for me. > > like... > > == > > struct mem_cgroup { > > .... > > .... > > struct slab_cgroup slabcg; (or struct slab_cgroup *slabcg) > > } > > == > > > > But we have to use another counter and another scheme, another implemenation > > than memcg, which has good scalability and more fuzzy/lazy controls. > > (For example, trigger slab-shrink when usage exceeds hiwatermark, not limit.) > > > > That depends on requirements, hiwatermark is more like a soft limit > than a hard limit and there might be need for hard limits. > My point is that most of the kernel codes cannot work well when kmalloc(small area) returns NULL. > > Scalable accounting is the first wall in front of us. 
Second one will be > > how-to-shrink. About information recording, we can reuse page_cgroup and > > we'll not have much difficulty. > > > > I hope, at implementing slabcg, we'll not meet very complicated > > racy cases as what we met in memcg. > > > > I think it will be because there is no swapping involved, OOM and rare > race conditions. There is limited slab reclaim possible, but otherwise > I think it is easier to write a slab controller IMHO. > yes ;) Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 9824B6B007B for ; Thu, 26 Nov 2009 04:11:09 -0500 (EST) Message-ID: <4B0E461C.50606@parallels.com> Date: Thu, 26 Nov 2009 12:10:52 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, David Rientjes Cc: Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: >> Anyway, I agree that we need another >> slabcg, Pavel did some work in that area and posted patches, but they >> were mostly based and limited to SLUB (IIRC). I'm ready to resurrect the patches and port them for slab. But before doing it we should answer one question. Consider we have two kmalloc-s in a kernel code - one is user-space triggerable and the other one is not. From my POV we should account for the former one, but should not for the latter. 
If so - how should we patch the kernel to achieve that goal? > My point is that most of the kernel codes cannot work well when kmalloc(small area) > returns NULL. :) That's not so actually. As our experience shows kernel lives fine when kmalloc returns NULL (this doesn't include drivers though). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with SMTP id 35A6C6B007E for ; Thu, 26 Nov 2009 04:36:31 -0500 (EST) Received: from m4.gw.fujitsu.co.jp ([10.0.50.74]) by fgwmail7.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id nAQ9aSli019155 for (envelope-from kamezawa.hiroyu@jp.fujitsu.com); Thu, 26 Nov 2009 18:36:28 +0900 Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 26AFD45DE60 for ; Thu, 26 Nov 2009 18:36:28 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 0357A45DE4D for ; Thu, 26 Nov 2009 18:36:28 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id E1264E18001 for ; Thu, 26 Nov 2009 18:36:27 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.249.87.107]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 94283E18004 for ; Thu, 26 Nov 2009 18:36:24 +0900 (JST) Date: Thu, 26 Nov 2009 18:33:35 +0900 From: KAMEZAWA Hiroyuki Subject: Re: memcg: slab control Message-Id: <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <4B0E461C.50606@parallels.com> References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> 
<4B0E461C.50606@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: balbir@linux.vnet.ibm.com, David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: On Thu, 26 Nov 2009 12:10:52 +0300 Pavel Emelyanov wrote: > >> Anyway, I agree that we need another > >> slabcg, Pavel did some work in that area and posted patches, but they > >> were mostly based and limited to SLUB (IIRC). > > I'm ready to resurrect the patches and port them for slab. > But before doing it we should answer one question. > > Consider we have two kmalloc-s in a kernel code - one is > user-space triggerable and the other one is not. From my > POV we should account for the former one, but should not > for the latter. > > If so - how should we patch the kernel to achieve that goal? > > > My point is that most of the kernel codes cannot work well when kmalloc(small area) > > returns NULL. > > :) That's not so actually. As our experience shows kernel lives fine > when kmalloc returns NULL (this doesn't include drivers though). >

One issue that comes to mind is that a file system can return -EIO because kmalloc() returns NULL. The kernel may keep working fine, but the result is terrible for users ;)

Thanks,
-Kame

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 61E466B0083 for ; Thu, 26 Nov 2009 04:56:20 -0500 (EST) Message-ID: <4B0E50B1.20602@parallels.com> Date: Thu, 26 Nov 2009 12:56:01 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: KAMEZAWA Hiroyuki Cc: balbir@linux.vnet.ibm.com, David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: KAMEZAWA Hiroyuki wrote: > On Thu, 26 Nov 2009 12:10:52 +0300 > Pavel Emelyanov wrote: > >>>> Anyway, I agree that we need another >>>> slabcg, Pavel did some work in that area and posted patches, but they >>>> were mostly based and limited to SLUB (IIRC). >> I'm ready to resurrect the patches and port them for slab. >> But before doing it we should answer one question. >> >> Consider we have two kmalloc-s in a kernel code - one is >> user-space triggerable and the other one is not. From my >> POV we should account for the former one, but should not >> for the latter. >> >> If so - how should we patch the kernel to achieve that goal? >> >>> My point is that most of the kernel codes cannot work well when kmalloc(small area) >>> returns NULL. >> :) That's not so actually. As our experience shows kernel lives fine >> when kmalloc returns NULL (this doesn't include drivers though). >> > One issue it comes to my mind is that file system can return -EIO because > kmalloc() returns NULL. 
the kernel may work fine but terrible to users ;) That relates to my question above - we should not account for all kmalloc-s. In particular - we don't account for bio-s and buffer-head-s since their amount is not under direct user control. Yes, you can request for heavy IO, but first, kernel sends your task to sleep under certain conditions and second, bio-s are destroyed as soon as they are finished and thus bio-s and buffer-head-s cannot be used to eat all the kernel memory. > > Thanks, > -Kame > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 9135B6B0087 for ; Thu, 26 Nov 2009 05:01:37 -0500 (EST) Received: from wpaz9.hot.corp.google.com (wpaz9.hot.corp.google.com [172.24.198.73]) by smtp-out.google.com with ESMTP id nAQA1Wtm025785 for ; Thu, 26 Nov 2009 10:01:33 GMT Received: from pwi15 (pwi15.prod.google.com [10.241.219.15]) by wpaz9.hot.corp.google.com with ESMTP id nAQA1TCU028482 for ; Thu, 26 Nov 2009 02:01:30 -0800 Received: by pwi15 with SMTP id 15so410942pwi.4 for ; Thu, 26 Nov 2009 02:01:29 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20091126101704.879a1b15.kamezawa.hiroyu@jp.fujitsu.com> References: <20091126101704.879a1b15.kamezawa.hiroyu@jp.fujitsu.com> Date: Thu, 26 Nov 2009 02:01:29 -0800 Message-ID: Subject: Re: memcg: slab control From: Suleiman Souhlal Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org To: KAMEZAWA Hiroyuki Cc: David Rientjes , Balbir Singh , Pavel Emelyanov , Ying Han , linux-mm@kvack.org List-ID: Hello, On 11/25/09, KAMEZAWA Hiroyuki wrote: > BTW, how much percent of pages are used for slab in Google system ? 
> Because memory size is going bigger and bigger, ratio of slab usage is going > smaller, I think. It varies. The amount of slab on systems can go from negligible to being a significant portion of the total memory (in network intensive workloads, for example). -- Suleiman -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 9600C6B0088 for ; Thu, 26 Nov 2009 05:13:35 -0500 (EST) Received: from wpaz24.hot.corp.google.com (wpaz24.hot.corp.google.com [172.24.198.88]) by smtp-out.google.com with ESMTP id nAQADKH2028348 for ; Thu, 26 Nov 2009 10:13:21 GMT Received: from pzk5 (pzk5.prod.google.com [10.243.19.133]) by wpaz24.hot.corp.google.com with ESMTP id nAQADIoh026094 for ; Thu, 26 Nov 2009 02:13:18 -0800 Received: by pzk5 with SMTP id 5so413828pzk.18 for ; Thu, 26 Nov 2009 02:13:17 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20091126085031.GG2970@balbir.in.ibm.com> References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> Date: Thu, 26 Nov 2009 02:13:17 -0800 Message-ID: Subject: Re: memcg: slab control From: Suleiman Souhlal Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org To: balbir@linux.vnet.ibm.com Cc: KAMEZAWA Hiroyuki , David Rientjes , Pavel Emelyanov , Ying Han , linux-mm@kvack.org List-ID: On 11/26/09, Balbir Singh wrote: > I think it is easier to write a slab controller IMHO. One potential problem I can think of with writing a slab controller would be that the user would have to estimate what fraction of the amount of memory slab should be allowed to use, which might not be ideal. 
If you wanted to limit a cgroup to a total of 1GB of memory, you might not care if the job wants to use 0.9 GB of user memory and 0.1GB of slab or if it wants to use 0.9GB of slab and 0.1GB of user memory.. Because of this, it might be more practical to integrate the slab accounting in memcg. -- Suleiman -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with ESMTP id 009616B008C for ; Thu, 26 Nov 2009 05:24:15 -0500 (EST) Received: from wpaz9.hot.corp.google.com (wpaz9.hot.corp.google.com [172.24.198.73]) by smtp-out.google.com with ESMTP id nAQAOBeV017212 for ; Thu, 26 Nov 2009 10:24:12 GMT Received: from pxi16 (pxi16.prod.google.com [10.243.27.16]) by wpaz9.hot.corp.google.com with ESMTP id nAQAO8wd019301 for ; Thu, 26 Nov 2009 02:24:09 -0800 Received: by pxi16 with SMTP id 16so442270pxi.29 for ; Thu, 26 Nov 2009 02:24:08 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <4B0E50B1.20602@parallels.com> References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> Date: Thu, 26 Nov 2009 02:24:08 -0800 Message-ID: Subject: Re: memcg: slab control From: Suleiman Souhlal Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, David Rientjes , Ying Han , linux-mm@kvack.org List-ID: On 11/26/09, Pavel Emelyanov wrote: > KAMEZAWA Hiroyuki wrote: > > On Thu, 26 Nov 2009 12:10:52 +0300 > > Pavel Emelyanov wrote: > > > >>>> Anyway, I agree that we need another > >>>> slabcg, 
Pavel did some work in that area and posted patches, but they > >>>> were mostly based and limited to SLUB (IIRC). > >> I'm ready to resurrect the patches and port them for slab. > >> But before doing it we should answer one question. > >> > >> Consider we have two kmalloc-s in a kernel code - one is > >> user-space triggerable and the other one is not. From my > >> POV we should account for the former one, but should not > >> for the latter. > >> > >> If so - how should we patch the kernel to achieve that goal? > >> > >>> My point is that most of the kernel codes cannot work well when kmalloc(small area) > >>> returns NULL. > >> :) That's not so actually. As our experience shows kernel lives fine > >> when kmalloc returns NULL (this doesn't include drivers though). > >> > > One issue it comes to my mind is that file system can return -EIO because > > kmalloc() returns NULL. the kernel may work fine but terrible to users ;) > > > That relates to my question above - we should not account for all > kmalloc-s. In particular - we don't account for bio-s and buffer-head-s > since their amount is not under direct user control. Yes, you can > request for heavy IO, but first, kernel sends your task to sleep under > certain conditions and second, bio-s are destroyed as soon as they are > finished and thus bio-s and buffer-head-s cannot be used to eat all the > kernel memory. Aren't there patches to make the kernel track which cgroup caused which disk I/O? If so, it should be possible to charge the bios to the right cgroup. Maybe one way to decide which kernel allocations should be accounted would be to look at the calling context: If the allocation is done in user context (syscall), then it could be counted towards that user, while if the allocation is done in interrupt or kthread context, it shouldn't be accounted. Of course, this wouldn't be perfect, but it might be a good enough approximation. 
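[Editorial sketch, not part of the original message.] The calling-context heuristic suggested above could look roughly like the userspace model below: an allocation is charged to a cgroup only when it happens on behalf of a syscall, and is skipped in interrupt or kernel-thread context. All names here (`alloc_context`, `cgroup_usage`, `should_account`, `account_alloc`) are hypothetical. In a real kernel the context test would more plausibly be built on checks such as `in_interrupt()` and the `PF_KTHREAD` task flag, but how it would be wired into the allocator is an assumption, not something the thread specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "where did this allocation come from?" */
enum alloc_context {
	CTX_SYSCALL,    /* user context: task is executing a syscall */
	CTX_INTERRUPT,  /* hard or soft interrupt context */
	CTX_KTHREAD,    /* kernel thread, no user task behind it */
};

/* Hypothetical per-cgroup accumulator. */
struct cgroup_usage {
	long slab_bytes;
};

/* The heuristic itself: only syscall-context allocations count. */
static bool should_account(enum alloc_context ctx)
{
	return ctx == CTX_SYSCALL;
}

/*
 * Record an allocation of 'size' bytes against 'cg' if the context
 * qualifies. Returns true when the allocation was charged.
 */
static bool account_alloc(struct cgroup_usage *cg,
			  enum alloc_context ctx, size_t size)
{
	assert(cg != NULL);

	if (!should_account(ctx))
		return false;
	cg->slab_bytes += (long)size;
	return true;
}
```

As the message itself says, this is only an approximation: the context of an allocation does not always identify the task that caused it.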
-- Suleiman -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id 50C936B0096 for ; Thu, 26 Nov 2009 07:32:02 -0500 (EST) Message-ID: <4B0E7530.8050304@parallels.com> Date: Thu, 26 Nov 2009 15:31:44 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Suleiman Souhlal Cc: KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, David Rientjes , Ying Han , linux-mm@kvack.org List-ID: > Aren't there patches to make the kernel track which cgroup caused > which disk I/O? If so, it should be possible to charge the bios to the > right cgroup. > > Maybe one way to decide which kernel allocations should be accounted > would be to look at the calling context: If the allocation is done in > user context (syscall), then it could be counted towards that user, > while if the allocation is done in interrupt or kthread context, it > shouldn't be accounted. > > Of course, this wouldn't be perfect, but it might be a good enough > approximation. I disagree. Bio-s are allocated in user context for all typical reads (unless we requested aio) and are allocated either in pdflush context or (!) in arbitrary task context for writes (e.g. via try_to_free_pages) and thus such bio/buffer_head accounting will be completely random. 
One of the way to achieve the goal I can propose the following (it's not perfect, but just smth to start discussion from). We implement support for accounting based on a bit on a kmem_cache structure and mark all kmalloc caches as not-accountable. Then we grep the kernel to find all kmalloc-s and think - if a kmalloc is to be accounted we turn this into kmem_cache_alloc() with dedicated kmem_cache and mark it as accountable. > -- Suleiman > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id A7ED66B0098 for ; Thu, 26 Nov 2009 07:52:06 -0500 (EST) Received: from wpaz13.hot.corp.google.com (wpaz13.hot.corp.google.com [172.24.198.77]) by smtp-out.google.com with ESMTP id nAQCq3tO012045 for ; Thu, 26 Nov 2009 04:52:03 -0800 Received: from pzk42 (pzk42.prod.google.com [10.243.19.170]) by wpaz13.hot.corp.google.com with ESMTP id nAQCpp3S028642 for ; Thu, 26 Nov 2009 04:52:00 -0800 Received: by pzk42 with SMTP id 42so520779pzk.31 for ; Thu, 26 Nov 2009 04:52:00 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <4B0E7530.8050304@parallels.com> References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <4B0E7530.8050304@parallels.com> Date: Thu, 26 Nov 2009 04:52:00 -0800 Message-ID: Subject: Re: memcg: slab control From: Suleiman Souhlal Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, David Rientjes , Ying Han , linux-mm@kvack.org List-ID: On 11/26/09, 
Pavel Emelyanov wrote: > > Aren't there patches to make the kernel track which cgroup caused > > which disk I/O? If so, it should be possible to charge the bios to the > > right cgroup. > > > > Maybe one way to decide which kernel allocations should be accounted > > would be to look at the calling context: If the allocation is done in > > user context (syscall), then it could be counted towards that user, > > while if the allocation is done in interrupt or kthread context, it > > shouldn't be accounted. > > > > Of course, this wouldn't be perfect, but it might be a good enough > > approximation. > > > I disagree. Bio-s are allocated in user context for all typical reads > (unless we requested aio) and are allocated either in pdflush context > or (!) in arbitrary task context for writes (e.g. via try_to_free_pages) > and thus such bio/buffer_head accounting will be completely random. Yes, that's why I pointed out that you can account to the right cgroup if you track who caused the I/O (which, I imagine, should already be done by the block i/o bandwidth controller, or similar). For most other allocations, on the other hand, accounting to the current context should be fine. > One of the way to achieve the goal I can propose the following (it's > not perfect, but just smth to start discussion from). > > We implement support for accounting based on a bit on a kmem_cache > structure and mark all kmalloc caches as not-accountable. Then we grep > the kernel to find all kmalloc-s and think - if a kmalloc is to be > accounted we turn this into kmem_cache_alloc() with dedicated > kmem_cache and mark it as accountable. That sounds like a lot of work. :-) -- Suleiman -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id E8A366B004D for ; Fri, 27 Nov 2009 02:02:01 -0500 (EST) Received: from spaceape7.eur.corp.google.com (spaceape7.eur.corp.google.com [172.28.16.141]) by smtp-out.google.com with ESMTP id nAR71vqm011430 for ; Fri, 27 Nov 2009 07:01:58 GMT Received: from pwj17 (pwj17.prod.google.com [10.241.219.81]) by spaceape7.eur.corp.google.com with ESMTP id nAR71pCn022893 for ; Thu, 26 Nov 2009 23:01:54 -0800 Received: by pwj17 with SMTP id 17so1010743pwj.5 for ; Thu, 26 Nov 2009 23:01:51 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20091126113209.5A68.A69D9226@jp.fujitsu.com> References: <20091126113209.5A68.A69D9226@jp.fujitsu.com> Date: Thu, 26 Nov 2009 23:01:51 -0800 Message-ID: <604427e00911262301tac7f55avedd44263fbabccc2@mail.gmail.com> Subject: Re: memcg: slab control From: Ying Han Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org To: KOSAKI Motohiro Cc: David Rientjes , Balbir Singh , Pavel Emelyanov , KAMEZAWA Hiroyuki , Suleiman Souhlal , linux-mm@kvack.org List-ID: On Wed, Nov 25, 2009 at 6:35 PM, KOSAKI Motohiro wrote: > Hi > >> Hi, >> >> I wanted to see what the current ideas are concerning kernel memory >> accounting as it relates to the memory controller. Eventually we'll want >> the ability to restrict cgroups to a hard slab limit. That'll require >> accounting to map slab allocations back to user tasks so that we can >> enforce a policy based on the cgroup's aggregated slab usage similar to >> how the memory controller currently does for user memory. >> >> Is this currently being thought about within the memcg community? 
We'd >> like to start a discussion and get everybody's requirements and interests >> on the table and then become actively involved in the development of such >> a feature. > > I don't think memory hard isolation is bad idea. however, slab restriction > is too strange. some device use slab frequently, another someone use get_free_pages() > directly. only slab restriction will not make expected result from admin view. > > Probably, we need to implement generic memory reservation framework. it might help > implement rt-task memory reservation and userland oom manager. > > It is only my personal opinion... Looks like the beancounters implementation counts both the kernel slab objects as well as the pages from get_free_pages(). But it relies on the caller to pass down a GFP flag indicating whether the page or slab is to be accounted or not. I am looking at the beancounters v5 at: http://lkml.indiana.edu/hypermail/linux/kernel/0610.0/1719.html I kind of like the idea to have a kernel memory controller instead of kernel slab controller. If we only count kernel slabs, do we need another mechanism to count kernel allocations directly from get_free_pages()? --Ying > > > Thanks. > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 917BC6B004D for ; Fri, 27 Nov 2009 02:15:57 -0500 (EST) Received: from spaceape11.eur.corp.google.com (spaceape11.eur.corp.google.com [172.28.16.145]) by smtp-out.google.com with ESMTP id nAR7FrVL027035 for ; Thu, 26 Nov 2009 23:15:54 -0800 Received: from pxi36 (pxi36.prod.google.com [10.243.27.36]) by spaceape11.eur.corp.google.com with ESMTP id nAR7FoCv021916 for ; Thu, 26 Nov 2009 23:15:50 -0800 Received: by pxi36 with SMTP id 36so909990pxi.26 for ; Thu, 26 Nov 2009 23:15:49 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <4B0E7530.8050304@parallels.com> References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <4B0E7530.8050304@parallels.com> Date: Thu, 26 Nov 2009 23:15:49 -0800 Message-ID: <604427e00911262315n5d520cf4p447f68e7053adc11@mail.gmail.com> Subject: Re: memcg: slab control From: Ying Han Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: Suleiman Souhlal , KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, David Rientjes , linux-mm@kvack.org List-ID: On Thu, Nov 26, 2009 at 4:31 AM, Pavel Emelyanov wrote: >> Aren't there patches to make the kernel track which cgroup caused >> which disk I/O? If so, it should be possible to charge the bios to the >> right cgroup. >> >> Maybe one way to decide which kernel allocations should be accounted >> would be to look at the calling context: If the allocation is done in >> user context (syscall), then it could be counted towards that user, >> while if the allocation is done in interrupt or kthread context, it >> shouldn't be accounted. 
>> >> Of course, this wouldn't be perfect, but it might be a good enough >> approximation. > > I disagree. Bio-s are allocated in user context for all typical reads > (unless we requested aio) and are allocated either in pdflush context > or (!) in arbitrary task context for writes (e.g. via try_to_free_pages) > and thus such bio/buffer_head accounting will be completely random. > > One of the way to achieve the goal I can propose the following (it's > not perfect, but just smth to start discussion from). > > We implement support for accounting based on a bit on a kmem_cache > structure and mark all kmalloc caches as not-accountable. Then we grep > the kernel to find all kmalloc-s and think - if a kmalloc is to be > accounted we turn this into kmem_cache_alloc() with dedicated > kmem_cache and mark it as accountable. Well it would be nice to count all kernel memory allocations trigger-able by user programs, the kernel memory includes kernel slabs as well as the pages directly allocated by get_free_pages(). However some of the allocations happen asynchronously like in kernel thread or interrupt context. We can not charge them on the random process happen to run at the time. We can either not count those allocations, or do some special treatment to remember who owns those allocations. In our networking intensive workload, it causes us lots of trouble of miscounting the networking slabs for incoming packets. So we make changes in the networking stack which records the owner of the socket and then charge the slab later using that recorded information. --Ying >> -- Suleiman >> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id C8CD56B004D for ; Fri, 27 Nov 2009 04:45:27 -0500 (EST) Message-ID: <4B0F9F9E.3060604@parallels.com> Date: Fri, 27 Nov 2009 12:45:02 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <4B0E7530.8050304@parallels.com> <604427e00911262315n5d520cf4p447f68e7053adc11@mail.gmail.com> In-Reply-To: <604427e00911262315n5d520cf4p447f68e7053adc11@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Ying Han Cc: Suleiman Souhlal , KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, David Rientjes , linux-mm@kvack.org List-ID: Ying Han wrote: > On Thu, Nov 26, 2009 at 4:31 AM, Pavel Emelyanov wrote: >>> Aren't there patches to make the kernel track which cgroup caused >>> which disk I/O? If so, it should be possible to charge the bios to the >>> right cgroup. >>> >>> Maybe one way to decide which kernel allocations should be accounted >>> would be to look at the calling context: If the allocation is done in >>> user context (syscall), then it could be counted towards that user, >>> while if the allocation is done in interrupt or kthread context, it >>> shouldn't be accounted. >>> >>> Of course, this wouldn't be perfect, but it might be a good enough >>> approximation. >> I disagree. Bio-s are allocated in user context for all typical reads >> (unless we requested aio) and are allocated either in pdflush context >> or (!) in arbitrary task context for writes (e.g. 
via try_to_free_pages) >> and thus such bio/buffer_head accounting will be completely random. >> >> One of the way to achieve the goal I can propose the following (it's >> not perfect, but just smth to start discussion from). >> >> We implement support for accounting based on a bit on a kmem_cache >> structure and mark all kmalloc caches as not-accountable. Then we grep >> the kernel to find all kmalloc-s and think - if a kmalloc is to be >> accounted we turn this into kmem_cache_alloc() with dedicated >> kmem_cache and mark it as accountable. > > Well it would be nice to count all kernel memory allocations > trigger-able by user programs, the kernel > memory includes kernel slabs as well as the pages directly allocated > by get_free_pages(). However some > of the allocations happen asynchronously like in kernel thread or > interrupt context. We can not charge them > on the random process happen to run at the time. > > We can either not count those allocations, or do some special > treatment to remember who owns those allocations. > In our networking intensive workload, it causes us lots of trouble of > miscounting the networking slabs for incoming > packets. So we make changes in the networking stack which records the > owner of the socket and then charge the > slab later using that recorded information. That's the same as what we do, but note, that simple accounting is not enough for socket buffers (a.k.a. skb-s). In a perfect world we should implement a memory management similar to what already exists in the networking. In particular - sockets should not report errors in case of kmem shortage, but instead goto waiting state. Besides, TCP sockets should adjust the TCP window according to the current kmem controller state and this task is quite complex. > --Ying > >>> -- Suleiman >>> >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 15E056B004D for ; Fri, 27 Nov 2009 04:48:34 -0500 (EST) Message-ID: <4B0FA060.4050907@parallels.com> Date: Fri, 27 Nov 2009 12:48:16 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126113209.5A68.A69D9226@jp.fujitsu.com> <604427e00911262301tac7f55avedd44263fbabccc2@mail.gmail.com> In-Reply-To: <604427e00911262301tac7f55avedd44263fbabccc2@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Ying Han Cc: KOSAKI Motohiro , David Rientjes , Balbir Singh , Pavel Emelyanov , KAMEZAWA Hiroyuki , Suleiman Souhlal , linux-mm@kvack.org List-ID: Ying Han wrote: > I kind of like the idea to have a kernel memory controller instead of > kernel slab controller. > If we only count kernel slabs, do we need another mechanism to count > kernel allocations > directly from get_free_pages() ? We do. Look at what get_free_pages we mark with GFP_UBC in beancounters and see, that if we don't count them this creates way to DoS the kernel. > --Ying >> >> Thanks. >> >> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 9E883600309 for ; Mon, 30 Nov 2009 04:17:09 -0500 (EST) Received: from d23relay04.au.ibm.com (d23relay04.au.ibm.com [202.81.31.246]) by e23smtp05.au.ibm.com (8.14.3/8.13.1) with ESMTP id nAU9E4CP002030 for ; Mon, 30 Nov 2009 20:14:04 +1100 Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.235.138]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id nAU9DWWM1151058 for ; Mon, 30 Nov 2009 20:13:32 +1100 Received: from d23av02.au.ibm.com (loopback [127.0.0.1]) by d23av02.au.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id nAU9H43B029812 for ; Mon, 30 Nov 2009 20:17:04 +1100 Date: Mon, 30 Nov 2009 14:47:00 +0530 From: Balbir Singh Subject: Re: memcg: slab control Message-ID: <20091130091700.GK2970@balbir.in.ibm.com> Reply-To: balbir@linux.vnet.ibm.com References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org To: Suleiman Souhlal Cc: KAMEZAWA Hiroyuki , David Rientjes , Pavel Emelyanov , Ying Han , linux-mm@kvack.org List-ID: * Suleiman Souhlal [2009-11-26 02:13:17]: > On 11/26/09, Balbir Singh wrote: > > I think it is easier to write a slab controller IMHO. > > One potential problem I can think of with writing a slab controller > would be that the user would have to estimate what fraction of the > amount of memory slab should be allowed to use, which might not be > ideal. > > If you wanted to limit a cgroup to a total of 1GB of memory, you might > not care if the job wants to use 0.9 GB of user memory and 0.1GB of > slab or if it wants to use 0.9GB of slab and 0.1GB of user memory.. > Hmm.. 
true, yes not caring about how memory usage is partitioned is nice (we have memsw for very similar reasons). > Because of this, it might be more practical to integrate the slab > accounting in memcg. > I tend to agree, but I would like to see the early design and thoughts. Like Kame pointed, integrating their accounting can be an issue. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 767BF600309 for ; Mon, 30 Nov 2009 17:45:54 -0500 (EST) Received: from zps75.corp.google.com (zps75.corp.google.com [172.25.146.75]) by smtp-out.google.com with ESMTP id nAUMjqC9011280 for ; Mon, 30 Nov 2009 14:45:53 -0800 Received: from pxi39 (pxi39.prod.google.com [10.243.27.39]) by zps75.corp.google.com with ESMTP id nAUMjmCj000960 for ; Mon, 30 Nov 2009 14:45:50 -0800 Received: by pxi39 with SMTP id 39so3268220pxi.2 for ; Mon, 30 Nov 2009 14:45:48 -0800 (PST) Date: Mon, 30 Nov 2009 14:45:45 -0800 (PST) From: David Rientjes Subject: Re: memcg: slab control In-Reply-To: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> Message-ID: References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org To: KAMEZAWA Hiroyuki Cc: Balbir Singh , Pavel Emelyanov , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: On Thu, 26 Nov 2009, KAMEZAWA Hiroyuki wrote: > But, considering user-side, all people will not welcome dividing memcg and slabcg. > So, tieing it to current memcg is ok for me. Agreed. > like... > == > struct mem_cgroup { > .... > .... 
> struct slab_cgroup slabcg; (or struct slab_cgroup *slabcg) > } > == > > But we have to use another counter and another scheme, another implementation > than memcg, which has good scalability and more fuzzy/lazy controls. > (For example, trigger slab-shrink when usage exceeds hiwatermark, not limit.) > We're only really interested in using memcg and slabcg together for accounting all memory allotted to a particular cgroup. I'm trying to imagine a scenario where someone would want to account and enforce hard slab limits without using memcg as well. If there are none (and one of the reasons we're trying to elicit discussion is to determine everyone's requirements for such a feature), we can probably tie them together without worrying about incurring unnecessary overhead by using the memcg framework that isn't related to slab accounting. I think the ideal userspace API would be simply to add slab accounting to the memcg's limit_in_bytes if a memcg option were enabled for a cgroup. I don't think it would be helpful to add a ratio of that limit for slab, though, since it's very difficult to predict the usage for a particular workload. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with ESMTP id 559FD600309 for ; Mon, 30 Nov 2009 17:55:33 -0500 (EST) Received: from zps75.corp.google.com (zps75.corp.google.com [172.25.146.75]) by smtp-out.google.com with ESMTP id nAUMtTWI021009 for ; Mon, 30 Nov 2009 22:55:30 GMT Received: from pzk1 (pzk1.prod.google.com [10.243.19.129]) by zps75.corp.google.com with ESMTP id nAUMtQte011143 for ; Mon, 30 Nov 2009 14:55:26 -0800 Received: by pzk1 with SMTP id 1so3133664pzk.33 for ; Mon, 30 Nov 2009 14:55:26 -0800 (PST) Date: Mon, 30 Nov 2009 14:55:25 -0800 (PST) From: David Rientjes Subject: Re: memcg: slab control In-Reply-To: <4B0E461C.50606@parallels.com> Message-ID: References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: On Thu, 26 Nov 2009, Pavel Emelyanov wrote: > I'm ready to resurrect the patches and port them for slab. > But before doing it we should answer one question. > Do you have a pointer to your latest implementation that you proposed for slab? > Consider we have two kmalloc-s in a kernel code - one is > user-space triggerable and the other one is not. From my > POV we should account for the former one, but should not > for the latter. > > If so - how should we patch the kernel to achieve that goal? > I think all slab allocations should be accounted for based on current's memcg other than those done in hardirq context, annotating slab allocations doesn't seem scalable. 
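
A rough userspace C sketch of the rule proposed above: charge every slab allocation to the current task's group unless it happens in hardirq context, with no per-call-site annotation. `memcg_model`, `task_model`, `model_current`, `model_in_hardirq`, and `slab_charge()` are hypothetical stand-ins introduced for illustration; they are assumptions, not the kernel's `current`, its hardirq test, or any existing memcg API. The sketch also assumes slab bytes are charged against the same counter and limit as the rest of the group's memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup counter; invariant: usage <= limit. */
struct memcg_model {
    size_t usage; /* bytes charged so far */
    size_t limit; /* hard limit in bytes */
};

/* Hypothetical task, reduced to the group it belongs to. */
struct task_model {
    struct memcg_model *memcg;
};

/* Stand-ins for "the running task" and "are we in hardirq context". */
static struct task_model *model_current;
static bool model_in_hardirq;

enum charge_result { CHARGE_OK, CHARGE_SKIPPED, CHARGE_OVER_LIMIT };

/*
 * Charge `size` bytes to the current task's group.
 * Hardirq allocations are skipped because the interrupted task is arbitrary.
 */
static enum charge_result slab_charge(size_t size)
{
    struct memcg_model *cg;

    if (model_in_hardirq || model_current == NULL)
        return CHARGE_SKIPPED;

    cg = model_current->memcg;
    assert(cg != NULL);

    /* Compare against the remaining headroom to avoid overflow. */
    if (size > cg->limit - cg->usage)
        return CHARGE_OVER_LIMIT;

    cg->usage += size;
    return CHARGE_OK;
}
```

Under this rule a kernel thread is charged to whatever group it belongs to; only hardirq allocations go uncounted. The sketch covers the allocation side only and does not address how a freed object is mapped back to the group that was charged.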
Whether the accounting is done on a task level or cgroup level isn't really a problem for us since we don't move tasks amongst cgroups. I imagine there've been previous restrictions on that put into place with the memcg so this doesn't seem like a slabcg-specific requirement anyway. The problem on the freeing side is mapping the object back to the cgroup that allocated it. We'd also need to map the object to the context in which it was allocated to determine whether we should decrement the counter or not. How do you propose doing that without a considerable overhead in memory consumption, fastpath branch, and cache cold slabcg lookups? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id 29368600309 for ; Mon, 30 Nov 2009 17:58:02 -0500 (EST) Received: from spaceape23.eur.corp.google.com (spaceape23.eur.corp.google.com [172.28.16.75]) by smtp-out.google.com with ESMTP id nAUMvwDX017841 for ; Mon, 30 Nov 2009 22:57:58 GMT Received: from pzk7 (pzk7.prod.google.com [10.243.19.135]) by spaceape23.eur.corp.google.com with ESMTP id nAUMrv5t016802 for ; Mon, 30 Nov 2009 14:57:56 -0800 Received: by pzk7 with SMTP id 7so6586773pzk.30 for ; Mon, 30 Nov 2009 14:57:55 -0800 (PST) Date: Mon, 30 Nov 2009 14:57:53 -0800 (PST) From: David Rientjes Subject: Re: memcg: slab control In-Reply-To: <4B0E7530.8050304@parallels.com> Message-ID: References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <4B0E7530.8050304@parallels.com> MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: Suleiman Souhlal , KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, Ying Han , linux-mm@kvack.org List-ID: On Thu, 26 Nov 2009, Pavel Emelyanov wrote: > I disagree. Bio-s are allocated in user context for all typical reads > (unless we requested aio) and are allocated either in pdflush context > or (!) in arbitrary task context for writes (e.g. via try_to_free_pages) > and thus such bio/buffer_head accounting will be completely random. > pdflush has been removed, they should all be allocated in process context. > We implement support for accounting based on a bit on a kmem_cache > structure and mark all kmalloc caches as not-accountable. Then we grep > the kernel to find all kmalloc-s and think - if a kmalloc is to be > accounted we turn this into kmem_cache_alloc() with dedicated > kmem_cache and mark it as accountable. > That doesn't work with slab cache merging done in slub. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with SMTP id 6C8C2600309 for ; Tue, 1 Dec 2009 00:14:09 -0500 (EST) Received: from m6.gw.fujitsu.co.jp ([10.0.50.76]) by fgwmail6.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id nB15E61Z015449 for (envelope-from kosaki.motohiro@jp.fujitsu.com); Tue, 1 Dec 2009 14:14:07 +0900 Received: from smail (m6 [127.0.0.1]) by outgoing.m6.gw.fujitsu.co.jp (Postfix) with ESMTP id 7765F45DE4E for ; Tue, 1 Dec 2009 14:14:06 +0900 (JST) Received: from s6.gw.fujitsu.co.jp (s6.gw.fujitsu.co.jp [10.0.50.96]) by m6.gw.fujitsu.co.jp (Postfix) with ESMTP id 58FE245DE4C for ; Tue, 1 Dec 2009 14:14:06 +0900 (JST) Received: from s6.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s6.gw.fujitsu.co.jp (Postfix) with ESMTP id 43DE51DB803E for ; Tue, 1 Dec 2009 14:14:06 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.249.87.107]) by s6.gw.fujitsu.co.jp (Postfix) with ESMTP id 028811DB803A for ; Tue, 1 Dec 2009 14:14:06 +0900 (JST) From: KOSAKI Motohiro Subject: Re: memcg: slab control In-Reply-To: <604427e00911262315n5d520cf4p447f68e7053adc11@mail.gmail.com> References: <4B0E7530.8050304@parallels.com> <604427e00911262315n5d520cf4p447f68e7053adc11@mail.gmail.com> Message-Id: <20091201140726.5C28.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Date: Tue, 1 Dec 2009 14:14:04 +0900 (JST) Sender: owner-linux-mm@kvack.org To: Ying Han Cc: kosaki.motohiro@jp.fujitsu.com, Pavel Emelyanov , Suleiman Souhlal , KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, David Rientjes , linux-mm@kvack.org List-ID: > We can either not count those allocations, or do some special > treatment to remember who owns those allocations. 
> In our networking intensive workload, it causes us lots of trouble of > miscounting the networking slabs for incoming > packets. So we make changes in the networking stack which records the > owner of the socket and then charge the > slab later using that recorded information. I agree, network-intensive workloads are currently problematic, but I don't think improving network memory management requires changing generic slab management. Why can't we improve the current tcp/udp memory accounting? It is a better user interface than "amount of slab memory". -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with ESMTP id 4505F600309 for ; Tue, 1 Dec 2009 02:36:41 -0500 (EST) Received: from d23relay04.au.ibm.com (d23relay04.au.ibm.com [202.81.31.246]) by e23smtp04.au.ibm.com (8.14.3/8.13.1) with ESMTP id nB17XNnP006614 for ; Tue, 1 Dec 2009 18:33:23 +1100 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id nB17X2Qn1450232 for ; Tue, 1 Dec 2009 18:33:02 +1100 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id nB17aaOT030513 for ; Tue, 1 Dec 2009 18:36:36 +1100 Date: Tue, 1 Dec 2009 13:06:09 +0530 From: Balbir Singh Subject: Re: memcg: slab control Message-ID: <20091201073609.GQ2970@balbir.in.ibm.com> Reply-To: balbir@linux.vnet.ibm.com References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> MIME-Version: 1.0 
Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <4B0E50B1.20602@parallels.com> Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: KAMEZAWA Hiroyuki , David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: * Pavel Emelyanov [2009-11-26 12:56:01]: > KAMEZAWA Hiroyuki wrote: > > On Thu, 26 Nov 2009 12:10:52 +0300 > > Pavel Emelyanov wrote: > > > >>>> Anyway, I agree that we need another > >>>> slabcg, Pavel did some work in that area and posted patches, but they > >>>> were mostly based and limited to SLUB (IIRC). > >> I'm ready to resurrect the patches and port them for slab. > >> But before doing it we should answer one question. > >> > >> Consider we have two kmalloc-s in a kernel code - one is > >> user-space triggerable and the other one is not. From my > >> POV we should account for the former one, but should not > >> for the latter. > >> > >> If so - how should we patch the kernel to achieve that goal? > >> > >>> My point is that most of the kernel codes cannot work well when kmalloc(small area) > >>> returns NULL. > >> :) That's not so actually. As our experience shows kernel lives fine > >> when kmalloc returns NULL (this doesn't include drivers though). > >> > > One issue it comes to my mind is that file system can return -EIO because > > kmalloc() returns NULL. the kernel may work fine but terrible to users ;) > > That relates to my question above - we should not account for all > kmalloc-s. In particular - we don't account for bio-s and buffer-head-s > since their amount is not under direct user control. Yes, you can > request for heavy IO, but first, kernel sends your task to sleep under > certain conditions and second, bio-s are destroyed as soon as they are > finished and thus bio-s and buffer-head-s cannot be used to eat all the > kernel memory. Just to understand the context better, is this really a problem. This can occur when we do really run out of memory. 
The idea of using slabcg + memcg together is good, except for our accounting process. I can repost percpu counter patches that adds fuzziness along with other tricks that Kame has to do batch accounting, that we will need to make sure we are able to do with slab allocations as well. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 2D33F600309 for ; Tue, 1 Dec 2009 02:40:20 -0500 (EST) Received: from d23relay04.au.ibm.com (d23relay04.au.ibm.com [202.81.31.246]) by e23smtp04.au.ibm.com (8.14.3/8.13.1) with ESMTP id nB17b3kk008508 for ; Tue, 1 Dec 2009 18:37:03 +1100 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id nB17agPk1110134 for ; Tue, 1 Dec 2009 18:36:42 +1100 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id nB17eGYI001213 for ; Tue, 1 Dec 2009 18:40:16 +1100 Date: Tue, 1 Dec 2009 13:10:10 +0530 From: Balbir Singh Subject: Re: memcg: slab control Message-ID: <20091201074010.GR2970@balbir.in.ibm.com> Reply-To: balbir@linux.vnet.ibm.com References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <4B0E7530.8050304@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org To: Suleiman Souhlal Cc: Pavel Emelyanov , KAMEZAWA Hiroyuki , David Rientjes , Ying Han , 
linux-mm@kvack.org List-ID: * Suleiman Souhlal [2009-11-26 04:52:00]: > On 11/26/09, Pavel Emelyanov wrote: > > > Aren't there patches to make the kernel track which cgroup caused > > > which disk I/O? If so, it should be possible to charge the bios to the > > > right cgroup. > > > > > > Maybe one way to decide which kernel allocations should be accounted > > > would be to look at the calling context: If the allocation is done in > > > user context (syscall), then it could be counted towards that user, > > > while if the allocation is done in interrupt or kthread context, it > > > shouldn't be accounted. > > > > > > Of course, this wouldn't be perfect, but it might be a good enough > > > approximation. > > > > > > I disagree. Bio-s are allocated in user context for all typical reads > > (unless we requested aio) and are allocated either in pdflush context > > or (!) in arbitrary task context for writes (e.g. via try_to_free_pages) > > and thus such bio/buffer_head accounting will be completely random. > > Yes, that's why I pointed out that you can account to the right cgroup > if you track who caused the I/O (which, I imagine, should already be > done by the block i/o bandwidth controller, or similar). > We can do so, we do that for task I/O accounting today and it works quite well for the applications I've applied them to. > For most other allocations, on the other hand, accounting to the > current context should be fine. > Absolutely! Except when the context is a kernel thread like pdflush/ksm, etc. > > One of the way to achieve the goal I can propose the following (it's > > not perfect, but just smth to start discussion from). > > > > We implement support for accounting based on a bit on a kmem_cache > > structure and mark all kmalloc caches as not-accountable. Then we grep > > the kernel to find all kmalloc-s and think - if a kmalloc is to be > > accounted we turn this into kmem_cache_alloc() with dedicated > > kmem_cache and mark it as accountable. 
> > That sounds like a lot of work. :-) > Hmm.. yes, it does, but I wonder if there are better alternatives. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 7B3DC600786 for ; Tue, 1 Dec 2009 05:31:27 -0500 (EST) Message-ID: <4B14F06D.1000901@parallels.com> Date: Tue, 01 Dec 2009 13:31:09 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <4B0E7530.8050304@parallels.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: David Rientjes Cc: Suleiman Souhlal , KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, Ying Han , linux-mm@kvack.org List-ID: David Rientjes wrote: > On Thu, 26 Nov 2009, Pavel Emelyanov wrote: > >> I disagree. Bio-s are allocated in user context for all typical reads >> (unless we requested aio) and are allocated either in pdflush context >> or (!) in arbitrary task context for writes (e.g. via try_to_free_pages) >> and thus such bio/buffer_head accounting will be completely random. >> > > pdflush has been removed, they should all be allocated in process context. OK, but the try_to_free_pages() concern still stands. >> We implement support for accounting based on a bit on a kmem_cache >> structure and mark all kmalloc caches as not-accountable. 
Then we grep >> the kernel to find all kmalloc-s and think - if a kmalloc is to be >> accounted we turn this into kmem_cache_alloc() with dedicated >> kmem_cache and mark it as accountable. >> > > That doesn't work with slab cache merging done in slub. Surely we'll have to change it a bit. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 08892600786 for ; Tue, 1 Dec 2009 05:39:51 -0500 (EST) Message-ID: <4B14F263.50109@parallels.com> Date: Tue, 01 Dec 2009 13:39:31 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: David Rientjes Cc: KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: David Rientjes wrote: > On Thu, 26 Nov 2009, Pavel Emelyanov wrote: > >> I'm ready to resurrect the patches and port them for slab. >> But before doing it we should answer one question. >> > > Do you have a pointer to your latest implementation that you proposed for > slab? I believe this is the one: https://lists.linux-foundation.org/pipermail/containers/2007-September/007481.html >> Consider we have two kmalloc-s in a kernel code - one is >> user-space triggerable and the other one is not. From my >> POV we should account for the former one, but should not >> for the latter. >> >> If so - how should we patch the kernel to achieve that goal? 
>> > > I think all slab allocations should be accounted for based on current's > memcg other than those done in hardirq context, annotating slab > allocations doesn't seem scalable. Whether the accounting is done on a > task level or cgroup level isn't really a problem for us since we don't > move tasks amongst cgroups. I imagine there've been previous restrictions > on that put into place with the memcg so this doesn't seem like a > slabcg-specific requirement anyway. > > The problem on the freeing side is mapping the object back to the cgroup > that allocated it. We'd also need to map the object to the context in > which it was allocated to determine whether we should decrement the > counter or not. How do you propose doing that without a considerable > overhead in memory consumption, fastpath branch, and cache cold slabcg > lookups? That's the biggest problem. Generally speaking, there is no other way than to store an additional pointer. In some situations you can rely on the cgroup of the task in whose context an object is being freed, but in that case, once you move a task to another cgroup, your accounting is screwed. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with ESMTP id 8CCD7600786 for ; Tue, 1 Dec 2009 05:40:45 -0500 (EST) Message-ID: <4B14F29E.3090400@parallels.com> Date: Tue, 01 Dec 2009 13:40:30 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <20091201073609.GQ2970@balbir.in.ibm.com> In-Reply-To: <20091201073609.GQ2970@balbir.in.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: balbir@linux.vnet.ibm.com Cc: KAMEZAWA Hiroyuki , David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: > Just to understand the context better, is this really a problem. This > can occur when we do really run out of memory. The idea of using > slabcg + memcg together is good, except for our accounting process. I > can repost percpu counter patches that adds fuzziness along with other > tricks that Kame has to do batch accounting, that we will need to > make sure we are able to do with slab allocations as well. > I'm not sure I understand you concern. Can you elaborate, please? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id A5935600309 for ; Tue, 1 Dec 2009 10:14:41 -0500 (EST) Received: from d28relay03.in.ibm.com (d28relay03.in.ibm.com [9.184.220.60]) by e28smtp07.in.ibm.com (8.14.3/8.13.1) with ESMTP id nB1FEZ0q015753 for ; Tue, 1 Dec 2009 20:44:35 +0530 Received: from d28av01.in.ibm.com (d28av01.in.ibm.com [9.184.220.63]) by d28relay03.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id nB1FEZwu3047588 for ; Tue, 1 Dec 2009 20:44:35 +0530 Received: from d28av01.in.ibm.com (loopback [127.0.0.1]) by d28av01.in.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id nB1FEYjQ019221 for ; Tue, 1 Dec 2009 20:44:35 +0530 Date: Tue, 1 Dec 2009 20:44:31 +0530 From: Balbir Singh Subject: Re: memcg: slab control Message-ID: <20091201151431.GV2970@balbir.in.ibm.com> Reply-To: balbir@linux.vnet.ibm.com References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <20091201073609.GQ2970@balbir.in.ibm.com> <4B14F29E.3090400@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <4B14F29E.3090400@parallels.com> Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: KAMEZAWA Hiroyuki , David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: * Pavel Emelyanov [2009-12-01 13:40:30]: > > Just to understand the context better, is this really a problem. This > > can occur when we do really run out of memory. The idea of using > > slabcg + memcg together is good, except for our accounting process. 
I > > can repost percpu counter patches that adds fuzziness along with other > > tricks that Kame has to do batch accounting, that we will need to > > make sure we are able to do with slab allocations as well. > > > > I'm not sure I understand you concern. Can you elaborate, please? > The concern was mostly accounting when memcg + slabcg are integrated into the same framework. res_counters will need new scalability primitives. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 9D1CC60021B for ; Tue, 1 Dec 2009 17:29:22 -0500 (EST) Received: from spaceape8.eur.corp.google.com (spaceape8.eur.corp.google.com [172.28.16.142]) by smtp-out.google.com with ESMTP id nB1MTIcb031050 for ; Tue, 1 Dec 2009 14:29:19 -0800 Received: from pxi11 (pxi11.prod.google.com [10.243.27.11]) by spaceape8.eur.corp.google.com with ESMTP id nB1MTFJm011813 for ; Tue, 1 Dec 2009 14:29:15 -0800 Received: by pxi11 with SMTP id 11so4184845pxi.9 for ; Tue, 01 Dec 2009 14:29:14 -0800 (PST) Date: Tue, 1 Dec 2009 14:29:11 -0800 (PST) From: David Rientjes Subject: Re: memcg: slab control In-Reply-To: <4B14F06D.1000901@parallels.com> Message-ID: References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <4B0E7530.8050304@parallels.com> <4B14F06D.1000901@parallels.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: Suleiman Souhlal , KAMEZAWA Hiroyuki , balbir@linux.vnet.ibm.com, Ying Han , 
linux-mm@kvack.org List-ID: On Tue, 1 Dec 2009, Pavel Emelyanov wrote: > > pdflush has been removed, they should all be allocated in process context. > > OK, but the try_to_free_pages() concern still stands. > Yes, we lack mappings from the per-bdi flusher kthreads back to the user cgroup that initiated the writeback. Since all of these kthreads are descendants of kthreadd, they'll be accounted for within that thread's cgroup unless we pass along the current context. > >> We implement support for accounting based on a bit on a kmem_cache > >> structure and mark all kmalloc caches as not-accountable. Then we grep > >> the kernel to find all kmalloc-s and think - if a kmalloc is to be > >> accounted we turn this into kmem_cache_alloc() with dedicated > >> kmem_cache and mark it as accountable. > >> > > > > That doesn't work with slab cache merging done in slub. > > Surely we'll have to change it a bit. > We can't add a cache flag passed to kmem_cache_create() to identify caches that should be accounted versus those that shouldn't; there are allocs done in both process context and irq context from the same caches and we don't want to inhibit accounting with an additional flag passed to kmem_cache_alloc() if that cache has accounting enabled. A vast majority of slab caches get merged into each other based on object size and alignment with slub; we could prevent that merging by checking the accounting bit for a cache, but that would come at a performance cost (nullifying many hot object allocs), increased fragmentation, and increased memory consumption. In other words, we don't want to make it an attribute of the cache itself; we need to make it an attribute of the context in which the allocation is done. There are many more cases where we'll want to have accounting enabled by default, so we'll need to add a bit passed on alloc to inhibit accounting for those objects. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id C7D14600727 for ; Wed, 2 Dec 2009 05:14:34 -0500 (EST) Message-ID: <4B163DF7.60305@parallels.com> Date: Wed, 02 Dec 2009 13:14:15 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <20091201073609.GQ2970@balbir.in.ibm.com> <4B14F29E.3090400@parallels.com> <20091201151431.GV2970@balbir.in.ibm.com> In-Reply-To: <20091201151431.GV2970@balbir.in.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: balbir@linux.vnet.ibm.com Cc: KAMEZAWA Hiroyuki , David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: Balbir Singh wrote: > * Pavel Emelyanov [2009-12-01 13:40:30]: > >>> Just to understand the context better, is this really a problem. This >>> can occur when we do really run out of memory. The idea of using >>> slabcg + memcg together is good, except for our accounting process. I >>> can repost percpu counter patches that adds fuzziness along with other >>> tricks that Kame has to do batch accounting, that we will need to >>> make sure we are able to do with slab allocations as well. >>> >> I'm not sure I understand you concern. Can you elaborate, please? >> > > The concern was mostly accounting when memcg + slabcg are integrated > into the same framework. res_counters will need new scalability > primitives. > I see. I think the best we can do here is start with a separate controller. 
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 01E6C600762 for ; Wed, 2 Dec 2009 05:19:43 -0500 (EST) Received: from d28relay03.in.ibm.com (d28relay03.in.ibm.com [9.184.220.60]) by e28smtp01.in.ibm.com (8.14.3/8.13.1) with ESMTP id nB2AJcTt007851 for ; Wed, 2 Dec 2009 15:49:38 +0530 Received: from d28av02.in.ibm.com (d28av02.in.ibm.com [9.184.220.64]) by d28relay03.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id nB2AJcJc3100722 for ; Wed, 2 Dec 2009 15:49:38 +0530 Received: from d28av02.in.ibm.com (loopback [127.0.0.1]) by d28av02.in.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id nB2AJbJE007318 for ; Wed, 2 Dec 2009 21:19:38 +1100 Date: Wed, 2 Dec 2009 15:49:15 +0530 From: Balbir Singh Subject: Re: memcg: slab control Message-ID: <20091202101915.GB3545@balbir.in.ibm.com> Reply-To: balbir@linux.vnet.ibm.com References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <20091201073609.GQ2970@balbir.in.ibm.com> <4B14F29E.3090400@parallels.com> <20091201151431.GV2970@balbir.in.ibm.com> <4B163DF7.60305@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <4B163DF7.60305@parallels.com> Sender: owner-linux-mm@kvack.org To: Pavel Emelyanov Cc: KAMEZAWA Hiroyuki , David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: * Pavel Emelyanov [2009-12-02 13:14:15]: > Balbir Singh wrote: > > * Pavel Emelyanov [2009-12-01 13:40:30]: > > > >>> Just to understand 
the context better, is this really a problem. This > >>> can occur when we do really run out of memory. The idea of using > >>> slabcg + memcg together is good, except for our accounting process. I > >>> can repost percpu counter patches that adds fuzziness along with other > >>> tricks that Kame has to do batch accounting, that we will need to > >>> make sure we are able to do with slab allocations as well. > >>> > >> I'm not sure I understand you concern. Can you elaborate, please? > >> > > > > The concern was mostly accounting when memcg + slabcg are integrated > > into the same framework. res_counters will need new scalability > > primitives. > > > > I see. I think the best we can do here is start with a separate controller. > I would think so as well, but setting up independent limits might be a challenge, how does the user really estimate the amount of kernel memory needed? This is the same problem that David posted sometime back. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 5D0D7600762 for ; Wed, 2 Dec 2009 05:52:11 -0500 (EST) Message-ID: <4B1646C9.1040200@parallels.com> Date: Wed, 02 Dec 2009 13:51:53 +0300 From: Pavel Emelyanov MIME-Version: 1.0 Subject: Re: memcg: slab control References: <20091126101414.829936d8.kamezawa.hiroyu@jp.fujitsu.com> <20091126085031.GG2970@balbir.in.ibm.com> <20091126175606.f7df2f80.kamezawa.hiroyu@jp.fujitsu.com> <4B0E461C.50606@parallels.com> <20091126183335.7a18cb09.kamezawa.hiroyu@jp.fujitsu.com> <4B0E50B1.20602@parallels.com> <20091201073609.GQ2970@balbir.in.ibm.com> <4B14F29E.3090400@parallels.com> <20091201151431.GV2970@balbir.in.ibm.com> <4B163DF7.60305@parallels.com> <20091202101915.GB3545@balbir.in.ibm.com> In-Reply-To: <20091202101915.GB3545@balbir.in.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: balbir@linux.vnet.ibm.com Cc: KAMEZAWA Hiroyuki , David Rientjes , Suleiman Souhlal , Ying Han , linux-mm@kvack.org List-ID: Balbir Singh wrote: > * Pavel Emelyanov [2009-12-02 13:14:15]: > >> Balbir Singh wrote: >>> * Pavel Emelyanov [2009-12-01 13:40:30]: >>> >>>>> Just to understand the context better, is this really a problem. This >>>>> can occur when we do really run out of memory. The idea of using >>>>> slabcg + memcg together is good, except for our accounting process. I >>>>> can repost percpu counter patches that adds fuzziness along with other >>>>> tricks that Kame has to do batch accounting, that we will need to >>>>> make sure we are able to do with slab allocations as well. >>>>> >>>> I'm not sure I understand you concern. Can you elaborate, please? >>>> >>> The concern was mostly accounting when memcg + slabcg are integrated >>> into the same framework. res_counters will need new scalability >>> primitives. 
>>> >> I see. I think the best we can do here is start with a separate controller. >> > > I would think so as well, but setting up independent limits might be a > challenge, how does the user really estimate the amount of kernel > memory needed? This is the same problem that David posted sometime > back. I agree with you, but note that the memcg consists of several parts, and the question "where to account bytes to" is quite independent from "what allocations to account" and "where to get the memcg context from on kfree" ;) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org