From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: memcg cgroup controller & sbrk interaction
Date: Fri, 8 Jun 2012 16:51:47 +0200
Message-ID: <20120608145147.GA15332@tiehlicka.suse.cz>
References: <1339118347.78794.YahooMailNeo@web112018.mail.gq1.yahoo.com>
In-Reply-To: <1339118347.78794.YahooMailNeo@web112018.mail.gq1.yahoo.com>
Sender: owner-linux-mm@kvack.org
To: Ron Chen
Cc: Linux Mailing List, linux-mm@kvack.org, cgroups mailinglist

On Thu 07-06-12 18:19:07, Ron Chen wrote:
[...]
> However, not only us, but others have found that the memcg controller
> does not cause sbrk(2) or mmap(2) to return error when the cgroup is
> under high memory pressure.

Yes, because the memory controller tracks allocated memory (with page
granularity) rather than address space. So memory is accounted when
it is faulted in.

> Further, when the amount of free memory is really low, the Linux
> Kernel OOM killer picks something and kills it.

Yes, this is a result of the design, in which memory is tracked
during page faults.

> http://www.spinics.net/lists/cgroups/msg02622.html
>
>
> We also would like to see if it is technically possible for the
> Virtual Memory Manager to interact with the memory controller
> properly and give us the semantics of setrlimit(2).

What prevents you from using setrlimit from inside the group?

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
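The setrlimit suggestion in the reply above can be sketched as follows. This is only an illustration, not something posted in the thread: it assumes Linux (it reads /proc/self/statm), and the helper names `current_address_space_bytes` and `mmap_succeeds_under_limit` are hypothetical. A forked child caps its own address space with RLIMIT_AS and then attempts an anonymous mapping; the request is refused up front with an error instead of being charged later at page-fault time.

```python
import mmap
import os
import resource

MIB = 1024 * 1024


def current_address_space_bytes() -> int:
    """Return this process's virtual size from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        total_pages = int(f.read().split()[0])
    return total_pages * resource.getpagesize()


def mmap_succeeds_under_limit(headroom_bytes: int, request_bytes: int) -> bool:
    """Fork a child, cap its address space, and try one anonymous mapping.

    The soft RLIMIT_AS is set to the child's current virtual size plus
    headroom_bytes. Returns True if the mapping of request_bytes succeeded.
    """
    pid = os.fork()
    if pid == 0:
        # Child: never return into the caller's code, always leave via _exit.
        status = 1
        try:
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            soft = current_address_space_bytes() + headroom_bytes
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
            # Private anonymous mapping, similar to what an allocator requests.
            region = mmap.mmap(
                -1, request_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
            )
            region.close()
            status = 0
        except (OSError, MemoryError, ValueError):
            # mmap refused (ENOMEM) or the limit could not be applied.
            status = 1
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    print("1 MiB within 64 MiB headroom:", mmap_succeeds_under_limit(64 * MIB, 1 * MIB))
    print("256 MiB within 64 MiB headroom:", mmap_succeeds_under_limit(64 * MIB, 256 * MIB))
```

Note that RLIMIT_AS limits virtual address space per process, not resident memory per cgroup, so a job launcher would have to choose the per-process value itself before starting the job.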
From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1339118347.78794.YahooMailNeo@web112018.mail.gq1.yahoo.com>
Date: Thu, 7 Jun 2012 18:19:07 -0700 (PDT)
From: Ron Chen
Reply-To: Ron Chen
Subject: memcg cgroup controller & sbrk interaction
To: Linux Mailing List
X-Mailer: YahooMailWebService/0.8.118.349524
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

We are from the Open Grid Scheduler, which is the official Open Source
Grid Engine. Open Grid Scheduler/Grid Engine
( http://gridscheduler.sourceforge.net ) is used by many compute farms &
HPC sites for job scheduling.

In the next release, we are using cgroups to define a Job Container
interface for batch jobs:

http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html

However, not only us, but others have found that the memcg controller
does not cause sbrk(2) or mmap(2) to return error when the cgroup is
under high memory pressure. Further, when the amount of free memory is
really low, the Linux Kernel OOM killer picks something and kills it.

http://www.spinics.net/lists/cgroups/msg02622.html

We also would like to see if it is technically possible for the Virtual
Memory Manager to interact with the memory controller properly and give
us the semantics of setrlimit(2). So basically if the current address
space usage exceeds the "memory.memsw.limit_in_bytes" limit defined by
the administrator, then the memory allocation system calls (example:
mmap(2), sbrk(2), etc) will return error such that the OOM killer is not
invoked.

Thanks in advance.
-Ron

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FD6EACE.9010109@jp.fujitsu.com>
Date: Tue, 12 Jun 2012 16:07:58 +0900
From: Kamezawa Hiroyuki
User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
To: Ron Chen
CC: Linux Mailing List
Subject: Re: memcg cgroup controller & sbrk interaction
References: <1339118347.78794.YahooMailNeo@web112018.mail.gq1.yahoo.com>
In-Reply-To: <1339118347.78794.YahooMailNeo@web112018.mail.gq1.yahoo.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

(2012/06/08 10:19), Ron Chen wrote:
> We are from the Open Grid Scheduler, which is the official Open Source Grid Engine. Open Grid Scheduler/
> Grid Engine ( http://gridscheduler.sourceforge.net ) is used by many compute farms & HPC sites for job scheduling.
>
> In the next release, we are using cgroups to define a Job Container interface for batch jobs:
>
> http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
>
>
> However, not only us, but others have found that the memcg controller does not cause sbrk(2) or mmap(2) to
> return error when the cgroup is under high memory pressure. Further, when the amount of free memory is
> really low, the Linux Kernel OOM killer picks something and kills it.
>
> http://www.spinics.net/lists/cgroups/msg02622.html
>
>
> We also would like to see if it is technically possible for the Virtual Memory Manager to interact with the
> memory controller properly and give us the semantics of setrlimit(2). So basically if the current address
> space usage exceeds the "memory.memsw.limit_in_bytes" limit defined by the administrator, then the
> memory allocation system calls (example: mmap(2), sbrk(2), etc) will return error such that the OOM
> killer is not invoked.
>

It's not implemented yet. It was proposed before and patches were
posted, but in the end they were not merged. IIRC, there were some
implementation problems, but the biggest reason for the rejection was
that the author couldn't convince us there was a real use case.

If you have a real use case and want a new feature in the memory cgroup,
please CC
  cgroups@vger.kernel.org, linux-mm@kvack.org

Someone (including me) may be able to cook up a patch for a future Linux
kernel if you have real use cases.

BTW, you can stop the memory-cgroup-level OOM killer via the
memory.oom_control file. But you cannot stop the system-level OOM
killer; there are no knobs for that.

Thanks,
-Kame
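
The memory.oom_control knob mentioned above could be driven roughly as below. This is a minimal sketch, assuming the cgroup v1 memory controller, where writing 1 to the file disables the memcg-level OOM killer and reading it back yields "name value" lines such as oom_kill_disable and under_oom. The function names and the example directory /sys/fs/cgroup/memory/jobs/job42 are hypothetical; the actual mount point depends on the system.

```python
from pathlib import Path
from typing import Dict


def parse_oom_control(text: str) -> Dict[str, int]:
    """Parse the 'name value' lines that memory.oom_control reports."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            fields[parts[0]] = int(parts[1])
    return fields


def disable_memcg_oom_killer(cgroup_dir: str) -> None:
    """Write 1 to memory.oom_control in the given memory cgroup directory.

    cgroup_dir is an assumed path to one cgroup v1 memory group, for example
    /sys/fs/cgroup/memory/jobs/job42 (hypothetical).
    """
    (Path(cgroup_dir) / "memory.oom_control").write_text("1\n")


def read_oom_control(cgroup_dir: str) -> Dict[str, int]:
    """Read back and parse the current memory.oom_control state."""
    text = (Path(cgroup_dir) / "memory.oom_control").read_text()
    return parse_oom_control(text)
```

With the memcg OOM killer disabled, tasks in the group that hit the limit wait until memory is freed rather than being killed, so a scheduler using this would need its own way to notice and handle stalled jobs. As the message says, this does not affect the system-level OOM killer.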