* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Tejun Heo @ 2014-04-02 18:00 UTC
To: Glyn Normington
Cc: linux-kernel, Johannes Weiner, Michal Hocko, cgroups

(cc'ing memcg maintainers and cgroup ML)

On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
> Currently, a memory cgroup can hit its oom limit when pages could, in
> principle, be reclaimed by the kernel, except that the kernel does not
> respond directly to cgroup-local memory pressure.

So, ummm, it does.

> A use case where this is important is running a moderately large Java
> application in a memory cgroup in a PaaS environment where cost to the
> user depends on the memory limit ([1]). Users need to tune the memory
> limit to reduce their costs. During application initialisation, large
> numbers of JAR files are opened (read-only) and read while loading the
> application code and its dependencies. This is reflected in a peak of
> file cache usage which can push the memory cgroup's memory usage
> significantly higher than the value actually needed to run the
> application.
>
> Possible approaches include (1) an automatic response to cgroup-local
> memory pressure in the kernel, and (2) a kernel API for reclaiming
> memory from a cgroup, which could be driven under oom notification
> (with the oom killer disabled for the cgroup - it would be re-enabled
> if the cgroup were still oom after asking the kernel to reclaim
> memory).
>
> Clearly (1) is the preferred approach. The closest facility in the
> kernel to (2) is asking the kernel to free page cache with
> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
> especially in a PaaS environment hosting multiple applications. A
> similar facility could be provided per cgroup via a cgroup pseudo-file
> `memory.drop_caches`.
>
> Other approaches include a mempressure cgroup ([2]), which would not
> be suitable for PaaS applications; see [3] for Andrew Morton's
> response. A related workaround ([4]) was included in the 3.6 kernel.
>
> Related discussions:
> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
> [2] https://lwn.net/Articles/531077/
> [3] https://lwn.net/Articles/531138/
> [4] https://lkml.org/lkml/2013/6/6/462 and
>     https://github.com/torvalds/linux/commit/e62e384e

--
tejun
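For context on approach (2): the notification half already exists in the
memcg v1 interface, which lets a userspace agent block on a cgroup's OOM
events via eventfd; only the reclaim call (the proposed
`memory.drop_caches`) is missing. Below is a minimal sketch of such a
listener. The mount point /sys/fs/cgroup/memory and the cgroup name
"app" are illustrative assumptions, not part of the original proposal.

    /* Minimal memcg v1 OOM-notification listener (sketch).
     * Assumes a v1 memory controller mounted at /sys/fs/cgroup/memory
     * and an existing cgroup "app"; both paths are illustrative. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    #define CG "/sys/fs/cgroup/memory/app"

    int main(void)
    {
        int efd = eventfd(0, 0);
        int ofd = open(CG "/memory.oom_control", O_RDONLY);
        int cfd = open(CG "/cgroup.event_control", O_WRONLY);
        char reg[64];
        uint64_t events;

        if (efd < 0 || ofd < 0 || cfd < 0)
            return 1;

        /* Register "<eventfd> <fd of memory.oom_control>" per the
         * v1 event ABI. */
        snprintf(reg, sizeof(reg), "%d %d", efd, ofd);
        if (write(cfd, reg, strlen(reg)) < 0)
            return 1;

        /* Blocks until the group hits its limit and cannot reclaim. */
        if (read(efd, &events, sizeof(events)) == sizeof(events))
            printf("oom events: %llu\n", (unsigned long long)events);

        /* A reclaim agent would now try to shed the group's page
         * cache, e.g. via the proposed (hypothetical)
         * memory.drop_caches, before re-enabling the oom killer. */
        return 0;
    }

Disabling the in-kernel killer while such an agent runs (`echo 1 >
memory.oom_control`) is likewise part of the existing v1 interface.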
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Glyn Normington @ 2014-04-14 08:11 UTC
To: Johannes Weiner, Michal Hocko
Cc: Tejun Heo, linux-kernel, cgroups

Johannes/Michal

What are your thoughts on this matter? Do you see this as a valid
requirement?

Regards,
Glyn

On 02/04/2014 19:00, Tejun Heo wrote:
> (cc'ing memcg maintainers and cgroup ML)
>
> On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
>> Currently, a memory cgroup can hit its oom limit when pages could, in
>> principle, be reclaimed by the kernel, except that the kernel does
>> not respond directly to cgroup-local memory pressure.
> So, ummm, it does.
>
>> A use case where this is important is running a moderately large Java
>> application in a memory cgroup in a PaaS environment where cost to
>> the user depends on the memory limit ([1]). Users need to tune the
>> memory limit to reduce their costs. During application
>> initialisation, large numbers of JAR files are opened (read-only) and
>> read while loading the application code and its dependencies. This is
>> reflected in a peak of file cache usage which can push the memory
>> cgroup's memory usage significantly higher than the value actually
>> needed to run the application.
>>
>> Possible approaches include (1) an automatic response to cgroup-local
>> memory pressure in the kernel, and (2) a kernel API for reclaiming
>> memory from a cgroup, which could be driven under oom notification
>> (with the oom killer disabled for the cgroup - it would be re-enabled
>> if the cgroup were still oom after asking the kernel to reclaim
>> memory).
>>
>> Clearly (1) is the preferred approach. The closest facility in the
>> kernel to (2) is asking the kernel to free page cache with
>> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
>> especially in a PaaS environment hosting multiple applications. A
>> similar facility could be provided per cgroup via a cgroup
>> pseudo-file `memory.drop_caches`.
>>
>> Other approaches include a mempressure cgroup ([2]), which would not
>> be suitable for PaaS applications; see [3] for Andrew Morton's
>> response. A related workaround ([4]) was included in the 3.6 kernel.
>>
>> Related discussions:
>> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
>> [2] https://lwn.net/Articles/531077/
>> [3] https://lwn.net/Articles/531138/
>> [4] https://lkml.org/lkml/2013/6/6/462 and
>>     https://github.com/torvalds/linux/commit/e62e384e
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Johannes Weiner @ 2014-04-14 20:50 UTC
To: Glyn Normington
Cc: Michal Hocko, Tejun Heo, linux-kernel, cgroups

On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
> Johannes/Michal
>
> What are your thoughts on this matter? Do you see this as a valid
> requirement?

As Tejun said, memory cgroups *do* respond to internal pressure and
enter targeted reclaim before invoking the OOM killer. So I'm not
exactly sure what you are asking.

> On 02/04/2014 19:00, Tejun Heo wrote:
>> (cc'ing memcg maintainers and cgroup ML)
>>
>> On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
>>> Currently, a memory cgroup can hit its oom limit when pages could,
>>> in principle, be reclaimed by the kernel, except that the kernel
>>> does not respond directly to cgroup-local memory pressure.
>> So, ummm, it does.
>>
>>> A use case where this is important is running a moderately large
>>> Java application in a memory cgroup in a PaaS environment where cost
>>> to the user depends on the memory limit ([1]). Users need to tune
>>> the memory limit to reduce their costs. During application
>>> initialisation, large numbers of JAR files are opened (read-only)
>>> and read while loading the application code and its dependencies.
>>> This is reflected in a peak of file cache usage which can push the
>>> memory cgroup's memory usage significantly higher than the value
>>> actually needed to run the application.
>>>
>>> Possible approaches include (1) an automatic response to
>>> cgroup-local memory pressure in the kernel, and (2) a kernel API for
>>> reclaiming memory from a cgroup, which could be driven under oom
>>> notification (with the oom killer disabled for the cgroup - it would
>>> be re-enabled if the cgroup were still oom after asking the kernel
>>> to reclaim memory).
>>>
>>> Clearly (1) is the preferred approach. The closest facility in the
>>> kernel to (2) is asking the kernel to free page cache with
>>> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
>>> especially in a PaaS environment hosting multiple applications. A
>>> similar facility could be provided per cgroup via a cgroup
>>> pseudo-file `memory.drop_caches`.
>>>
>>> Other approaches include a mempressure cgroup ([2]), which would not
>>> be suitable for PaaS applications; see [3] for Andrew Morton's
>>> response. A related workaround ([4]) was included in the 3.6 kernel.
>>>
>>> Related discussions:
>>> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
>>> [2] https://lwn.net/Articles/531077/
>>> [3] https://lwn.net/Articles/531138/
>>> [4] https://lkml.org/lkml/2013/6/6/462 and
>>>     https://github.com/torvalds/linux/commit/e62e384e
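To make the ordering Johannes describes concrete: when a charge would
push a group over its limit, the kernel first runs reclaim against that
group's own LRUs, and only after repeated reclaim failures does it
consider the OOM killer. The following is a loose paraphrase of the
3.x-era charge path in mm/memcontrol.c; the helper names are
illustrative, not real kernel symbols.

    /* Loose paraphrase of the memcg charge path; helper names are
     * illustrative. The real logic lives in mm/memcontrol.c. */
    static int memcg_try_charge(struct mem_cgroup *memcg,
                                unsigned long nr_pages, gfp_t gfp)
    {
        int retries = NR_RECLAIM_RETRIES;

        while (retries--) {
            if (charge_fits_under_limit(memcg, nr_pages))
                return 0;               /* charge succeeded */

            /* Targeted reclaim: scan only this group's LRUs. */
            if (reclaim_pages_from_memcg(memcg, nr_pages, gfp))
                continue;               /* freed something; retry */
        }

        /* OOM is the last resort, and only if enabled for the group
         * (see memory.oom_control). */
        if (memcg_oom_enabled(memcg))
            memcg_oom_kill(memcg, gfp);
        return -ENOMEM;
    }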
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Glyn Normington @ 2014-04-15 08:38 UTC
To: Johannes Weiner
Cc: Michal Hocko, Tejun Heo, linux-kernel, cgroups

On 14/04/2014 21:50, Johannes Weiner wrote:
> On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
>> Johannes/Michal
>>
>> What are your thoughts on this matter? Do you see this as a valid
>> requirement?
> As Tejun said, memory cgroups *do* respond to internal pressure and
> enter targeted reclaim before invoking the OOM killer. So I'm not
> exactly sure what you are asking.

We are repeatedly seeing a situation where a memory cgroup with a given
memory limit results in an application process in the cgroup being
killed oom during application initialisation. One theory is that dirty
file cache pages are not being written to disk to reduce memory
consumption before the oom killer is invoked. Should memory cgroups'
response to internal pressure include writing dirty file cache pages to
disk?

>> On 02/04/2014 19:00, Tejun Heo wrote:
>>> (cc'ing memcg maintainers and cgroup ML)
>>>
>>> On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
>>>> Currently, a memory cgroup can hit its oom limit when pages could,
>>>> in principle, be reclaimed by the kernel, except that the kernel
>>>> does not respond directly to cgroup-local memory pressure.
>>> So, ummm, it does.
>>>
>>>> A use case where this is important is running a moderately large
>>>> Java application in a memory cgroup in a PaaS environment where
>>>> cost to the user depends on the memory limit ([1]). Users need to
>>>> tune the memory limit to reduce their costs. During application
>>>> initialisation, large numbers of JAR files are opened (read-only)
>>>> and read while loading the application code and its dependencies.
>>>> This is reflected in a peak of file cache usage which can push the
>>>> memory cgroup's memory usage significantly higher than the value
>>>> actually needed to run the application.
>>>>
>>>> Possible approaches include (1) an automatic response to
>>>> cgroup-local memory pressure in the kernel, and (2) a kernel API
>>>> for reclaiming memory from a cgroup, which could be driven under
>>>> oom notification (with the oom killer disabled for the cgroup - it
>>>> would be re-enabled if the cgroup were still oom after asking the
>>>> kernel to reclaim memory).
>>>>
>>>> Clearly (1) is the preferred approach. The closest facility in the
>>>> kernel to (2) is asking the kernel to free page cache with
>>>> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
>>>> especially in a PaaS environment hosting multiple applications. A
>>>> similar facility could be provided per cgroup via a cgroup
>>>> pseudo-file `memory.drop_caches`.
>>>>
>>>> Other approaches include a mempressure cgroup ([2]), which would
>>>> not be suitable for PaaS applications; see [3] for Andrew Morton's
>>>> response. A related workaround ([4]) was included in the 3.6
>>>> kernel.
>>>>
>>>> Related discussions:
>>>> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
>>>> [2] https://lwn.net/Articles/531077/
>>>> [3] https://lwn.net/Articles/531138/
>>>> [4] https://lkml.org/lkml/2013/6/6/462 and
>>>>     https://github.com/torvalds/linux/commit/e62e384e
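One way to observe the initialisation peak described in the quoted
proposal, without any kernel changes, is to sample the group's v1
accounting files while the application starts: memory.max_usage_in_bytes
records the high-water mark that forces the limit to be
over-provisioned, and memory.failcnt counts how often the limit was hit.
A small sampler, reusing the illustrative cgroup path from the earlier
sketch:

    /* Sample a memcg v1 group's usage counters once per second, to
     * watch the page-cache peak during application initialisation.
     * The cgroup path is illustrative. */
    #include <stdio.h>
    #include <unistd.h>

    #define CG "/sys/fs/cgroup/memory/app"

    static long long read_counter(const char *name)
    {
        char path[256];
        long long val = -1;
        FILE *f;

        snprintf(path, sizeof(path), CG "/%s", name);
        f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%lld", &val) != 1)
                val = -1;
            fclose(f);
        }
        return val;
    }

    int main(void)
    {
        for (;;) {
            printf("usage=%lld peak=%lld failcnt=%lld\n",
                   read_counter("memory.usage_in_bytes"),
                   read_counter("memory.max_usage_in_bytes"),
                   read_counter("memory.failcnt"));
            sleep(1);
        }
    }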
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Michal Hocko @ 2014-04-16 09:11 UTC
To: Glyn Normington
Cc: Johannes Weiner, Tejun Heo, linux-kernel, cgroups

On Tue 15-04-14 09:38:10, Glyn Normington wrote:
> On 14/04/2014 21:50, Johannes Weiner wrote:
>> On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
>>> Johannes/Michal
>>>
>>> What are your thoughts on this matter? Do you see this as a valid
>>> requirement?
>> As Tejun said, memory cgroups *do* respond to internal pressure and
>> enter targeted reclaim before invoking the OOM killer. So I'm not
>> exactly sure what you are asking.
> We are repeatedly seeing a situation where a memory cgroup with a
> given memory limit results in an application process in the cgroup
> being killed oom during application initialisation. One theory is
> that dirty file cache pages are not being written to disk to reduce
> memory consumption before the oom killer is invoked. Should memory
> cgroups' response to internal pressure include writing dirty file
> cache pages to disk?

This depends on the kernel version. OOM with a lot of dirty pages on
memcg LRUs was a big problem. Now we wait for pages under writeback
during reclaim, which should prevent such spurious OOMs. Which kernel
versions are we talking about? The fix (or, better said, workaround) I
am thinking about is e62e384e9da8 ("memcg: prevent OOM with too many
dirty pages").

I am still not sure I understand your setup and the problem. Could you
describe your setup (what runs where, under what limits), please?
--
Michal Hocko
SUSE Labs
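The essence of e62e384e9da8 is a small change to the reclaim scan: when
memcg (as opposed to global) reclaim meets a page already under
writeback, it waits for the I/O to complete instead of skipping the
page, so a group full of dirty file cache can no longer exhaust its LRUs
and declare OOM while its data is still in flight to disk. A simplified
paraphrase of the shrink_page_list() hunk, not the exact upstream code:

    /* Simplified paraphrase of the e62e384e9da8 hunk in
     * mm/vmscan.c:shrink_page_list(); not the exact upstream code. */
    if (PageWriteback(page)) {
        /* memcg has no dirty-page throttling, so targeted reclaim
         * could OOM merely because many pages sit in writeback with
         * nothing else left to scan. For memcg reclaim, wait for the
         * writeback to finish rather than skipping the page.
         *
         * may_enter_fs guards against deadlock, e.g. a loop-driver
         * thread in reclaim waiting on a page that it must itself
         * write out. */
        if (!global_reclaim(sc) && PageReclaim(page) && may_enter_fs)
            wait_on_page_writeback(page);
        else {
            nr_writeback++;
            unlock_page(page);
            goto keep;
        }
    }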
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Glyn Normington @ 2014-04-17 08:00 UTC
To: Michal Hocko
Cc: Johannes Weiner, Tejun Heo, linux-kernel, cgroups

On 16/04/2014 10:11, Michal Hocko wrote:
> On Tue 15-04-14 09:38:10, Glyn Normington wrote:
>> On 14/04/2014 21:50, Johannes Weiner wrote:
>>> On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
>>>> Johannes/Michal
>>>>
>>>> What are your thoughts on this matter? Do you see this as a valid
>>>> requirement?
>>> As Tejun said, memory cgroups *do* respond to internal pressure and
>>> enter targeted reclaim before invoking the OOM killer. So I'm not
>>> exactly sure what you are asking.
>> We are repeatedly seeing a situation where a memory cgroup with a
>> given memory limit results in an application process in the cgroup
>> being killed oom during application initialisation. One theory is
>> that dirty file cache pages are not being written to disk to reduce
>> memory consumption before the oom killer is invoked. Should memory
>> cgroups' response to internal pressure include writing dirty file
>> cache pages to disk?
> This depends on the kernel version. OOM with a lot of dirty pages on
> memcg LRUs was a big problem. Now we wait for pages under writeback
> during reclaim, which should prevent such spurious OOMs. Which kernel
> versions are we talking about? The fix (or, better said, workaround)
> I am thinking about is e62e384e9da8 ("memcg: prevent OOM with too
> many dirty pages").

Thanks Michal - very helpful!

The kernel version, as reported by uname -r, is 3.2.0-23-generic.
According to https://github.com/torvalds/linux/commit/e62e384e9da8, the
above workaround first went into kernel version 3.6, so we should plan
to upgrade.

> I am still not sure I understand your setup and the problem. Could
> you describe your setup (what runs where, under what limits), please?

I won't waste your time with the details of our setup unless the
problem recurs with e62e384e9da8 in place.

Regards,
Glyn