* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Tejun Heo @ 2014-04-02 18:00 UTC
To: Glyn Normington
Cc: linux-kernel, Johannes Weiner, Michal Hocko, cgroups

(cc'ing memcg maintainers and cgroup ML)

On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
> Currently, a memory cgroup can hit its oom limit when pages could, in
> principle, be reclaimed by the kernel, except that the kernel does not
> respond directly to cgroup-local memory pressure.

So, ummm, it does.

> A use case where this is important is running a moderately large Java
> application in a memory cgroup in a PaaS environment where cost to the
> user depends on the memory limit ([1]). Users need to tune the memory
> limit to reduce their costs. During application initialisation, large
> numbers of JAR files are opened (read-only) and read while loading the
> application code and its dependencies. This is reflected in a peak of
> file cache usage which can push the memory cgroup's memory usage
> significantly higher than the value actually needed to run the
> application.
>
> Possible approaches include (1) an automatic response to cgroup-local
> memory pressure in the kernel, and (2) a kernel API for reclaiming
> memory from a cgroup, which could be driven under oom notification
> (with the oom killer disabled for the cgroup - it would be re-enabled
> if the cgroup were still oom after asking the kernel to reclaim
> memory).
>
> Clearly (1) is the preferred approach. The closest facility in the
> kernel to (2) is asking the kernel to free page cache with
> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
> especially in a PaaS environment hosting multiple applications. A
> similar facility could be provided per cgroup via a cgroup pseudo-file
> `memory.drop_caches`.
>
> Other approaches include a mempressure cgroup ([2]), which would not
> be suitable for PaaS applications; see [3] for Andrew Morton's
> response. A related workaround ([4]) was included in the 3.6 kernel.
>
> Related discussions:
> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
> [2] https://lwn.net/Articles/531077/
> [3] https://lwn.net/Articles/531138/
> [4] https://lkml.org/lkml/2013/6/6/462 and
>     https://github.com/torvalds/linux/commit/e62e384e

--
tejun
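For context on approach (2): the notification half already exists in the
memcg v1 interface, which lets a userspace agent block on a cgroup's OOM
events via eventfd; only the reclaim call (the proposed
`memory.drop_caches`) is missing. Below is a minimal sketch of such a
listener. The mount point /sys/fs/cgroup/memory and the cgroup name
"app" are illustrative assumptions, not part of the original proposal.

    /* Minimal memcg v1 OOM-notification listener (sketch).
     * Assumes a v1 memory controller mounted at /sys/fs/cgroup/memory
     * and an existing cgroup "app"; both paths are illustrative. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    #define CG "/sys/fs/cgroup/memory/app"

    int main(void)
    {
        int efd = eventfd(0, 0);
        int ofd = open(CG "/memory.oom_control", O_RDONLY);
        int cfd = open(CG "/cgroup.event_control", O_WRONLY);
        char reg[64];
        uint64_t events;

        if (efd < 0 || ofd < 0 || cfd < 0)
            return 1;

        /* Register "<eventfd> <fd of memory.oom_control>" per the
         * v1 event ABI. */
        snprintf(reg, sizeof(reg), "%d %d", efd, ofd);
        if (write(cfd, reg, strlen(reg)) < 0)
            return 1;

        /* Blocks until the group hits its limit and cannot reclaim. */
        if (read(efd, &events, sizeof(events)) == sizeof(events))
            printf("oom events: %llu\n", (unsigned long long)events);

        /* A reclaim agent would now try to shed the group's page
         * cache, e.g. via the proposed (hypothetical)
         * memory.drop_caches, before re-enabling the oom killer. */
        return 0;
    }

Disabling the in-kernel killer while such an agent runs (`echo 1 >
memory.oom_control`) is likewise part of the existing v1 interface.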
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Glyn Normington @ 2014-04-14 08:11 UTC
To: Johannes Weiner, Michal Hocko
Cc: Tejun Heo, linux-kernel, cgroups

Johannes/Michal

What are your thoughts on this matter? Do you see this as a valid
requirement?

Regards,
Glyn

On 02/04/2014 19:00, Tejun Heo wrote:
> (cc'ing memcg maintainers and cgroup ML)
>
> On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
>> Currently, a memory cgroup can hit its oom limit when pages could, in
>> principle, be reclaimed by the kernel, except that the kernel does
>> not respond directly to cgroup-local memory pressure.
> So, ummm, it does.
>
>> A use case where this is important is running a moderately large Java
>> application in a memory cgroup in a PaaS environment where cost to
>> the user depends on the memory limit ([1]). Users need to tune the
>> memory limit to reduce their costs. During application
>> initialisation, large numbers of JAR files are opened (read-only) and
>> read while loading the application code and its dependencies. This is
>> reflected in a peak of file cache usage which can push the memory
>> cgroup's memory usage significantly higher than the value actually
>> needed to run the application.
>>
>> Possible approaches include (1) an automatic response to cgroup-local
>> memory pressure in the kernel, and (2) a kernel API for reclaiming
>> memory from a cgroup, which could be driven under oom notification
>> (with the oom killer disabled for the cgroup - it would be re-enabled
>> if the cgroup were still oom after asking the kernel to reclaim
>> memory).
>>
>> Clearly (1) is the preferred approach. The closest facility in the
>> kernel to (2) is asking the kernel to free page cache with
>> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
>> especially in a PaaS environment hosting multiple applications. A
>> similar facility could be provided per cgroup via a cgroup
>> pseudo-file `memory.drop_caches`.
>>
>> Other approaches include a mempressure cgroup ([2]), which would not
>> be suitable for PaaS applications; see [3] for Andrew Morton's
>> response. A related workaround ([4]) was included in the 3.6 kernel.
>>
>> Related discussions:
>> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
>> [2] https://lwn.net/Articles/531077/
>> [3] https://lwn.net/Articles/531138/
>> [4] https://lkml.org/lkml/2013/6/6/462 and
>>     https://github.com/torvalds/linux/commit/e62e384e
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Johannes Weiner @ 2014-04-14 20:50 UTC
To: Glyn Normington
Cc: Michal Hocko, Tejun Heo, linux-kernel, cgroups

On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
> Johannes/Michal
>
> What are your thoughts on this matter? Do you see this as a valid
> requirement?

As Tejun said, memory cgroups *do* respond to internal pressure and
enter targeted reclaim before invoking the OOM killer. So I'm not
exactly sure what you are asking.

> On 02/04/2014 19:00, Tejun Heo wrote:
>> (cc'ing memcg maintainers and cgroup ML)
>>
>> On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
>>> Currently, a memory cgroup can hit its oom limit when pages could,
>>> in principle, be reclaimed by the kernel, except that the kernel
>>> does not respond directly to cgroup-local memory pressure.
>> So, ummm, it does.
>>
>>> A use case where this is important is running a moderately large
>>> Java application in a memory cgroup in a PaaS environment where cost
>>> to the user depends on the memory limit ([1]). Users need to tune
>>> the memory limit to reduce their costs. During application
>>> initialisation, large numbers of JAR files are opened (read-only)
>>> and read while loading the application code and its dependencies.
>>> This is reflected in a peak of file cache usage which can push the
>>> memory cgroup's memory usage significantly higher than the value
>>> actually needed to run the application.
>>>
>>> Possible approaches include (1) an automatic response to
>>> cgroup-local memory pressure in the kernel, and (2) a kernel API for
>>> reclaiming memory from a cgroup, which could be driven under oom
>>> notification (with the oom killer disabled for the cgroup - it would
>>> be re-enabled if the cgroup were still oom after asking the kernel
>>> to reclaim memory).
>>>
>>> Clearly (1) is the preferred approach. The closest facility in the
>>> kernel to (2) is asking the kernel to free page cache with
>>> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
>>> especially in a PaaS environment hosting multiple applications. A
>>> similar facility could be provided per cgroup via a cgroup
>>> pseudo-file `memory.drop_caches`.
>>>
>>> Other approaches include a mempressure cgroup ([2]), which would not
>>> be suitable for PaaS applications; see [3] for Andrew Morton's
>>> response. A related workaround ([4]) was included in the 3.6 kernel.
>>>
>>> Related discussions:
>>> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
>>> [2] https://lwn.net/Articles/531077/
>>> [3] https://lwn.net/Articles/531138/
>>> [4] https://lkml.org/lkml/2013/6/6/462 and
>>>     https://github.com/torvalds/linux/commit/e62e384e
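To make the ordering Johannes describes concrete: when a charge would
push a group over its limit, the kernel first runs reclaim against that
group's own LRUs, and only after repeated reclaim failures does it
consider the OOM killer. The following is a loose paraphrase of the
3.x-era charge path in mm/memcontrol.c; the helper names are
illustrative, not real kernel symbols.

    /* Loose paraphrase of the memcg charge path; helper names are
     * illustrative. The real logic lives in mm/memcontrol.c. */
    static int memcg_try_charge(struct mem_cgroup *memcg,
                                unsigned long nr_pages, gfp_t gfp)
    {
        int retries = NR_RECLAIM_RETRIES;

        while (retries--) {
            if (charge_fits_under_limit(memcg, nr_pages))
                return 0;               /* charge succeeded */

            /* Targeted reclaim: scan only this group's LRUs. */
            if (reclaim_pages_from_memcg(memcg, nr_pages, gfp))
                continue;               /* freed something; retry */
        }

        /* OOM is the last resort, and only if enabled for the group
         * (see memory.oom_control). */
        if (memcg_oom_enabled(memcg))
            memcg_oom_kill(memcg, gfp);
        return -ENOMEM;
    }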
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Glyn Normington @ 2014-04-15 08:38 UTC
To: Johannes Weiner
Cc: Michal Hocko, Tejun Heo, linux-kernel, cgroups

On 14/04/2014 21:50, Johannes Weiner wrote:
> On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
>> Johannes/Michal
>>
>> What are your thoughts on this matter? Do you see this as a valid
>> requirement?
> As Tejun said, memory cgroups *do* respond to internal pressure and
> enter targeted reclaim before invoking the OOM killer. So I'm not
> exactly sure what you are asking.

We are repeatedly seeing a situation where a memory cgroup with a given
memory limit results in an application process in the cgroup being
killed oom during application initialisation. One theory is that dirty
file cache pages are not being written to disk to reduce memory
consumption before the oom killer is invoked. Should memory cgroups'
response to internal pressure include writing dirty file cache pages to
disk?

>> On 02/04/2014 19:00, Tejun Heo wrote:
>>> (cc'ing memcg maintainers and cgroup ML)
>>>
>>> On Wed, Apr 02, 2014 at 02:08:04PM +0100, Glyn Normington wrote:
>>>> Currently, a memory cgroup can hit its oom limit when pages could,
>>>> in principle, be reclaimed by the kernel, except that the kernel
>>>> does not respond directly to cgroup-local memory pressure.
>>> So, ummm, it does.
>>>
>>>> A use case where this is important is running a moderately large
>>>> Java application in a memory cgroup in a PaaS environment where
>>>> cost to the user depends on the memory limit ([1]). Users need to
>>>> tune the memory limit to reduce their costs. During application
>>>> initialisation, large numbers of JAR files are opened (read-only)
>>>> and read while loading the application code and its dependencies.
>>>> This is reflected in a peak of file cache usage which can push the
>>>> memory cgroup's memory usage significantly higher than the value
>>>> actually needed to run the application.
>>>>
>>>> Possible approaches include (1) an automatic response to
>>>> cgroup-local memory pressure in the kernel, and (2) a kernel API
>>>> for reclaiming memory from a cgroup, which could be driven under
>>>> oom notification (with the oom killer disabled for the cgroup - it
>>>> would be re-enabled if the cgroup were still oom after asking the
>>>> kernel to reclaim memory).
>>>>
>>>> Clearly (1) is the preferred approach. The closest facility in the
>>>> kernel to (2) is asking the kernel to free page cache with
>>>> `echo 1 > /proc/sys/vm/drop_caches`, but that is too wide-ranging,
>>>> especially in a PaaS environment hosting multiple applications. A
>>>> similar facility could be provided per cgroup via a cgroup
>>>> pseudo-file `memory.drop_caches`.
>>>>
>>>> Other approaches include a mempressure cgroup ([2]), which would
>>>> not be suitable for PaaS applications; see [3] for Andrew Morton's
>>>> response. A related workaround ([4]) was included in the 3.6
>>>> kernel.
>>>>
>>>> Related discussions:
>>>> [1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
>>>> [2] https://lwn.net/Articles/531077/
>>>> [3] https://lwn.net/Articles/531138/
>>>> [4] https://lkml.org/lkml/2013/6/6/462 and
>>>>     https://github.com/torvalds/linux/commit/e62e384e
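One way to observe the initialisation peak described in the quoted
proposal, without any kernel changes, is to sample the group's v1
accounting files while the application starts: memory.max_usage_in_bytes
records the high-water mark that forces the limit to be
over-provisioned, and memory.failcnt counts how often the limit was hit.
A small sampler, reusing the illustrative cgroup path from the earlier
sketch:

    /* Sample a memcg v1 group's usage counters once per second, to
     * watch the page-cache peak during application initialisation.
     * The cgroup path is illustrative. */
    #include <stdio.h>
    #include <unistd.h>

    #define CG "/sys/fs/cgroup/memory/app"

    static long long read_counter(const char *name)
    {
        char path[256];
        long long val = -1;
        FILE *f;

        snprintf(path, sizeof(path), CG "/%s", name);
        f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%lld", &val) != 1)
                val = -1;
            fclose(f);
        }
        return val;
    }

    int main(void)
    {
        for (;;) {
            printf("usage=%lld peak=%lld failcnt=%lld\n",
                   read_counter("memory.usage_in_bytes"),
                   read_counter("memory.max_usage_in_bytes"),
                   read_counter("memory.failcnt"));
            sleep(1);
        }
    }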
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Michal Hocko @ 2014-04-16 09:11 UTC
To: Glyn Normington
Cc: Johannes Weiner, Tejun Heo, linux-kernel, cgroups

On Tue 15-04-14 09:38:10, Glyn Normington wrote:
> On 14/04/2014 21:50, Johannes Weiner wrote:
>> On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
>>> Johannes/Michal
>>>
>>> What are your thoughts on this matter? Do you see this as a valid
>>> requirement?
>> As Tejun said, memory cgroups *do* respond to internal pressure and
>> enter targeted reclaim before invoking the OOM killer. So I'm not
>> exactly sure what you are asking.
> We are repeatedly seeing a situation where a memory cgroup with a
> given memory limit results in an application process in the cgroup
> being killed oom during application initialisation. One theory is
> that dirty file cache pages are not being written to disk to reduce
> memory consumption before the oom killer is invoked. Should memory
> cgroups' response to internal pressure include writing dirty file
> cache pages to disk?

This depends on the kernel version. OOM with a lot of dirty pages on
memcg LRUs was a big problem. Now we wait for pages under writeback
during reclaim, which should prevent such spurious OOMs. Which kernel
versions are we talking about? The fix (or, better said, workaround) I
am thinking about is e62e384e9da8 ("memcg: prevent OOM with too many
dirty pages").

I am still not sure I understand your setup and the problem. Could you
describe your setup (what runs where, under what limits), please?
--
Michal Hocko
SUSE Labs
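The essence of e62e384e9da8 is a small change to the reclaim scan: when
memcg (as opposed to global) reclaim meets a page already under
writeback, it waits for the I/O to complete instead of skipping the
page, so a group full of dirty file cache can no longer exhaust its LRUs
and declare OOM while its data is still in flight to disk. A simplified
paraphrase of the shrink_page_list() hunk, not the exact upstream code:

    /* Simplified paraphrase of the e62e384e9da8 hunk in
     * mm/vmscan.c:shrink_page_list(); not the exact upstream code. */
    if (PageWriteback(page)) {
        /* memcg has no dirty-page throttling, so targeted reclaim
         * could OOM merely because many pages sit in writeback with
         * nothing else left to scan. For memcg reclaim, wait for the
         * writeback to finish rather than skipping the page.
         *
         * may_enter_fs guards against deadlock, e.g. a loop-driver
         * thread in reclaim waiting on a page that it must itself
         * write out. */
        if (!global_reclaim(sc) && PageReclaim(page) && may_enter_fs)
            wait_on_page_writeback(page);
        else {
            nr_writeback++;
            unlock_page(page);
            goto keep;
        }
    }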
* Re: Kernel scanning/freeing to relieve cgroup memory pressure
From: Glyn Normington @ 2014-04-17 08:00 UTC
To: Michal Hocko
Cc: Johannes Weiner, Tejun Heo, linux-kernel, cgroups

On 16/04/2014 10:11, Michal Hocko wrote:
> On Tue 15-04-14 09:38:10, Glyn Normington wrote:
>> On 14/04/2014 21:50, Johannes Weiner wrote:
>>> On Mon, Apr 14, 2014 at 09:11:25AM +0100, Glyn Normington wrote:
>>>> Johannes/Michal
>>>>
>>>> What are your thoughts on this matter? Do you see this as a valid
>>>> requirement?
>>> As Tejun said, memory cgroups *do* respond to internal pressure and
>>> enter targeted reclaim before invoking the OOM killer. So I'm not
>>> exactly sure what you are asking.
>> We are repeatedly seeing a situation where a memory cgroup with a
>> given memory limit results in an application process in the cgroup
>> being killed oom during application initialisation. One theory is
>> that dirty file cache pages are not being written to disk to reduce
>> memory consumption before the oom killer is invoked. Should memory
>> cgroups' response to internal pressure include writing dirty file
>> cache pages to disk?
> This depends on the kernel version. OOM with a lot of dirty pages on
> memcg LRUs was a big problem. Now we wait for pages under writeback
> during reclaim, which should prevent such spurious OOMs. Which kernel
> versions are we talking about? The fix (or, better said, workaround)
> I am thinking about is e62e384e9da8 ("memcg: prevent OOM with too
> many dirty pages").

Thanks Michal - very helpful!

The kernel version, as reported by uname -r, is 3.2.0-23-generic.
According to https://github.com/torvalds/linux/commit/e62e384e9da8, the
above workaround first went into kernel version 3.6, so we should plan
to upgrade.

> I am still not sure I understand your setup and the problem. Could
> you describe your setup (what runs where, under what limits), please?

I won't waste your time with the details of our setup unless the
problem recurs with e62e384e9da8 in place.

Regards,
Glyn