linux-mm.kvack.org archive mirror
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
From: Serge Hallyn @ 2013-07-01 18:01 UTC
  To: Aaron Staley; +Cc: containers, Paul Menage, Li Zefan, linux-kernel, linux-mm

Quoting Aaron Staley (aaron@picloud.com):
> This is better explained here:
> http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> (The highest-voted answer believes this to be a kernel bug.)

Hi,

In IRC it has been suggested that the kernel should indeed be slowing
down new page creation while waiting for old page cache entries to be
written out to disk, rather than OOMing.

With a 3.0.27-1-ac100 kernel, dd if=/dev/zero of=xxx bs=1M
count=100 is immediately killed.  In contrast, the same command on a
3.0.8 kernel did the right thing for me.  But I did reproduce your
experiment below on EC2 with the same result.
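
For anyone else trying this end to end, here is roughly the sequence
(a sketch only; the container config path and the cgroup mount point are
assumptions from a typical Ubuntu/LXC setup, so adjust for yours):

  # On the host, as root: create a container and cap its memory.
  lxc-create -n testcon -t ubuntu -- -r precise
  echo 'lxc.cgroup.memory.limit_in_bytes = 300M' >> /var/lib/lxc/testcon/config
  lxc-start -n testcon -d

  # Inside the container: write a file larger than the 300M limit.
  dd if=/dev/zero of=test2 bs=100k count=5010

  # Back on the host: watch the memcg counters while dd runs
  # (paths assume cgroup v1 with the memory controller mounted here).
  CG=/sys/fs/cgroup/memory/lxc/testcon
  watch -n1 "grep -E '^(cache|rss) ' $CG/memory.stat; cat $CG/memory.failcnt"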

So, cc:ing linux-mm in the hopes someone can tell us whether this
is expected behavior, known misbehavior, or an unknown bug.

> Summary: I have set up a system where I am using LXC to create multiple
> virtualized containers on my system with limited resources. Unfortunately, I'm
> running into a troublesome scenario where the OOM killer is hard-killing
> processes in my LXC container when I write a file whose size exceeds the
> memory limit (set to 300MB). The file buffering does not appear to respect
> the container's memory limit.
> 
> 
> Reproducing:
> 
> (done on a c1.xlarge instance running on Amazon EC2)
> 
> Create 6 empty lxc containers (in my case I did lxc-create -n testcon -t
> ubuntu -- -r precise)
> 
> Modify the configuration of each container to set
> lxc.cgroup.memory.limit_in_bytes = 300M
>
> Within each container, run in parallel:
> dd if=/dev/zero of=test2 bs=100k count=5010
> 
> This will, with high probability, trigger the OOM killer (as seen in
> dmesg); often the dd processes themselves are killed.
>
> This has been verified to occur on:
> Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
> 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
> x86_64 GNU/Linux
> 
> Please let me know your thoughts.
> 
> Regards,
> Aaron Staley

* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
From: Johannes Weiner @ 2013-07-01 18:45 UTC
  To: Serge Hallyn
  Cc: Aaron Staley, containers, Paul Menage, Li Zefan, Michal Hocko,
	linux-kernel, linux-mm

On Mon, Jul 01, 2013 at 01:01:01PM -0500, Serge Hallyn wrote:
> Quoting Aaron Staley (aaron@picloud.com):
> > This is better explained here:
> > http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > (The highest-voted answer believes this to be a kernel bug.)
> 
> Hi,
> 
> In IRC it has been suggested that the kernel should indeed be slowing
> down new page creation while waiting for old page cache entries to be
> written out to disk, rather than OOMing.
> 
> With a 3.0.27-1-ac100 kernel, dd if=/dev/zero of=xxx bs=1M
> count=100 is immediately killed.  In contrast, the same command on a
> 3.0.8 kernel did the right thing for me.  But I did reproduce your
> experiment below on EC2 with the same result.
>
> So, cc:ing linux-mm in the hopes someone can tell us whether this
> is expected behavior, known misbehavior, or an unknown bug.

It's a known issue that was fixed/improved in e62e384 'memcg: prevent
OOM with too many dirty pages', included in 3.6+.
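
For anyone wanting to check whether a particular kernel carries it, a
quick sketch (run from a Linux kernel git checkout; the abbreviated
commit id is the one above):

  git describe --contains e62e384   # names the first tag containing it
  git log --oneline -1 e62e384      # shows the commit subject
  # If the abbreviated id is ambiguous in your tree, use more digits.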

> > Summary: I have set up a system where I am using LXC to create multiple
> > virtualized containers on my system with limited resources. Unfortunately, I'm
> > running into a troublesome scenario where the OOM killer is hard-killing
> > processes in my LXC container when I write a file whose size exceeds the
> > memory limit (set to 300MB). The file buffering does not appear to respect
> > the container's memory limit.
> > 
> > 
> > Reproducing:
> > 
> > (done on a c1.xlarge instance running on Amazon EC2)
> > 
> > Create 6 empty lxc containers (in my case I did lxc-create -n testcon -t
> > ubuntu -- -r precise)
> > 
> > Modify the configuration of each container to set
> > lxc.cgroup.memory.limit_in_bytes = 300M
> >
> > Within each container, run in parallel:
> > dd if=/dev/zero of=test2 bs=100k count=5010
> > 
> > This will, with high probability, trigger the OOM killer (as seen in
> > dmesg); often the dd processes themselves are killed.
> >
> > This has been verified to occur on:
> > Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
> > 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
> > x86_64 GNU/Linux
> > 
> > Please let me know your thoughts.
> > 
> > Regards,
> > Aaron Staley

* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
From: Aaron Staley @ 2013-07-01 18:51 UTC
  To: Johannes Weiner
  Cc: Serge Hallyn, containers@lists.linux-foundation.org, Paul Menage,
	Li Zefan, Michal Hocko, linux-kernel, linux-mm

Hi Johannes,

It does still appear to be happening on Linux 3.8.  Is this still an open
issue?

Regards,
Aaron


On Mon, Jul 1, 2013 at 11:45 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Mon, Jul 01, 2013 at 01:01:01PM -0500, Serge Hallyn wrote:
> > Quoting Aaron Staley (aaron@picloud.com):
> > > This is better explained here:
> > > http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > > (The highest-voted answer believes this to be a kernel bug.)
> >
> > Hi,
> >
> > In IRC it has been suggested that the kernel should indeed be slowing
> > down new page creation while waiting for old page cache entries to be
> > written out to disk, rather than OOMing.
> >
> > With a 3.0.27-1-ac100 kernel, dd if=/dev/zero of=xxx bs=1M
> > count=100 is immediately killed.  In contrast, the same command on a
> > 3.0.8 kernel did the right thing for me.  But I did reproduce your
> > experiment below on EC2 with the same result.
> >
> > So, cc:ing linux-mm in the hopes someone can tell us whether this
> > is expected behavior, known misbehavior, or an unknown bug.
>
> It's a known issue that was fixed/improved in e62e384 'memcg: prevent
> OOM with too many dirty pages', included in 3.6+.
>
> > > Summary: I have set up a system where I am using LXC to create multiple
> > > virtualized containers on my system with limited resources. Unfortunately,
> > > I'm running into a troublesome scenario where the OOM killer is hard-killing
> > > processes in my LXC container when I write a file whose size exceeds the
> > > memory limit (set to 300MB). The file buffering does not appear to respect
> > > the container's memory limit.
> > >
> > >
> > > Reproducing:
> > >
> > > (done on a c1.xlarge instance running on Amazon EC2)
> > >
> > > Create 6 empty lxc containers (in my case I did lxc-create -n testcon -t
> > > ubuntu -- -r precise)
> > >
> > > Modify the configuration of each container to set
> > > lxc.cgroup.memory.limit_in_bytes = 300M
> > >
> > > Within each container, run in parallel:
> > > dd if=/dev/zero of=test2 bs=100k count=5010
> > >
> > > This will, with high probability, trigger the OOM killer (as seen in
> > > dmesg); often the dd processes themselves are killed.
> > >
> > > This has been verified to occur on:
> > > Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
> > > 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
> > > x86_64 GNU/Linux
> > >
> > > Please let me know your thoughts.
> > >
> > > Regards,
> > > Aaron Staley

-- 
Aaron Staley
PiCloud, Inc.

* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
From: Serge Hallyn @ 2013-07-01 19:02 UTC
  To: Johannes Weiner
  Cc: Aaron Staley, containers, Paul Menage, Li Zefan, Michal Hocko,
	linux-kernel, linux-mm

Quoting Johannes Weiner (hannes@cmpxchg.org):
> On Mon, Jul 01, 2013 at 01:01:01PM -0500, Serge Hallyn wrote:
> > Quoting Aaron Staley (aaron@picloud.com):
> > > This is better explained here:
> > > http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > > (The highest-voted answer believes this to be a kernel bug.)
> > 
> > Hi,
> > 
> > In IRC it has been suggested that the kernel should indeed be slowing
> > down new page creation while waiting for old page cache entries to be
> > written out to disk, rather than OOMing.
> >
> > With a 3.0.27-1-ac100 kernel, dd if=/dev/zero of=xxx bs=1M
> > count=100 is immediately killed.  In contrast, the same command on a
> > 3.0.8 kernel did the right thing for me.  But I did reproduce your
> > experiment below on EC2 with the same result.
> >
> > So, cc:ing linux-mm in the hopes someone can tell us whether this
> > is expected behavior, known misbehavior, or an unknown bug.
> 
> It's a known issue that was fixed/improved in e62e384 'memcg: prevent

Ah ok, I see the commit says:

    The solution is far from being ideal - long term solution is memcg aware
    dirty throttling - but it is meant to be a band aid until we have a real
    fix.  We are seeing this happening during nightly backups which are placed

... and ...

    The issue is more visible with slower devices for output.

I'm guessing we see it on EC2 because of the slower filesystem there.
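
If someone wants to make this reproduce reliably on fast local disks, one
way to emulate a slow device is dm-delay. Just a sketch; the image size,
names, and the 500ms write delay are arbitrary:

  # Build a loop device and wrap it in a dm-delay target that delays writes.
  dd if=/dev/zero of=/tmp/slow.img bs=1M count=2048
  LOOP=$(losetup -f --show /tmp/slow.img)
  SECTORS=$(blockdev --getsz "$LOOP")
  echo "0 $SECTORS delay $LOOP 0 0 $LOOP 0 500" | dmsetup create slowdev
  mkfs.ext4 /dev/mapper/slowdev
  # Mount it and point the container's dd target at it.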

> OOM with too many dirty pages', included in 3.6+.

Is anyone actively working on the long-term solution?

thanks,
-serge

* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
From: Michal Hocko @ 2013-07-02 12:42 UTC
  To: Serge Hallyn
  Cc: Johannes Weiner, Aaron Staley, containers, Paul Menage, Li Zefan,
	linux-kernel, linux-mm

On Mon 01-07-13 14:02:22, Serge Hallyn wrote:
> Quoting Johannes Weiner (hannes@cmpxchg.org):
[...]
> > OOM with too many dirty pages', included in 3.6+.
> 
> Is anyone actively working on the long term solution?

Patches for memcg dirty page accounting were posted quite some time ago.
I plan to look at them at some point, but I am rather busy with other
stuff right now. That would be just a first step, though. Then we need to
hook into dirty page throttling and make it memcg-aware, which sounds
like a bigger challenge.
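
For reference, once per-memcg dirty accounting is in place, the dirty and
writeback state of a group becomes directly observable. A sketch of what
that looks like; note the dirty/writeback fields in cgroup v1 memory.stat
only appeared in later kernels (around v4.2), and the cgroup path here is
the same assumption as in the earlier examples:

  grep -E '(dirty|writeback)' /sys/fs/cgroup/memory/lxc/testcon/memory.stat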

-- 
Michal Hocko
SUSE Labs
