* PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
@ 2013-06-25 5:45 Aaron Staley
[not found] ` <CAMcjixYa-mjo5TrxmtBkr0MOf+8r_iSeW5MF4c8nJKdp5m+RPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 23+ messages in thread
From: Aaron Staley @ 2013-06-25 5:45 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: Paul Menage, Li Zefan
This is better explained here:
http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
(The highest-voted answer believes this to be a kernel bug.)
Summary: I have set up a system where I am using LXC to create multiple
virtualized containers on my system with limited resources. Unfortunately, I'm
running into a troublesome scenario where the OOM killer is hard killing
processes in my LXC container when I write a file with size exceeding the
memory limit (set to 300MB). File buffering does not appear to respect the
container's memory limit.
Reproducing:
(Done on a c1.xlarge instance running on Amazon EC2.)
Create 6 empty LXC containers (in my case I did lxc-create -n testcon -t
ubuntu -- -r precise).
Modify the configuration of each container to set
lxc.cgroup.memory.limit_in_bytes = 300M
Within each container, run the following in parallel:
dd if=/dev/zero of=test2 bs=100k count=5010
This will with high probability activate the OOM killer (as seen in dmesg);
often the dd processes themselves will be killed.
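To see why this workload crosses the limit, here is a quick sketch of the arithmetic. It assumes that dd's "k" suffix means 1024 bytes and that the memory cgroup's "M" suffix means 1024 * 1024 bytes; the variable names are only for illustration.

```shell
#!/bin/sh
# Sizes from the report above. The suffix interpretations are assumptions:
# dd "k" = 1024 bytes, memcg "M" = 1024 * 1024 bytes.
bs_bytes=$((100 * 1024))
count=5010
limit_bytes=$((300 * 1024 * 1024))
write_bytes=$((bs_bytes * count))

echo "each dd writes $write_bytes bytes"
echo "memory limit is $limit_bytes bytes"
if [ "$write_bytes" -gt "$limit_bytes" ]; then
    echo "the file cannot fit in page cache under the limit"
fi
```

Each dd therefore produces more dirty data than its container may hold in memory, so pages have to be written back and reclaimed while dd is still running.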
This has been verified to have problems on:
Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
x86_64 GNU/Linux
Please let me know your thoughts.
Regards,
Aaron Staley
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
[not found] ` <CAMcjixYa-mjo5TrxmtBkr0MOf+8r_iSeW5MF4c8nJKdp5m+RPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-06-25 13:24 ` Serge Hallyn
2013-06-25 20:54 ` Aaron Staley
2013-07-01 18:01 ` Serge Hallyn
1 sibling, 1 reply; 23+ messages in thread
From: Serge Hallyn @ 2013-06-25 13:24 UTC (permalink / raw)
To: Aaron Staley
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Li Zefan,
Paul Menage
Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> This is better explained here:
> http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> (The
> highest-voted answer believes this to be a kernel bug.)
Yeah, sorry I haven't had time to look more into it, but I'm pretty
sure that's the case. When you sent the previous email I looked quickly at
the dd source. I had always assumed that dd looked at available memory
and malloced as much as it thought it could - but looking at the source,
it does not in fact do that. So yes, I think the kernel is simply
leaving it all in page cache and accounting that to the process which
then gets OOMed.
Instead, the kernel should be throttling the task while it waits for
the page cache to be written to disk (since blkio might also be
slowed down).
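One way to check this from the host might be to compare the cache and rss counters in the container's memory.stat file. A minimal sketch, assuming a cgroup v1 memory controller; the helper name memcg_stat and the example path are hypothetical and depend on where the controller is mounted:

```shell
#!/bin/sh
# memcg_stat FILE KEY: print the value recorded for KEY in a memory.stat-style
# file (one "name value" pair per line). Prints nothing if KEY is absent.
memcg_stat() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Example (hypothetical path; adjust to the local mount point and container):
#   stat=/sys/fs/cgroup/memory/lxc/testcon/memory.stat
#   echo "cache: $(memcg_stat "$stat" cache)  rss: $(memcg_stat "$stat" rss)"
```

If cache grows toward the limit while rss stays small during the dd run, that would be consistent with page cache being charged to the group.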
-serge
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
2013-06-25 13:24 ` Serge Hallyn
@ 2013-06-25 20:54 ` Aaron Staley
[not found] ` <CAMcjixZtW0KdAmLXyDoGFqL2qh4aEgdGjBYJWrPESFMz0dvELw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 23+ messages in thread
From: Aaron Staley @ 2013-06-25 20:54 UTC (permalink / raw)
To: Serge Hallyn
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Li Zefan,
Paul Menage
Hi Serge,
Thanks a lot. Would you know of any workarounds outside of forcing every
write to sync to disk (which kills performance)? Perhaps some settings in
the container I can set? Unfortunately, modifying dirty_background_ratio
and dirty_expire_centisecs globally (/etc/sysctl.conf) as suggested by
the serverfault answer does not stop the OOM kills.
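For reference, the writeback tunables in question live under /proc/sys/vm (the expiry knob is spelled dirty_expire_centisecs there) and are system-wide rather than per container. A small sketch for printing their current values; the function name is made up for this example, and dirty_ratio is included only as a related knob:

```shell
#!/bin/sh
# show_dirty_sysctls [DIR]: print the writeback tunables found in DIR
# (default /proc/sys/vm) as "name = value" lines. Missing files are skipped.
show_dirty_sysctls() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_background_ratio dirty_ratio dirty_expire_centisecs; do
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name")"
        fi
    done
}

show_dirty_sysctls
```

The optional directory argument exists only so the function can be pointed at a copy of the files for testing.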
Regards,
Aaron
On Tue, Jun 25, 2013 at 6:24 AM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>wrote:
> Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> > This is better explained here:
> >
> http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > (The
> > highest-voted answer believes this to be a kernel bug.)
>
> Yeah, sorry I haven't had time to look more into it, but I'm pretty
> sure that's the case. When you sent the previous email I looked quickly at
> the dd source. I had always assumed that dd looked at available memory
> and malloced as much as it thought it could - but looking at the source,
> it does not in fact do that. So yes, I think the kernel is simply
> leaving it all in page cache and accounting that to the process which
> then gets OOMed.
>
> Instead, the kernel should be throttling the task while it waits for
> the page cache to be written to disk (since blkio might also be
> slowed down).
>
> -serge
>
--
Aaron Staley
*PiCloud, Inc.*
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
[not found] ` <CAMcjixZtW0KdAmLXyDoGFqL2qh4aEgdGjBYJWrPESFMz0dvELw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-06-27 5:12 ` Zhu Yanhai
[not found] ` <CAC8teKXNof5PdU2dMz4iZ8tM=Dx=aZ0efxotDhdc-xopeK-+Xg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-28 18:52 ` Serge E. Hallyn
1 sibling, 1 reply; 23+ messages in thread
From: Zhu Yanhai @ 2013-06-27 5:12 UTC (permalink / raw)
To: Aaron Staley
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Serge Hallyn, Li Zefan, Paul Menage
Hi,
Please check this patch, it could fix your problem,
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e62e384e9da8d9a0c599795464a7e76fd490931c
--
Thanks,
Zhu Yanhai
2013/6/26 Aaron Staley <aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org>
> Hi Serge,
>
> Thanks a lot. Would you know of any workarounds outside of forcing every
> write to sync to disk (which kills performance)? Perhaps some settings in
> the container I can set? Unfortunately, modifying dirty_background_ratio
> and dirty_expire_centisecs globally (/etc/sysctl.conf) as suggested by
> the serverfault answer will not stop the OOM kills.
>
> Regards,
> Aaron
>
>
> On Tue, Jun 25, 2013 at 6:24 AM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
> >wrote:
>
> > Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> > > This is better explained here:
> > >
> >
> http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > > (The
> > > highest-voted answer believes this to be a kernel bug.)
> >
> > Yeah, sorry I haven't had time to look more into it, but I'm pretty
> > sure that's the case. When you sent the previous email I looked quickly at
> > the dd source. I had always assumed that dd looked at available memory
> > and malloced as much as it thought it could - but looking at the source,
> > it does not in fact do that. So yes, I think the kernel is simply
> > leaving it all in page cache and accounting that to the process which
> > then gets OOMed.
> >
> > Instead, the kernel should be throttling the task while it waits for
> > the page cache to be written to disk (since blkio might also be
> > slowed down).
> >
> > -serge
> >
>
>
>
> --
> Aaron Staley
> *PiCloud, Inc.*
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
[not found] ` <CAC8teKXNof5PdU2dMz4iZ8tM=Dx=aZ0efxotDhdc-xopeK-+Xg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-06-27 5:19 ` Aaron Staley
[not found] ` <CAMcjixZdXWGezcVQPPo3M0yNBx4TtbiOoqYJQWvNbTyqULifUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 23+ messages in thread
From: Aaron Staley @ 2013-06-27 5:19 UTC (permalink / raw)
To: Zhu Yanhai
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Serge Hallyn, Li Zefan, Paul Menage
The behavior it fixes sounds similar to what I'm seeing. However, if I read
the logs correctly, wasn't this committed into Linux 3.5? If so, wouldn't
Linux 3.8.0-25-generic #37-Ubuntu SMP (where I can reproduce the problem)
already have this fix?
Thanks,
Aaron
On Wed, Jun 26, 2013 at 10:12 PM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi,
> Please check this patch, it could fix your problem,
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e62e384e9da8d9a0c599795464a7e76fd490931c
>
> --
> Thanks,
> Zhu Yanhai
>
>
> 2013/6/26 Aaron Staley <aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org>
>
>> Hi Serge,
>>
>> Thanks a lot. Would you know of any workarounds outside of forcing every
>> write to sync to disk (which kills performance)? Perhaps some settings in
>> the container I can set? Unfortunately, modifying dirty_background_ratio
>> and dirty_expire_centisecs globally (/etc/sysctl.conf) as suggested by
>> the serverfault answer will not stop the OOM kills.
>>
>> Regards,
>> Aaron
>>
>>
>> On Tue, Jun 25, 2013 at 6:24 AM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
>> >wrote:
>>
>> > Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
>> > > This is better explained here:
>> > >
>> >
>> http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
>> > > (The
>> > > highest-voted answer believes this to be a kernel bug.)
>> >
>> > Yeah, sorry I haven't had time to look more into it, but I'm pretty
>> > sure that's the case. When you sent the previous email I looked quickly at
>> > the dd source. I had always assumed that dd looked at available memory
>> > and malloced as much as it thought it could - but looking at the source,
>> > it does not in fact do that. So yes, I think the kernel is simply
>> > leaving it all in page cache and accounting that to the process which
>> > then gets OOMed.
>> >
>> > Instead, the kernel should be throttling the task while it waits for
>> > the page cache to be written to disk (since blkio might also be
>> > slowed down).
>> >
>> > -serge
>> >
>>
>>
>>
>> --
>> Aaron Staley
>> *PiCloud, Inc.*
>>
>
>
--
Aaron Staley
*PiCloud, Inc.*
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
[not found] ` <CAMcjixZdXWGezcVQPPo3M0yNBx4TtbiOoqYJQWvNbTyqULifUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-06-27 20:51 ` Aristeu Rozanski
2013-07-01 15:21 ` Serge Hallyn
1 sibling, 0 replies; 23+ messages in thread
From: Aristeu Rozanski @ 2013-06-27 20:51 UTC (permalink / raw)
To: Aaron Staley
Cc: Paul Menage, Serge Hallyn, Li Zefan,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
On Wed, Jun 26, 2013 at 10:19:51PM -0700, Aaron Staley wrote:
> The behavior it fixes sounds similar to what I'm seeing. However, if I read
> the logs correctly, wasn't this committed into Linux 3.5? If so, wouldn't
> Linux 3.8.0-25-generic #37-Ubuntu SMP (where I can reproduce the problem)
> already have this fix?
3.6, but yes, it should be in your kernel.
--
Aristeu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
[not found] ` <CAMcjixZtW0KdAmLXyDoGFqL2qh4aEgdGjBYJWrPESFMz0dvELw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-27 5:12 ` Zhu Yanhai
@ 2013-06-28 18:52 ` Serge E. Hallyn
1 sibling, 0 replies; 23+ messages in thread
From: Serge E. Hallyn @ 2013-06-28 18:52 UTC (permalink / raw)
To: Aaron Staley
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Serge Hallyn, Li Zefan, Paul Menage
Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> Hi Serge,
>
> Thanks a lot. Would you know of any workarounds outside of forcing every
> write to sync to disk (which kills performance)? Perhaps some settings in
> the container I can set? Unfortunately, modifying dirty_background_ratio
> and dirty_expire_centisecs globally (/etc/sysctl.conf) as suggested by
> the serverfault answer will not stop the OOM kills.
Sorry, I haven't replied yet, have I? No, I'm afraid I don't have any
good ideas offhand. I think this is something that's going to have
to be fixed in the kernel, and there's really nothing you can do
in the container. I could be wrong.
Over the next few days I'm going to be taking a fresh look at the
behaviors of the various controllers (at least as documented) for
the sake of thinking about configuration api - perhaps the answer
will become clear to me as I do that.
-serge
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
[not found] ` <CAMcjixZdXWGezcVQPPo3M0yNBx4TtbiOoqYJQWvNbTyqULifUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-27 20:51 ` Aristeu Rozanski
@ 2013-07-01 15:21 ` Serge Hallyn
2013-07-01 15:27 ` Aaron Staley
1 sibling, 1 reply; 23+ messages in thread
From: Serge Hallyn @ 2013-07-01 15:21 UTC (permalink / raw)
To: Aaron Staley
Cc: Paul Menage, Li Zefan,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> The behavior it fixes sounds similar to what I'm seeing. However, if I read
> the logs correctly, wasn't this committed into Linux 3.5? If so, wouldn't
> Linux 3.8.0-25-generic #37-Ubuntu SMP (where I can reproduce the problem)
> already have this fix?
Hi,
I've been trying to reproduce this in Ubuntu raring, but couldn't. I
started a shell as unprivileged user, and stuck it in a memory cgroup
with memory.limit_in_bytes set to 10M. Did dd if=/dev/zero of=/tmp/xxx
bs=1M count=100M - it was slow, and memory.failcnt hit 1612526, but it
did in the end succeed.
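The single-cgroup test described above can be sketched roughly as follows, assuming a cgroup v1 memory controller. The function names, the mount point, and the group name ddtest are assumptions for illustration, and writing to these files normally requires root or a delegated cgroup:

```shell
#!/bin/sh
# memcg_limit ROOT NAME LIMIT: create the group ROOT/NAME and write LIMIT to
# its memory.limit_in_bytes. ROOT is the mount point of the memory controller.
memcg_limit() {
    mkdir -p "$1/$2" || return 1
    echo "$3" > "$1/$2/memory.limit_in_bytes"
}

# memcg_enter ROOT NAME: move the current shell into the group by writing its
# PID to the group's tasks file.
memcg_enter() {
    echo $$ > "$1/$2/tasks"
}

# Usage (mount point and group name are assumptions; count is illustrative):
#   memcg_limit /sys/fs/cgroup/memory ddtest 10M
#   memcg_enter /sys/fs/cgroup/memory ddtest
#   dd if=/dev/zero of=/tmp/xxx bs=1M count=100
#   cat /sys/fs/cgroup/memory/ddtest/memory.failcnt
```

memory.failcnt counts how often the group hit its limit, so a large value together with a successful dd suggests reclaim kept up instead of triggering the OOM killer.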
The kernel here is 3.8.0-23-generic #34-Ubuntu, slightly older than yours.
Maybe you're onto a new regression?
-serge
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
2013-07-01 15:21 ` Serge Hallyn
@ 2013-07-01 15:27 ` Aaron Staley
0 siblings, 0 replies; 23+ messages in thread
From: Aaron Staley @ 2013-07-01 15:27 UTC (permalink / raw)
To: Serge Hallyn
Cc: Paul Menage, Li Zefan,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Hi Serge,
To reproduce, I need to run ~6 containers in parallel, each running that
command. I generally cannot reproduce it with just one running. (Repro
instructions are in the original email.)
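A rough sketch of driving the parallel run from the host and counting how many writers were killed. The helper names and container names are hypothetical, and the launcher assumes lxc-attach works on the kernel and LXC version in use:

```shell
#!/bin/sh
# run_in_each CMD NAME...: run "CMD NAME" in the background once per NAME,
# wait for all of them, and print how many exited with status 137
# (128 + SIGKILL, the signal the OOM killer sends).
run_in_each() {
    cmd=$1
    shift
    pids=""
    for name in "$@"; do
        "$cmd" "$name" &
        pids="$pids $!"
    done
    killed=0
    for pid in $pids; do
        if wait "$pid"; then
            status=0
        else
            status=$?
        fi
        if [ "$status" -eq 137 ]; then
            killed=$((killed + 1))
        fi
    done
    echo "$killed"
}

# Hypothetical launcher: start the dd from the report inside one container.
write_in_container() {
    lxc-attach -n "$1" -- dd if=/dev/zero of=test2 bs=100k count=5010
}

# Usage (container names are assumptions):
#   run_in_each write_in_container testcon1 testcon2 testcon3 \
#       testcon4 testcon5 testcon6
```

Status 137 only shows that a process received SIGKILL; dmesg is still needed to confirm that the OOM killer was the sender.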
Regards,
Aaron
On Mon, Jul 1, 2013 at 8:21 AM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>wrote:
> Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> > The behavior it fixes sounds similar to what I'm seeing. However, if I
> read
> > the logs correctly, wasn't this committed into Linux 3.5? If so, wouldn't
> > Linux 3.8.0-25-generic #37-Ubuntu SMP (where I can reproduce the problem)
> > already have this fix?
>
> Hi,
>
> I've been trying to reproduce this in ubuntu raring, but couldn't. I
> started a shell as unprivileged user, and stuck it in a memory cgroup
> with memory.limit_in_bytes set to 10M. Did dd if=/dev/zero of=/tmp/xxx
> bs=1M count=100M - it was slow, and memory.failcnt hit 1612526, but it
> did in the end succeed.
>
> kernel here is 3.8.0-23-generic #34-Ubuntu, slightly older than you.
> Maybe you're onto a new regression?
>
> -serge
>
--
Aaron Staley
*PiCloud, Inc.*
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
2013-06-25 5:45 PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM Aaron Staley
[not found] ` <CAMcjixYa-mjo5TrxmtBkr0MOf+8r_iSeW5MF4c8nJKdp5m+RPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-01 18:01 ` Serge Hallyn
0 siblings, 0 replies; 23+ messages in thread
From: Serge Hallyn @ 2013-07-01 18:01 UTC (permalink / raw)
To: Aaron Staley
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Li Zefan,
Paul Menage, linux-kernel-u79uwXL29TY76Z2rM5mHXA
Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> This is better explained here:
> http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> (The
> highest-voted answer believes this to be a kernel bug.)
Hi,
On IRC it has been suggested that the kernel should indeed be slowing
down the creation of new pages while waiting for old page cache entries to be
written out to disk, rather than OOMing.
With a 3.0.27-1-ac100 kernel, doing dd if=/dev/zero of=xxx bs=1M
count=100 is immediately killed. In contrast, doing the same from a
3.0.8 kernel did the right thing for me. But I did reproduce your
experiment below on EC2 with the same result.
So, cc:ing linux-mm in the hopes someone can tell us whether this
is expected behavior, known mis-behavior, or an unknown bug.
> Summary: I have set up a system where I am using LXC to create multiple
> virtualized containers on my system with limited resources. Unfortunately, I'm
> running into a troublesome scenario where the OOM killer is hard killing
> processes in my LXC container when I write a file with size exceeding the
> memory limitation (set to 300MB). There appears to be some issue with the
> file buffering respecting the containers memory limit.
>
>
> Reproducing:
>
> (Done on a c1.xlarge instance running on Amazon EC2.)
>
> Create 6 empty LXC containers (in my case I did lxc-create -n testcon -t
> ubuntu -- -r precise).
>
> Modify the configuration of each container to set
> lxc.cgroup.memory.limit_in_bytes = 300M
>
> Within each container, run the following in parallel:
> dd if=/dev/zero of=test2 bs=100k count=5010
>
> This will with high probability activate the OOM killer (as seen in dmesg);
> often the dd processes themselves will be killed.
>
> This has been verified to have problems on:
> Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
> 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
> x86_64 GNU/Linux
>
> Please let me know your thoughts.
>
> Regards,
> Aaron Staley
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
2013-07-01 18:01 ` Serge Hallyn
` (2 preceding siblings ...)
(?)
@ 2013-07-01 18:45 ` Johannes Weiner
-1 siblings, 0 replies; 23+ messages in thread
From: Johannes Weiner @ 2013-07-01 18:45 UTC (permalink / raw)
To: Serge Hallyn
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Li Zefan,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Aaron Staley, Paul Menage
On Mon, Jul 01, 2013 at 01:01:01PM -0500, Serge Hallyn wrote:
> Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> > This is better explained here:
> > http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > (The
> > highest-voted answer believes this to be a kernel bug.)
>
> Hi,
>
> in irc it has been suggested that indeed the kernel should be slowing
> down new page creates while waiting for old page cache entries to be
> written out to disk, rather than ooming.
>
> With a 3.0.27-1-ac100 kernel, doing dd if=/dev/zero of=xxx bs=1M
> count=100 is immediately killed. In contrast, doing the same from a
> 3.0.8 kernel did the right thing for me. But I did reproduce your
> experiment below on ec2 with the same result.
>
> So, cc:ing linux-mm in the hopes someone can tell us whether this
> is expected behavior, known mis-behavior, or an unknown bug.
It's a known issue that was fixed/improved in e62e384 'memcg: prevent
OOM with too many dirty pages', included in 3.6+.
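A quick way to check whether a machine is at or past the release mentioned above is to compare the `uname -r` string against 3.6. This is only a sketch with a made-up helper name; distribution kernels may carry backports or other changes, so the version number alone is not conclusive:

```shell
#!/bin/sh
# kernel_at_least RELEASE MAJOR MINOR: succeed if a `uname -r` style RELEASE
# string (e.g. "3.8.0-25-generic") is at least MAJOR.MINOR.
kernel_at_least() {
    major=${1%%.*}              # text before the first dot
    rest=${1#*.}                # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if kernel_at_least "$(uname -r)" 3 6; then
    echo "3.6 or later: e62e384 should be included"
else
    echo "older than 3.6: e62e384 is present only if backported"
fi
```

By this check the 3.8.0-25-generic kernel from the report should already contain the fix, while 3.2.0-29-virtual would not unless it was backported.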
> > Summary: I have set up a system where I am using LXC to create multiple
> > virtualized containers on my system with limited resources. Unfortunately, I'm
> > running into a troublesome scenario where the OOM killer is hard killing
> > processes in my LXC container when I write a file with size exceeding the
> > memory limitation (set to 300MB). There appears to be some issue with the
> > file buffering respecting the containers memory limit.
> >
> >
> > Reproducing:
> >
> > (Done on a c1.xlarge instance running on Amazon EC2.)
> >
> > Create 6 empty LXC containers (in my case I did lxc-create -n testcon -t
> > ubuntu -- -r precise).
> >
> > Modify the configuration of each container to set
> > lxc.cgroup.memory.limit_in_bytes = 300M
> >
> > Within each container, run the following in parallel:
> > dd if=/dev/zero of=test2 bs=100k count=5010
> >
> > This will with high probability activate the OOM killer (as seen in dmesg);
> > often the dd processes themselves will be killed.
> >
> > This has been verified to have problems on:
> > Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
> > 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
> > x86_64 GNU/Linux
> >
> > Please let me know your thoughts.
> >
> > Regards,
> > Aaron Staley
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
@ 2013-07-01 18:45 ` Johannes Weiner
0 siblings, 0 replies; 23+ messages in thread
From: Johannes Weiner @ 2013-07-01 18:45 UTC (permalink / raw)
To: Serge Hallyn
Cc: Aaron Staley, containers, Paul Menage, Li Zefan, Michal Hocko,
linux-kernel, linux-mm
On Mon, Jul 01, 2013 at 01:01:01PM -0500, Serge Hallyn wrote:
> Quoting Aaron Staley (aaron@picloud.com):
> > This is better explained here:
> > http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > (The
> > highest-voted answer believes this to be a kernel bug.)
>
> Hi,
>
> in irc it has been suggested that indeed the kernel should be slowing
> down new page creates while waiting for old page cache entries to be
> written out to disk, rather than ooming.
>
> With a 3.0.27-1-ac100 kernel, doing dd if=/dev/zero of=xxx bs=1M
> count=100 is immediately killed. In contrast, doing the same from a
> 3.0.8 kernel did the right thing for me. But I did reproduce your
> experiment below on ec2 with the same result.
>
> So, cc:ing linux-mm in the hopes someone can tell us whether this
> is expected behavior, known mis-behavior, or an unknown bug.
It's a known issue that was fixed/improved in e62e384 'memcg: prevent
OOM with too many dirty pages', included in 3.6+.
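A quick way to tell whether a given machine is on a kernel that should carry that commit is to compare `uname -r` against 3.6. The sketch below is only that: `kernel_at_least` is a made-up helper name, it looks at the major and minor numbers only, and it says nothing about distro backports or about whether the symptom is actually gone.

```shell
#!/bin/sh
# kernel_at_least RELEASE MAJOR.MINOR
# Succeeds when RELEASE (in `uname -r` form) is at or above MAJOR.MINOR.
# Hypothetical helper, not part of any tool mentioned in this thread.
kernel_at_least() {
    have=$1
    want=$2
    have_major=${have%%.*}            # text before the first dot
    have_rest=${have#*.}              # text after the first dot
    have_minor=${have_rest%%[!0-9]*}  # leading digits of the remainder
    want_major=${want%%.*}
    want_rest=${want#*.}
    want_minor=${want_rest%%[!0-9]*}
    if [ "$have_major" -gt "$want_major" ]; then return 0; fi
    if [ "$have_major" -lt "$want_major" ]; then return 1; fi
    [ "$have_minor" -ge "$want_minor" ]
}

# In a kernel git checkout, `git describe --contains e62e384` names the
# first tag that includes the commit.
if kernel_at_least "$(uname -r)" 3.6; then
    echo "kernel is 3.6 or newer: mainline e62e384 should be present"
else
    echo "kernel predates 3.6: e62e384 is absent unless it was backported"
fi
```

Of the two kernels in the report, 3.8.0-25-generic passes this check and 3.2.0-29-virtual does not.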
> > Summary: I have set up a system where I am using LXC to create multiple
> > virtualized containers on my system with limited resources. Unfortunately, I'm
> > running into a troublesome scenario where the OOM killer is hard killing
> > processes in my LXC container when I write a file with size exceeding the
> > memory limitation (set to 300MB). There appears to be some issue with the
> > file buffering respecting the container's memory limit.
> >
> >
> > Reproducing:
> >
> > /done on a c1.xlarge instance running on Amazon EC2
> >
> > Create 6 empty lxc containers (in my case I did lxc-create -n testcon -t
> > ubuntu -- -r precise)
> >
> > Modify the configuration of each container to set
> > lxc.cgroup.memory.limit_in_bytes = 300M
> >
> > Within each container, run the following in parallel:
> > dd if=/dev/zero of=test2 bs=100k count=5010
> >
> > This will with high probability activate the OOM killer (as seen in dmesg); often
> > the dd processes themselves will be killed.
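For scale, the numbers in the reproduction work out as follows. This is just the arithmetic, assuming GNU dd (where the `k` suffix is 1024 bytes) and the cgroup `M` suffix meaning 1024*1024 bytes; the variable names are made up for the sketch.

```shell
#!/bin/sh
# Sizes taken from the reproduction steps: bs=100k count=5010, limit 300M.
bs_bytes=$((100 * 1024))            # dd block size in bytes
count=5010                          # number of blocks dd writes
limit_bytes=$((300 * 1024 * 1024))  # memory.limit_in_bytes for the container

write_bytes=$((bs_bytes * count))
excess_bytes=$((write_bytes - limit_bytes))
percent_of_limit=$((write_bytes * 100 / limit_bytes))

echo "dd writes:   $write_bytes bytes ($((write_bytes / 1048576)) MiB, rounded down)"
echo "memcg limit: $limit_bytes bytes ($((limit_bytes / 1048576)) MiB)"
echo "over limit:  $excess_bytes bytes (write is ${percent_of_limit}% of the limit)"
```

So each dd pushes roughly 489 MiB through the page cache of a group capped at 300 MiB, which means the group cannot stay under its limit unless dirty pages are written back and reclaimed while dd is still running.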
> >
> > This has been verified to have problems on:
> > Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
> > 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
> > x86_64 GNU/Linux
> >
> > Please let me know your thoughts.
> >
> > Regards,
> > Aaron Staley
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
[not found] ` <20130701184503.GG17812-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2013-07-01 18:51 ` Aaron Staley
2013-07-01 19:02 ` Serge Hallyn
1 sibling, 0 replies; 23+ messages in thread
From: Aaron Staley @ 2013-07-01 18:51 UTC (permalink / raw)
To: Johannes Weiner
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Serge Hallyn, Li Zefan, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Paul Menage
Hi Johannes,
It does appear to still be happening on Linux 3.8. Does it remain an open
issue?
Regards,
Aaron
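To confirm that the kills come from the container's own memory group hitting its limit, the cgroup v1 counters can be read from the host. A minimal sketch follows; `memcg_report` is a made-up helper, and the default path is an assumption about where LXC mounts the group for a container named testcon, so adjust it (or set MEMCG_DIR) for the actual layout.

```shell
#!/bin/sh
# memcg_report DIR
# Prints the limit, current usage, and limit-hit count of one cgroup v1
# memory group. Hypothetical helper; DIR is the group's directory.
memcg_report() {
    dir=$1
    for f in memory.limit_in_bytes memory.usage_in_bytes memory.failcnt; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s=unavailable\n' "$f"
        fi
    done
}

# Assumed location of the container's group; not taken from this thread.
memcg_report "${MEMCG_DIR:-/sys/fs/cgroup/memory/lxc/testcon}"
```

A memory.failcnt that climbs while dd runs means the group keeps hitting its limit, which is consistent with a memcg OOM rather than host-wide memory pressure.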
On Mon, Jul 1, 2013 at 11:45 AM, Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
> On Mon, Jul 01, 2013 at 01:01:01PM -0500, Serge Hallyn wrote:
> > Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> > > This is better explained here:
> > > http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > > (The
> > > highest-voted answer believes this to be a kernel bug.)
> >
> > Hi,
> >
> > in irc it has been suggested that indeed the kernel should be slowing
> > down new page creates while waiting for old page cache entries to be
> > written out to disk, rather than ooming.
> >
> > With a 3.0.27-1-ac100 kernel, doing dd if=/dev/zero of=xxx bs=1M
> > count=100 is immediately killed. In contrast, doing the same from a
> > 3.0.8 kernel did the right thing for me. But I did reproduce your
> > experiment below on ec2 with the same result.
> >
> > So, cc:ing linux-mm in the hopes someone can tell us whether this
> > is expected behavior, known mis-behavior, or an unknown bug.
>
> It's a known issue that was fixed/improved in e62e384 'memcg: prevent
> OOM with too many dirty pages', included in 3.6+.
>
> > > Summary: I have set up a system where I am using LXC to create multiple
> > > virtualized containers on my system with limited resources. Unfortunately, I'm
> > > running into a troublesome scenario where the OOM killer is hard killing
> > > processes in my LXC container when I write a file with size exceeding the
> > > memory limitation (set to 300MB). There appears to be some issue with the
> > > file buffering respecting the container's memory limit.
> > >
> > >
> > > Reproducing:
> > >
> > > /done on a c1.xlarge instance running on Amazon EC2
> > >
> > > Create 6 empty lxc containers (in my case I did lxc-create -n testcon -t
> > > ubuntu -- -r precise)
> > >
> > > Modify the configuration of each container to set
> > > lxc.cgroup.memory.limit_in_bytes = 300M
> > >
> > > Within each container, run the following in parallel:
> > > dd if=/dev/zero of=test2 bs=100k count=5010
> > >
> > > This will with high probability activate the OOM killer (as seen in dmesg); often
> > > the dd processes themselves will be killed.
> > >
> > > This has been verified to have problems on:
> > > Linux 3.8.0-25-generic #37-Ubuntu SMP and Linux ip-10-8-139-98
> > > 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC 2012 x86_64 x86_64
> > > x86_64 GNU/Linux
> > >
> > > Please let me know your thoughts.
> > >
> > > Regards,
> > > Aaron Staley
--
Aaron Staley
*PiCloud, Inc.*
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
2013-07-01 18:45 ` Johannes Weiner
(?)
@ 2013-07-01 19:02 ` Serge Hallyn
-1 siblings, 0 replies; 23+ messages in thread
From: Serge Hallyn @ 2013-07-01 19:02 UTC (permalink / raw)
To: Johannes Weiner
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Li Zefan,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Aaron Staley, Paul Menage
Quoting Johannes Weiner (hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org):
> On Mon, Jul 01, 2013 at 01:01:01PM -0500, Serge Hallyn wrote:
> > Quoting Aaron Staley (aaron-ZNH+RosXeVlBDgjK7y7TUQ@public.gmane.org):
> > > This is better explained here:
> > > http://serverfault.com/questions/516074/why-are-applications-in-a-memory-limited-lxc-container-writing-large-files-to-di
> > > (The
> > > highest-voted answer believes this to be a kernel bug.)
> >
> > Hi,
> >
> > in irc it has been suggested that indeed the kernel should be slowing
> > down new page creates while waiting for old page cache entries to be
> > written out to disk, rather than ooming.
> >
> > With a 3.0.27-1-ac100 kernel, doing dd if=/dev/zero of=xxx bs=1M
> > count=100 is immediately killed. In contrast, doing the same from a
> > 3.0.8 kernel did the right thing for me. But I did reproduce your
> > experiment below on ec2 with the same result.
> >
> > So, cc:ing linux-mm in the hopes someone can tell us whether this
> > is expected behavior, known mis-behavior, or an unknown bug.
>
> It's a known issue that was fixed/improved in e62e384 'memcg: prevent
> OOM with too many dirty pages', included in 3.6+.
Ah ok, I see the commit says:
  The solution is far from being ideal - long term solution is memcg aware
  dirty throttling - but it is meant to be a band aid until we have a real
  fix. We are seeing this happening during nightly backups which are placed
... and ...
  The issue is more visible with slower devices for output.
I'm guessing we see it on ec2 because of slowed fs.
Is anyone actively working on the long term solution?
thanks,
-serge
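If slow writeback is the trigger, it should show up as a large and slowly draining Dirty/Writeback backlog while the dd runs. A rough sketch for sampling those two fields follows; `meminfo_kb` is a made-up helper, and note that /proc/meminfo is system-wide, so this shows the host's backlog rather than a single container's.

```shell
#!/bin/sh
# meminfo_kb FIELD
# Reads /proc/meminfo-formatted text on stdin and prints FIELD's value in kB.
# Hypothetical helper name; matches the field name exactly, so "Writeback"
# does not also match "WritebackTmp".
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Take one sample; wrap this in a loop with sleep to watch it over time.
if [ -r /proc/meminfo ]; then
    printf 'Dirty: %s kB, Writeback: %s kB\n' \
        "$(meminfo_kb Dirty < /proc/meminfo)" \
        "$(meminfo_kb Writeback < /proc/meminfo)"
fi
```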
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM
2013-07-01 19:02 ` Serge Hallyn
(?)
@ 2013-07-02 12:42 ` Michal Hocko
-1 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2013-07-02 12:42 UTC (permalink / raw)
To: Serge Hallyn
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Johannes Weiner, Li Zefan, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Aaron Staley, Paul Menage
On Mon 01-07-13 14:02:22, Serge Hallyn wrote:
> Quoting Johannes Weiner (hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org):
[...]
> > OOM with too many dirty pages', included in 3.6+.
>
> Is anyone actively working on the long term solution?
Patches for memcg dirty page accounting were posted quite some time ago.
I plan to look at them at some point but I am rather busy with other
stuff right now. That would be just a first step though. Then we need to
hook into dirty pages throttling and make it memcg aware which sounds
like a bigger challenge.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2013-07-02 12:42 UTC | newest]
Thread overview: 23+ messages
2013-06-25 5:45 PROBLEM: Processes writing large files in memory-limited LXC container are killed by OOM Aaron Staley
[not found] ` <CAMcjixYa-mjo5TrxmtBkr0MOf+8r_iSeW5MF4c8nJKdp5m+RPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-25 13:24 ` Serge Hallyn
2013-06-25 20:54 ` Aaron Staley
[not found] ` <CAMcjixZtW0KdAmLXyDoGFqL2qh4aEgdGjBYJWrPESFMz0dvELw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-27 5:12 ` Zhu Yanhai
[not found] ` <CAC8teKXNof5PdU2dMz4iZ8tM=Dx=aZ0efxotDhdc-xopeK-+Xg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-27 5:19 ` Aaron Staley
[not found] ` <CAMcjixZdXWGezcVQPPo3M0yNBx4TtbiOoqYJQWvNbTyqULifUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-27 20:51 ` Aristeu Rozanski
2013-07-01 15:21 ` Serge Hallyn
2013-07-01 15:27 ` Aaron Staley
2013-06-28 18:52 ` Serge E. Hallyn
2013-07-01 18:01 ` Serge Hallyn
2013-07-01 18:01 ` Serge Hallyn
2013-07-01 18:01 ` Serge Hallyn
2013-07-01 18:45 ` Johannes Weiner
2013-07-01 18:45 ` Johannes Weiner
[not found] ` <20130701184503.GG17812-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-01 18:51 ` Aaron Staley
2013-07-01 19:02 ` Serge Hallyn
2013-07-01 19:02 ` Serge Hallyn
2013-07-01 19:02 ` Serge Hallyn
2013-07-02 12:42 ` Michal Hocko
2013-07-02 12:42 ` Michal Hocko
2013-07-02 12:42 ` Michal Hocko
2013-07-01 18:51 ` Aaron Staley
2013-07-01 18:45 ` Johannes Weiner