linux-mm.kvack.org archive mirror
* [BUG] mm/cgroupv2: memory.min may lead to an OOM error
@ 2024-08-01  4:54 Lance Yang
  2024-08-01 10:35 ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 5+ messages in thread
From: Lance Yang @ 2024-08-01  4:54 UTC (permalink / raw)
  To: akpm
  Cc: 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
	baolin.wang, linux-kernel, linux-mm, Lance Yang

Hi all,

It's possible to encounter an OOM error if both the parent and child cgroups are
configured with memory.min and memory.max set to the same value, as is common
practice in Kubernetes.

Hmm... I'm not sure whether this behavior is a bug or an expected aspect of
the kernel design.

To reproduce the bug, follow these steps:

1. Check the kernel version and OS release:
    
    ```
    $ uname -r
    6.10.0-rc5+
    
    $ cat /etc/os-release
    PRETTY_NAME="Ubuntu 24.04 LTS"
    NAME="Ubuntu"
    VERSION_ID="24.04"
    VERSION="24.04 LTS (Noble Numbat)"
    VERSION_CODENAME=noble
    ID=ubuntu
    ID_LIKE=debian
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    UBUNTU_CODENAME=noble
    LOGO=ubuntu-logo
    
    ```
    
2. Navigate to the cgroup v2 filesystem, create a test cgroup, and set memory settings:
    
    ```
    $ cd /sys/fs/cgroup/
    $ stat -fc %T /sys/fs/cgroup
    cgroup2fs
    $ mkdir test
    $ echo "+memory" > cgroup.subtree_control
    $ mkdir test/test-child
    $ echo "+memory" > test/cgroup.subtree_control
    $ cd test
    $ echo 1073741824 > memory.max
    $ echo 1073741824 > memory.min
    $ cat memory.max
    1073741824
    $ cat memory.min
    1073741824
    $ cat memory.low
    0
    $ cat memory.high
    max
    ```
    
3. Set up and check memory settings in the child cgroup:
    
    ```
    $ cd test-child
    $ echo 1073741824 > memory.max
    $ echo 1073741824 > memory.min
    $ cat memory.max
    1073741824
    $ cat memory.min
    1073741824
    $ cat memory.low
    0
    $ cat memory.high
    max
    ```
    
4. Add a process to the child cgroup and verify:
    
    ```
    $ echo $$ > cgroup.procs
    $ cat cgroup.procs
    1131
    1320
    $ ps -ef|grep 1131
    root        1131    1014  0 10:45 pts/0    00:00:00 -bash
    root        1321    1131 99 11:06 pts/0    00:00:00 ps -ef
    root        1322    1131  0 11:06 pts/0    00:00:00 grep --color=auto 1131
    ```
    
5. Attempt to create a large file using dd and observe the process being killed:
    
    ```
    $ dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
    Killed
    ```
    
6. Check kernel messages related to the OOM event:
    
    ```
    $ dmesg
    ...
    [ 1341.112388] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test,task_memcg=/test/test-child,task=dd,pid=1324,uid=0
    [ 1341.112418] Memory cgroup out of memory: Killed process 1324 (dd) total-vm:15548kB, anon-rss:10240kB, file-rss:1764kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
    ```
    
7. Reduce the `memory.min` setting in the child cgroup and retry the same large file creation; the issue no longer occurs:
    
    ```
    # echo 107374182 > memory.min
    # dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
    200+0 records in
    200+0 records out
    2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.8713 s, 1.1 GB/s
    ```
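
For convenience, the whole reproduction can be condensed into one script. This is a
sketch rather than my exact session: it assumes root and a writable cgroup v2 mount
at /sys/fs/cgroup, spells out every path explicitly, and is gated behind an
environment variable so nothing runs by accident:

```
#!/usr/bin/env bash
# Condensed form of steps 2-6 above (a sketch; assumes root and a
# writable cgroup v2 mount at /sys/fs/cgroup).
set -euo pipefail

CG=/sys/fs/cgroup
LIMIT=1073741824   # 1 GiB, as in the transcript

reproduce_oom() {
    echo "+memory" > "$CG/cgroup.subtree_control"
    mkdir -p "$CG/test/test-child"
    echo "+memory" > "$CG/test/cgroup.subtree_control"

    # memory.min == memory.max on both parent and child.
    for grp in "$CG/test" "$CG/test/test-child"; do
        echo "$LIMIT" > "$grp/memory.max"
        echo "$LIMIT" > "$grp/memory.min"
    done

    # Move this shell into the child, then generate ~2 GB of page cache;
    # dd is expected to be OOM-killed.
    echo $$ > "$CG/test/test-child/cgroup.procs"
    dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200 || true
    cat "$CG/test/test-child/memory.events"
}

if [ "${RUN_CGROUP_DEMO:-0}" = "1" ] && [ "$(id -u)" -eq 0 ]; then
    reproduce_oom
else
    echo "dry run: set RUN_CGROUP_DEMO=1 and run as root to execute"
fi
```

Run as root with RUN_CGROUP_DEMO=1; memory.events should then show a nonzero
oom_kill count.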

Thanks,
Lance



* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01  4:54 [BUG] mm/cgroupv2: memory.min may lead to an OOM error Lance Yang
@ 2024-08-01 10:35 ` Vlastimil Babka (SUSE)
  2024-08-01 11:40   ` Lance Yang
  0 siblings, 1 reply; 5+ messages in thread
From: Vlastimil Babka (SUSE) @ 2024-08-01 10:35 UTC (permalink / raw)
  To: Lance Yang, akpm
  Cc: 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
	baolin.wang, linux-kernel, linux-mm, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Cgroups

On 8/1/24 06:54, Lance Yang wrote:
> Hi all,
> 
> It's possible to encounter an OOM error if both the parent and child cgroups are
> configured with memory.min and memory.max set to the same value, as is common
> practice in Kubernetes.

Has it been the practice in Kubernetes since forever, or is it recent? Did it work
differently before?

> Hmm... I'm not sure whether this behavior is a bug or an expected aspect of
> the kernel design.

Hmm I'm not a memcg expert, so I cc'd some.

> To reproduce the bug, we can follow these command-based steps:
> 
> 1. Check Kernel Version and OS release:
>     
>     ```
>     $ uname -r
>     6.10.0-rc5+

Were older kernels behaving the same?

Anyway the memory.min documentation says "Hard memory protection. If the memory
usage of a cgroup is within its effective min boundary, the cgroup’s memory
won’t be reclaimed under any conditions. If there is no unprotected
reclaimable memory available, OOM killer is invoked."

So in my non-expert opinion this behavior seems valid. If you set min to the
same value as max and then reach the max, you effectively don't allow any
reclaim, so the memcg OOM kill is the only option AFAICS?

>     $ cat /etc/os-release
>     PRETTY_NAME="Ubuntu 24.04 LTS"
>     NAME="Ubuntu"
>     VERSION_ID="24.04"
>     VERSION="24.04 LTS (Noble Numbat)"
>     VERSION_CODENAME=noble
>     ID=ubuntu
>     ID_LIKE=debian
>     HOME_URL="https://www.ubuntu.com/"
>     SUPPORT_URL="https://help.ubuntu.com/"
>     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>     UBUNTU_CODENAME=noble
>     LOGO=ubuntu-logo
>     
>     ```
>     
> 2. Navigate to the cgroup v2 filesystem, create a test cgroup, and set memory settings:
>     
>     ```
>     $ cd /sys/fs/cgroup/
>     $ stat -fc %T /sys/fs/cgroup
>     cgroup2fs
>     $ mkdir test
>     $ echo "+memory" > cgroup.subtree_control
>     $ mkdir test/test-child
>     $ echo "+memory" > test/cgroup.subtree_control
>     $ cd test
>     $ echo 1073741824 > memory.max
>     $ echo 1073741824 > memory.min
>     $ cat memory.max
>     1073741824
>     $ cat memory.min
>     1073741824
>     $ cat memory.low
>     0
>     $ cat memory.high
>     max
>     ```
>     
> 3. Set up and check memory settings in the child cgroup:
>     
>     ```
>     $ cd test-child
>     $ echo 1073741824 > memory.max
>     $ echo 1073741824 > memory.min
>     $ cat memory.max
>     1073741824
>     $ cat memory.min
>     1073741824
>     $ cat memory.low
>     0
>     $ cat memory.high
>     max
>     ```
>     
> 4. Add process to the child cgroup and verify:
>     
>     ```
>     $ echo $$ > cgroup.procs
>     $ cat cgroup.procs
>     1131
>     1320
>     $ ps -ef|grep 1131
>     root        1131    1014  0 10:45 pts/0    00:00:00 -bash
>     root        1321    1131 99 11:06 pts/0    00:00:00 ps -ef
>     root        1322    1131  0 11:06 pts/0    00:00:00 grep --color=auto 1131
>     ```
>     
> 5. Attempt to create a large file using dd and observe the process being killed:
>     
>     ```
>     $ dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
>     Killed
>     ```
>     
> 6. Check kernel messages related to the OOM event:
>     
>     ```
>     $ dmesg
>     ...
>     [ 1341.112388] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test,task_memcg=/test/test-child,task=dd,pid=1324,uid=0
>     [ 1341.112418] Memory cgroup out of memory: Killed process 1324 (dd) total-vm:15548kB, anon-rss:10240kB, file-rss:1764kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
>     ```
>     
> 7. Reduce the `memory.min` setting in the child cgroup and attempt the same large file creation, and then this issue is resolved.
>     
>     ```
>     # echo 107374182 > memory.min
>     # dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
>     200+0 records in
>     200+0 records out
>     2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.8713 s, 1.1 GB/s
>     ```
> 
> Thanks,
> Lance
> 




* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01 10:35 ` Vlastimil Babka (SUSE)
@ 2024-08-01 11:40   ` Lance Yang
  2024-08-01 22:58     ` Michal Koutný
  0 siblings, 1 reply; 5+ messages in thread
From: Lance Yang @ 2024-08-01 11:40 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: akpm, 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
	baolin.wang, linux-kernel, linux-mm, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Cgroups

Hi Vlastimil,

Thanks a lot for paying attention!

On Thu, Aug 1, 2024 at 6:35 PM Vlastimil Babka (SUSE) <vbabka@kernel.org> wrote:
>
> On 8/1/24 06:54, Lance Yang wrote:
> > Hi all,
> >
> > It's possible to encounter an OOM error if both the parent and child cgroups are
> > configured with memory.min and memory.max set to the same value, as is common
> > practice in Kubernetes.
>
> Is it a practice in Kubernetes since forever or a recent one? Did it work
> differently before?

memory.min is only applied when the Kubernetes memory QoS feature gate is
enabled, and that feature gate is disabled by default.

>
> > Hmm... I'm not sure whether this behavior is a bug or an expected aspect of
> > the kernel design.
>
> Hmm I'm not a memcg expert, so I cc'd some.
>
> > To reproduce the bug, we can follow these command-based steps:
> >
> > 1. Check Kernel Version and OS release:
> >
> >     ```
> >     $ uname -r
> >     6.10.0-rc5+
>
> Were older kernels behaving the same?

I tested on another machine and it behaved the same way.

# uname -r
5.14.0-427.24.1.el9_4.x86_64

# cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
...

>
> Anyway the memory.min documentation says "Hard memory protection. If the memory
> usage of a cgroup is within its effective min boundary, the cgroup’s memory
> won’t be reclaimed under any conditions. If there is no unprotected
> reclaimable memory available, OOM killer is invoked."
>
> So in my non-expert opinion this behavior seems valid. If you set min to the
> same value as max and then reach the max, you effectively don't allow any
> reclaim, so the memcg OOM kill is the only option AFAICS?

I completely agree that this behavior seems valid ;)

However, if the child cgroup doesn't exist and we add a process directly to the
'test' cgroup, then attempt to create a large file (2GB) using dd, we don't
encounter an OOM error; everything works as expected.
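
Concretely, the no-child layout I mean is roughly this (a sketch with the same
1 GiB figures as the original demo; it assumes root and a writable cgroup v2
mount at /sys/fs/cgroup, and is gated behind an environment variable so nothing
runs by accident):

```
#!/usr/bin/env bash
# No-child counterexample: same min == max limits, but the process sits
# directly in 'test' and dd completes without an OOM kill.
set -euo pipefail

CG=/sys/fs/cgroup
LIMIT=1073741824   # 1 GiB, same as the original demo

no_child_demo() {
    echo "+memory" > "$CG/cgroup.subtree_control"
    mkdir -p "$CG/test"                    # no test-child this time
    echo "$LIMIT" > "$CG/test/memory.max"
    echo "$LIMIT" > "$CG/test/memory.min"

    # Move this shell directly into 'test', then generate ~2 GB of page cache.
    echo $$ > "$CG/test/cgroup.procs"
    dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
    cat "$CG/test/memory.events"
}

if [ "${RUN_CGROUP_DEMO:-0}" = "1" ] && [ "$(id -u)" -eq 0 ]; then
    no_child_demo
else
    echo "dry run: set RUN_CGROUP_DEMO=1 and run as root to execute"
fi
```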

Hmm... I'm a bit confused about that.

Thanks,
Lance

>
> >     $ cat /etc/os-release
> >     PRETTY_NAME="Ubuntu 24.04 LTS"
> >     NAME="Ubuntu"
> >     VERSION_ID="24.04"
> >     VERSION="24.04 LTS (Noble Numbat)"
> >     VERSION_CODENAME=noble
> >     ID=ubuntu
> >     ID_LIKE=debian
> >     HOME_URL="https://www.ubuntu.com/"
> >     SUPPORT_URL="https://help.ubuntu.com/"
> >     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> >     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
> >     UBUNTU_CODENAME=noble
> >     LOGO=ubuntu-logo
> >
> >     ```
> >
> > 2. Navigate to the cgroup v2 filesystem, create a test cgroup, and set memory settings:
> >
> >     ```
> >     $ cd /sys/fs/cgroup/
> >     $ stat -fc %T /sys/fs/cgroup
> >     cgroup2fs
> >     $ mkdir test
> >     $ echo "+memory" > cgroup.subtree_control
> >     $ mkdir test/test-child
> >     $ echo "+memory" > test/cgroup.subtree_control
> >     $ cd test
> >     $ echo 1073741824 > memory.max
> >     $ echo 1073741824 > memory.min
> >     $ cat memory.max
> >     1073741824
> >     $ cat memory.min
> >     1073741824
> >     $ cat memory.low
> >     0
> >     $ cat memory.high
> >     max
> >     ```
> >
> > 3. Set up and check memory settings in the child cgroup:
> >
> >     ```
> >     $ cd test-child
> >     $ echo 1073741824 > memory.max
> >     $ echo 1073741824 > memory.min
> >     $ cat memory.max
> >     1073741824
> >     $ cat memory.min
> >     1073741824
> >     $ cat memory.low
> >     0
> >     $ cat memory.high
> >     max
> >     ```
> >
> > 4. Add process to the child cgroup and verify:
> >
> >     ```
> >     $ echo $$ > cgroup.procs
> >     $ cat cgroup.procs
> >     1131
> >     1320
> >     $ ps -ef|grep 1131
> >     root        1131    1014  0 10:45 pts/0    00:00:00 -bash
> >     root        1321    1131 99 11:06 pts/0    00:00:00 ps -ef
> >     root        1322    1131  0 11:06 pts/0    00:00:00 grep --color=auto 1131
> >     ```
> >
> > 5. Attempt to create a large file using dd and observe the process being killed:
> >
> >     ```
> >     $ dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
> >     Killed
> >     ```
> >
> > 6. Check kernel messages related to the OOM event:
> >
> >     ```
> >     $ dmesg
> >     ...
> >     [ 1341.112388] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test,task_memcg=/test/test-child,task=dd,pid=1324,uid=0
> >     [ 1341.112418] Memory cgroup out of memory: Killed process 1324 (dd) total-vm:15548kB, anon-rss:10240kB, file-rss:1764kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
> >     ```
> >
> > 7. Reduce the `memory.min` setting in the child cgroup and attempt the same large file creation, and then this issue is resolved.
> >
> >     ```
> >     # echo 107374182 > memory.min
> >     # dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
> >     200+0 records in
> >     200+0 records out
> >     2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.8713 s, 1.1 GB/s
> >     ```
> >
> > Thanks,
> > Lance
> >
>



* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01 11:40   ` Lance Yang
@ 2024-08-01 22:58     ` Michal Koutný
  2024-08-02  1:56       ` Lance Yang
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Koutný @ 2024-08-01 22:58 UTC (permalink / raw)
  To: Lance Yang
  Cc: Vlastimil Babka (SUSE), akpm, 21cnbao, ryan.roberts, david,
	shy828301, ziy, libang.li, baolin.wang, linux-kernel, linux-mm,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Cgroups


Hello.

On Thu, Aug 01, 2024 at 07:40:10PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
> However, if the child cgroup doesn't exist and we add a process directly to the
> 'test' cgroup, then attempt to create a large file (2GB) using dd, we don't
> encounter an OOM error; everything works as expected.

That's due to the way effective protections are calculated, see [1].
If the reclaim target is cgroup T, then it won't enjoy the protection configured
on itself, whereas a child of T is subject to ancestral reclaim, hence
the protection applies.

That would mean that in your 1st demo, it is test/memory.max that
triggers reclaim, and then the failure to reclaim from test/test-child causes
the OOM in test.
That's interesting, since the (same) limit of test-child/memory.max
should be evaluated first. I guess in your example there are
actually two parallel processes (1321 and 1324), so some charges may
randomly propagate up to the test/memory.max limit.

As explained above, the 2nd demo has the same reclaim target, but with no
nesting the protection is moot.
I believe you could reproduce with merely

	test/memory.max
	test-child/memory.min
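
As a sketch (assuming root and a writable cgroup v2 mount at /sys/fs/cgroup;
gated behind an environment variable so nothing runs by accident), that minimal
configuration would be:

```
#!/usr/bin/env bash
# Minimal reproducer: only a limit on the parent (test/memory.max) and a
# protection on the child (test-child/memory.min) are configured.
set -euo pipefail

CG=/sys/fs/cgroup

minimal_repro() {
    echo "+memory" > "$CG/cgroup.subtree_control"
    mkdir -p "$CG/test/test-child"
    echo "+memory" > "$CG/test/cgroup.subtree_control"

    echo 1073741824 > "$CG/test/memory.max"             # limit on the parent
    echo 1073741824 > "$CG/test/test-child/memory.min"  # protection on the child

    # Run dd inside the child; reclaim triggered by the parent's limit
    # cannot touch the protected child, so an OOM kill is expected.
    echo $$ > "$CG/test/test-child/cgroup.procs"
    dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200 || true
    cat "$CG/test/test-child/memory.events"
}

if [ "${RUN_CGROUP_DEMO:-0}" = "1" ] && [ "$(id -u)" -eq 0 ]; then
    minimal_repro
else
    echo "dry run: set RUN_CGROUP_DEMO=1 and run as root to execute"
fi
```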

> Hmm... I'm a bit confused about that.

I agree, the calculation of effective protection wrt reclaim target can
be confusing.

The effects you see are documented for memory.min:

> Putting more memory than generally available under this
> protection is discouraged and may lead to constant OOMs.

HTH,
Michal

[1] https://lore.kernel.org/all/20200729140537.13345-2-mkoutny@suse.com/



* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01 22:58     ` Michal Koutný
@ 2024-08-02  1:56       ` Lance Yang
  0 siblings, 0 replies; 5+ messages in thread
From: Lance Yang @ 2024-08-02  1:56 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Vlastimil Babka (SUSE), akpm, 21cnbao, ryan.roberts, david,
	shy828301, ziy, libang.li, baolin.wang, linux-kernel, linux-mm,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Cgroups

Hi Michal,

Thanks a lot for clarifying!

On Fri, Aug 2, 2024 at 6:58 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello.
>
> On Thu, Aug 01, 2024 at 07:40:10PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
> > However, if the child cgroup doesn't exist and we add a process directly to the
> > 'test' cgroup, then attempt to create a large file (2GB) using dd, we don't
> > encounter an OOM error; everything works as expected.
>
> That's due to the way effective protections are calculated, see [1].
> If the reclaim target is cgroup T, then it won't enjoy the protection configured
> on itself, whereas a child of T is subject to ancestral reclaim, hence
> the protection applies.

Makes sense to me.

>
> That would mean that in your 1st demo, it is test/memory.max that
> triggers reclaim, and then the failure to reclaim from test/test-child causes
> the OOM in test.
> That's interesting, since the (same) limit of test-child/memory.max
> should be evaluated first. I guess in your example there are
> actually two parallel processes (1321 and 1324), so some charges may
> randomly propagate up to the test/memory.max limit.
>
> As explained above, the 2nd demo has the same reclaim target, but with no
> nesting the protection is moot.

Ah, that clears it up. I appreciate the detailed explanation - thanks!

> I believe you could reproduce with merely
>
>         test/memory.max
>         test-child/memory.min

Yep, I just tested it, and you're right ;)

>
> > Hmm... I'm a bit confused about that.
>
> I agree, the calculation of effective protection wrt reclaim target can
> be confusing.
>
> The effects you see are documented for memory.min:
>
> > Putting more memory than generally available under this
> > protection is discouraged and may lead to constant OOMs.

Thanks a lot again for your time!
Lance

>
> HTH,
> Michal
>
> [1] https://lore.kernel.org/all/20200729140537.13345-2-mkoutny@suse.com/


