* [BUG] mm/cgroupv2: memory.min may lead to an OOM error
@ 2024-08-01 4:54 Lance Yang
2024-08-01 10:35 ` Vlastimil Babka (SUSE)
0 siblings, 1 reply; 5+ messages in thread
From: Lance Yang @ 2024-08-01 4:54 UTC (permalink / raw)
To: akpm
Cc: 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Lance Yang
Hi all,
It's possible to encounter an OOM error if both parent and child cgroups are
configured such that memory.min and memory.max are set to the same values, as
is common practice in Kubernetes.
Hmm... I'm not sure whether this behavior is a bug or an expected aspect of
the kernel design.
To reproduce the bug, we can follow these command-based steps:
1. Check Kernel Version and OS release:
```
$ uname -r
6.10.0-rc5+
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
```
2. Navigate to the cgroup v2 filesystem, create a test cgroup, and set memory settings:
```
$ cd /sys/fs/cgroup/
$ stat -fc %T /sys/fs/cgroup
cgroup2fs
$ mkdir test
$ echo "+memory" > cgroup.subtree_control
$ mkdir test/test-child
$ echo 1073741824 > memory.max
$ echo 1073741824 > memory.min
$ cat memory.max
1073741824
$ cat memory.min
1073741824
$ cat memory.low
0
$ cat memory.high
max
```
3. Set up and check memory settings in the child cgroup:
```
$ cd test-child
$ echo 1073741824 > memory.max
$ echo 1073741824 > memory.min
$ cat memory.max
1073741824
$ cat memory.min
1073741824
$ cat memory.low
0
$ cat memory.high
max
```
4. Add process to the child cgroup and verify:
```
$ echo $$ > cgroup.procs
$ cat cgroup.procs
1131
1320
$ ps -ef|grep 1131
root 1131 1014 0 10:45 pts/0 00:00:00 -bash
root 1321 1131 99 11:06 pts/0 00:00:00 ps -ef
root 1322 1131 0 11:06 pts/0 00:00:00 grep --color=auto 1131
```
5. Attempt to create a large file using dd and observe the process being killed:
```
$ dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
Killed
```
6. Check kernel messages related to the OOM event:
```
$ dmesg
...
[ 1341.112388] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test,task_memcg=/test/test-child,task=dd,pid=1324,uid=0
[ 1341.112418] Memory cgroup out of memory: Killed process 1324 (dd) total-vm:15548kB, anon-rss:10240kB, file-rss:1764kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
```
7. Reduce the `memory.min` setting in the child cgroup and attempt the same large file creation; this time it succeeds.
```
# echo 107374182 > memory.min
# dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
200+0 records in
200+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.8713 s, 1.1 GB/s
```
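For convenience, steps 2–4 above can be collected into one setup script. This is a sketch, not a verified reproducer: it reads step 2 as applying the limits to the newly created `test` cgroup, and it defaults `CGROOT` to a scratch directory so the sequence of writes can be dry-run without root — point `CGROOT` at `/sys/fs/cgroup` to exercise it against a real cgroup v2 mount.

```shell
#!/bin/sh
# Sketch of the reproduction setup. CGROOT defaults to a scratch directory
# for a harmless dry run; set CGROOT=/sys/fs/cgroup to run it for real.
set -eu
CGROOT="${CGROOT:-$(mktemp -d)}"
LIMIT=1073741824   # 1 GiB, as in the steps above

mkdir -p "$CGROOT/test/test-child"
echo "+memory" > "$CGROOT/cgroup.subtree_control"

# Parent and child get identical memory.max and memory.min (the OOM case).
for cg in "$CGROOT/test" "$CGROOT/test/test-child"; do
    echo "$LIMIT" > "$cg/memory.max"
    echo "$LIMIT" > "$cg/memory.min"
done

# On a real cgroup v2 mount, one would then move a shell into the child and
# trigger the OOM, e.g.:
#   echo $$ > "$CGROOT/test/test-child/cgroup.procs"
#   dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
echo "configured under $CGROOT"
```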
Thanks,
Lance
^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01  4:54 [BUG] mm/cgroupv2: memory.min may lead to an OOM error Lance Yang
@ 2024-08-01 10:35 ` Vlastimil Babka (SUSE)
  2024-08-01 11:40   ` Lance Yang
  0 siblings, 1 reply; 5+ messages in thread
From: Vlastimil Babka (SUSE) @ 2024-08-01 10:35 UTC (permalink / raw)
To: Lance Yang, akpm
Cc: 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
    baolin.wang, linux-kernel, linux-mm, Johannes Weiner, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song, Cgroups

On 8/1/24 06:54, Lance Yang wrote:
> Hi all,
>
> It's possible to encounter an OOM error if both parent and child cgroups are
> configured such that memory.min and memory.max are set to the same values, as
> is common practice in Kubernetes.

Is it a practice in Kubernetes since forever or a recent one? Did it work
differently before?

> Hmm... I'm not sure whether this behavior is a bug or an expected aspect of
> the kernel design.

Hmm I'm not a memcg expert, so I cc'd some.

> To reproduce the bug, we can follow these command-based steps:
>
> 1. Check Kernel Version and OS release:
>
> ```
> $ uname -r
> 6.10.0-rc5+

Were older kernels behaving the same?

Anyway, the memory.min documentation says "Hard memory protection. If the
memory usage of a cgroup is within its effective min boundary, the cgroup's
memory won't be reclaimed under any conditions. If there is no unprotected
reclaimable memory available, OOM killer is invoked."

So to my non-expert opinion this behavior seems valid. If you set min to the
same value as max and then reach the max, you effectively don't allow any
reclaim, so the memcg OOM kill is the only option AFAICS?

[...]
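Vlastimil's reasoning can be put in toy-model form: reclaim may only take memory not covered by the effective min protection, so with min == max there is nothing reclaimable the moment the limit is hit. A sketch of that arithmetic only (this is an editorial illustration, not the kernel's actual reclaim code):

```python
GIB = 1 << 30

def reclaimable(usage: int, effective_min: int) -> int:
    """Memory that reclaim is allowed to take: whatever min does not cover."""
    return max(0, usage - effective_min)

# min == max: at the limit, nothing is reclaimable -> memcg OOM kill.
assert reclaimable(usage=1 * GIB, effective_min=1 * GIB) == 0

# min < max (step 7's value): headroom exists, reclaim can proceed.
assert reclaimable(usage=1 * GIB, effective_min=107374182) > 0
```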
* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01 10:35 ` Vlastimil Babka (SUSE)
@ 2024-08-01 11:40   ` Lance Yang
  2024-08-01 22:58     ` Michal Koutný
  0 siblings, 1 reply; 5+ messages in thread
From: Lance Yang @ 2024-08-01 11:40 UTC (permalink / raw)
To: Vlastimil Babka (SUSE)
Cc: akpm, 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
    baolin.wang, linux-kernel, linux-mm, Johannes Weiner, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song, Cgroups

Hi Vlastimil,

Thanks a lot for paying attention!

On Thu, Aug 1, 2024 at 6:35 PM Vlastimil Babka (SUSE) <vbabka@kernel.org> wrote:
>
> On 8/1/24 06:54, Lance Yang wrote:
> > It's possible to encounter an OOM error if both parent and child cgroups are
> > configured such that memory.min and memory.max are set to the same values, as
> > is common practice in Kubernetes.
>
> Is it a practice in Kubernetes since forever or a recent one? Did it work
> differently before?

memory.min is only applied when the Kubernetes memory QoS feature gate is
enabled, which is disabled by default.

> > Hmm... I'm not sure whether this behavior is a bug or an expected aspect of
> > the kernel design.
>
> Hmm I'm not a memcg expert, so I cc'd some.
>
> > ```
> > $ uname -r
> > 6.10.0-rc5+
>
> Were older kernels behaving the same?

I tested another machine and it behaved the same way.

# uname -r
5.14.0-427.24.1.el9_4.x86_64
# cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
...

> Anyway, the memory.min documentation says "Hard memory protection. If the
> memory usage of a cgroup is within its effective min boundary, the cgroup's
> memory won't be reclaimed under any conditions. If there is no unprotected
> reclaimable memory available, OOM killer is invoked."
>
> So to my non-expert opinion this behavior seems valid. If you set min to the
> same value as max and then reach the max, you effectively don't allow any
> reclaim, so the memcg OOM kill is the only option AFAICS?

I completely agree that this behavior seems valid ;)

However, if the child cgroup doesn't exist and we add a process to the 'test'
cgroup, then attempt to create a large file (2GB) using dd, we won't encounter
an OOM error; everything works as expected.

Hmm... I'm a bit confused about that.

Thanks,
Lance

[...]
* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01 11:40   ` Lance Yang
@ 2024-08-01 22:58     ` Michal Koutný
  2024-08-02  1:56       ` Lance Yang
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Koutný @ 2024-08-01 22:58 UTC (permalink / raw)
To: Lance Yang
Cc: Vlastimil Babka (SUSE), akpm, 21cnbao, ryan.roberts, david,
    shy828301, ziy, libang.li, baolin.wang, linux-kernel, linux-mm,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Cgroups

Hello.

On Thu, Aug 01, 2024 at 07:40:10PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
> However, if the child cgroup doesn't exist and we add a process to the 'test'
> cgroup, then attempt to create a large file (2GB) using dd, we won't encounter
> an OOM error; everything works as expected.

That's due to the way effective protections are calculated, see [1].
If the reclaim target is cgroup T, then T won't enjoy protection configured
on itself, whereas a child of T is subject to ancestral reclaim, hence the
protection applies.

That would mean that in your 1st demo, it is test/memory.max that triggers
reclaim, and then failure to reclaim from test/test-child causes OOM in test.
That's interesting, since the (same) limit of test-child/memory.max should be
evaluated first. I guess in your example there are actually two parallel
processes (1321 and 1324), so some charges may randomly propagate to the
upper test/memory.max limit.

As explained above, the 2nd demo has the same reclaim target, but due to no
nesting, protection is moot.

I believe you could reproduce with merely

  test/memory.max
  test-child/memory.min

> Hmm... I'm a bit confused about that.

I agree, the calculation of effective protection wrt the reclaim target can
be confusing.

The effects you see are documented for memory.min:

> Putting more memory than generally available under this
> protection is discouraged and may lead to constant OOMs.

HTH,
Michal

[1] https://lore.kernel.org/all/20200729140537.13345-2-mkoutny@suse.com/
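The distribution Michal refers to in [1] can be sketched as a simplified model of how a child's memory.min claim is honored during ancestral reclaim: a claim is capped by the child's usage, and when siblings together claim more than the parent's effective protection, each share is scaled down proportionally. This is an editorial simplification of `effective_protection()` in mm/memcontrol.c, which handles further cases (e.g. recursive protection):

```python
def effective_min(setting: int, usage: int,
                  parent_effective: int, siblings_protected: int) -> int:
    """Simplified model: the min protection a child effectively receives."""
    protected = min(setting, usage)   # cannot protect more than is used
    if siblings_protected > parent_effective:
        # Over-committed: each child's share is scaled down proportionally.
        return protected * parent_effective // siblings_protected
    return protected

GIB = 1 << 30

# The reported case: the child's min == usage == the parent's effective min,
# so the child's memory is fully protected, reclaim fails, and OOM follows.
assert effective_min(setting=GIB, usage=GIB,
                     parent_effective=GIB, siblings_protected=GIB) == GIB

# Step 7's fix: a small child min leaves most of its memory reclaimable.
assert effective_min(setting=107374182, usage=GIB,
                     parent_effective=GIB,
                     siblings_protected=107374182) == 107374182
```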
* Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
  2024-08-01 22:58     ` Michal Koutný
@ 2024-08-02  1:56       ` Lance Yang
  0 siblings, 0 replies; 5+ messages in thread
From: Lance Yang @ 2024-08-02 1:56 UTC (permalink / raw)
To: Michal Koutný
Cc: Vlastimil Babka (SUSE), akpm, 21cnbao, ryan.roberts, david,
    shy828301, ziy, libang.li, baolin.wang, linux-kernel, linux-mm,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Cgroups

Hi Michal,

Thanks a lot for clarifying!

On Fri, Aug 2, 2024 at 6:58 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> That's due to the way effective protections are calculated, see [1].
> If the reclaim target is cgroup T, then T won't enjoy protection configured
> on itself, whereas a child of T is subject to ancestral reclaim, hence the
> protection applies.

Makes sense to me.

> That would mean that in your 1st demo, it is test/memory.max that triggers
> reclaim, and then failure to reclaim from test/test-child causes OOM in test.
> That's interesting, since the (same) limit of test-child/memory.max should be
> evaluated first. I guess in your example there are actually two parallel
> processes (1321 and 1324), so some charges may randomly propagate to the
> upper test/memory.max limit.
>
> As explained above, the 2nd demo has the same reclaim target, but due to no
> nesting, protection is moot.

Ah, that clears it up. I appreciate the detailed explanation - thanks!

> I believe you could reproduce with merely
>
>   test/memory.max
>   test-child/memory.min

Yep, I just tested it, and you're right ;)

> I agree, the calculation of effective protection wrt the reclaim target can
> be confusing.
>
> The effects you see are documented for memory.min:
>
> > Putting more memory than generally available under this
> > protection is discouraged and may lead to constant OOMs.

Thanks a lot again for your time!

Lance

[1] https://lore.kernel.org/all/20200729140537.13345-2-mkoutny@suse.com/