* OOM kill of privileged processes when exhausting a single NUMA node
From: Felix Abecassis @ 2025-06-26 22:27 UTC
To: linux-mm@kvack.org; +Cc: Zi Yan, John Hubbard
Hello linux-mm team,
I have found an interesting behavior in the Linux kernel: an unprivileged user
with access to user namespaces can cause privileged processes to be killed due
to an OOM situation on a single NUMA node, even if the system has plenty of
memory available on other NUMA nodes.
This might lead to a local denial of service in some situations, so please
review and let me know if the current behavior is expected.
The steps are simple:
1. Use a Linux system with multiple NUMA nodes
2. Enable unprivileged user namespaces (often distro dependent)
3. As an unprivileged user, create a user namespace + mount namespace
and mount a tmpfs bound to NUMA node 1
4. Attempt to fill the tmpfs with more data than it can possibly store
5. The OOM killer will kill a significant number of system daemons
(UID 0).
The possible mitigations I currently know of are: create a swap space, disable
unprivileged user namespaces, or set sysctl vm.oom_kill_allocating_task=1.
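As a rough sketch, the current state of these knobs can be inspected without privileges by reading /proc/sys directly. The helper name show_setting is hypothetical, and user.max_user_namespaces is an assumption added here as the upstream way to restrict user namespaces (the report itself only names the Ubuntu AppArmor sysctl).

```shell
#!/bin/sh
# Report the current value of a sysctl by reading /proc/sys directly,
# so neither root nor the sysctl(8) binary is needed.
show_setting() {
    path="/proc/sys/$(printf '%s' "$1" | tr . /)"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$path")"
    else
        printf '%s: not present\n' "$1"
    fi
}

show_setting vm.oom_kill_allocating_task
show_setting kernel.apparmor_restrict_unprivileged_userns   # Ubuntu-specific
show_setting user.max_user_namespaces                       # assumption, see above

# Configured swap areas, if any:
#   cat /proc/swaps
```

A value of 1 for vm.oom_kill_allocating_task corresponds to the mitigation listed above; "not present" simply means the running kernel does not expose that key.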
To be 100% clear, this does not require elevated privileges, and we are only
using a fraction of the total system memory.
Below is an example on an Ubuntu 25.04 VM under qemu where I hotplugged a new
NUMA node with 1GB of memory. I also placed the current process under a 2GB
memory cgroup to show that it is not an effective mitigation.
$ uname -a
Linux ubuntu 6.14.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Wed May 21 15:01:51 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
$ id -u
1000
# Enable unprivileged user namespaces (this is an Ubuntu feature)
$ sudo sysctl kernel.apparmor_restrict_unprivileged_userns=0
$ sudo sh -c 'echo 2G > /sys/fs/cgroup/user.slice/user-1000.slice/memory.max'
$ numastat -mzc
Per-node system memory usage (in MBs):
Token Unaccepted not in hash table.
Token Unaccepted not in hash table.
               Node 0 Node 1  Total
               ------ ------  -----
MemTotal         7940   1024   8964
MemFree          7533   1024   8557
MemUsed           407      0    407
Active            176      0    176
Inactive           44      0     44
Active(anon)       42      0     42
Active(file)      134      0    134
Inactive(file)     44      0     44
Unevictable        26      0     26
Mlocked            26      0     26
Dirty               0      0      0
FilePages         186      0    186
Mapped             57      0     57
AnonPages          59      0     59
Shmem               1      0      1
KernelStack         2      0      2
PageTables          2      0      2
Slab               84      0     84
SReclaimable       17      0     17
SUnreclaim         68      0     68
KReclaimable       17      0     17
$ unshare -U -r -m sh -xc 'mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm ; dd if=/dev/zero of=/dev/shm/file bs=64K count=25000'
+ mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm
+ dd if=/dev/zero of=/dev/shm/file bs=64K count=25000
[ 294.046130] Out of memory: Killed process 1074 (systemd) total-vm:21968kB, anon-rss:2048kB, file-rss:10164kB, shmem-rss:0kB, UID:1000 pgtables:88kB oom_score_adj:100
[ 294.052224] Out of memory: Killed process 1076 ((sd-pam)) total-vm:21992kB, anon-rss:1772kB, file-rss:1832kB, shmem-rss:0kB, UID:1000 pgtables:76kB oom_score_adj:100
[ 294.058446] Out of memory: Killed process 821 (unattended-upgr) total-vm:121388kB, anon-rss:13272kB, file-rss:16004kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:0
[ 294.064551] Out of memory: Killed process 423 (systemd-resolve) total-vm:23200kB, anon-rss:2560kB, file-rss:11504kB, shmem-rss:0kB, UID:990 pgtables:88kB oom_score_adj:0
[ 294.070491] Out of memory: Killed process 789 (udisksd) total-vm:470572kB, anon-rss:1920kB, file-rss:11840kB, shmem-rss:0kB, UID:0 pgtables:136kB oom_score_adj:0
[ 294.076371] Out of memory: Killed process 848 (ModemManager) total-vm:391392kB, anon-rss:1792kB, file-rss:10516kB, shmem-rss:0kB, UID:0 pgtables:124kB oom_score_adj:0
[ 294.082350] Out of memory: Killed process 733 (systemd-network) total-vm:20804kB, anon-rss:1296kB, file-rss:10068kB, shmem-rss:0kB, UID:998 pgtables:76kB oom_score_adj:0
[ 294.088273] Out of memory: Killed process 1141 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[ 294.094350] Out of memory: Killed process 788 (systemd-logind) total-vm:18896kB, anon-rss:896kB, file-rss:7968kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
[ 294.100461] Out of memory: Killed process 1151 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:7732kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
[ 294.106462] Out of memory: Killed process 1154 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8036kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
[ 294.112592] Out of memory: Killed process 1155 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
[ 294.118725] Out of memory: Killed process 1161 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:998 pgtables:84kB oom_score_adj:0
[ 294.124827] Out of memory: Killed process 1165 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8484kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
[ 294.131138] Out of memory: Killed process 1169 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8604kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[ 294.137548] Out of memory: Killed process 1177 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8592kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
[ 294.144659] Out of memory: Killed process 1187 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8800kB, shmem-rss:0kB, UID:998 pgtables:80kB oom_score_adj:0
[ 294.151118] Out of memory: Killed process 1179 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:7972kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
[ 294.157569] Out of memory: Killed process 1194 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8596kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[ 294.163877] Out of memory: Killed process 417 (systemd-timesyn) total-vm:91608kB, anon-rss:896kB, file-rss:7132kB, shmem-rss:0kB, UID:996 pgtables:88kB oom_score_adj:0
[ 294.170240] Out of memory: Killed process 783 (polkitd) total-vm:306832kB, anon-rss:640kB, file-rss:7264kB, shmem-rss:0kB, UID:988 pgtables:96kB oom_score_adj:0
[ 294.176668] Out of memory: Killed process 1200 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7776kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
[ 294.183107] Out of memory: Killed process 1205 (9) total-vm:20136kB, anon-rss:1152kB, file-rss:6584kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[ 294.189627] Out of memory: Killed process 1210 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7844kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[ 294.196227] Out of memory: Killed process 1209 ((d-logind)) total-vm:20140kB, anon-rss:1280kB, file-rss:7284kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[ 294.202956] Out of memory: Killed process 1212 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8568kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
[ 294.209719] Out of memory: Killed process 1223 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[ 294.216356] Out of memory: Killed process 851 (rsyslogd) total-vm:220676kB, anon-rss:1280kB, file-rss:4292kB, shmem-rss:0kB, UID:101 pgtables:80kB oom_score_adj:0
[ 294.223146] Out of memory: Killed process 1220 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:8044kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
[ 294.229888] Out of memory: Killed process 1234 ((systemd)) total-vm:21992kB, anon-rss:1664kB, file-rss:8852kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:100
[ 294.236624] Out of memory: Killed process 952 (login) total-vm:11220kB, anon-rss:768kB, file-rss:4616kB, shmem-rss:0kB, UID:0 pgtables:64kB oom_score_adj:0
[ 294.243266] Out of memory: Killed process 940 (cron) total-vm:7512kB, anon-rss:256kB, file-rss:2760kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
[ 294.249871] Out of memory: Killed process 956 (agetty) total-vm:8516kB, anon-rss:128kB, file-rss:2492kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
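To see at a glance how many of the victims ran as root, the kill messages can be tallied per UID. This is a minimal sketch that assumes the "Out of memory: Killed process ... UID:<n>" line format shown in the log above; the function name count_oom_kills_by_uid is hypothetical.

```shell
#!/bin/sh
# Tally OOM-killer victims per UID from kernel log lines on stdin.
# Lines that are not kill messages are ignored.
count_oom_kills_by_uid() {
    grep 'Out of memory: Killed process' |
        sed -n 's/.* UID:\([0-9][0-9]*\).*/\1/p' |
        sort -n | uniq -c |
        awk '{ printf "UID %s: %s\n", $2, $1 }'
}

# Typical use (reading the kernel log may itself require privileges):
#   dmesg | count_oom_kills_by_uid
```

Each output line has the form "UID <uid>: <number of kills>", sorted by UID.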
* Re: OOM kill of privileged processes when exhausting a single NUMA node
From: Pedro Falcato @ 2025-06-26 23:21 UTC
To: Felix Abecassis
Cc: linux-mm@kvack.org, Zi Yan, John Hubbard, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
On Thu, Jun 26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
> Hello linux-mm team,
>
> I have found an interesting behavior in the Linux kernel: an unprivileged user
> with access to user namespaces can cause privileged processes to be killed due
> to an OOM situation on a single NUMA node, even if the system has plenty of
> memory available on other NUMA nodes.
>
> This might lead to a local denial of service in some situations, so please
> review and let me know if the current behavior is expected.
>
> The steps are simple:
> 1. Use a Linux system with multiple NUMA nodes
> 2. Enable unprivileged user namespaces (often distro dependent)
> 3. As an unprivileged user, create a user namespace + mount namespace
> and mount a tmpfs bound to NUMA node 1
> 4. Attempt to fill the tmpfs with more data than it can possibly store
> 5. The OOM killer will kill a significant amount of system daemons
> (UID 0).
>
I agree that this is somewhat unintended tmpfs behavior, but you can
(probably) pull this off in other ways:
- use set_mempolicy()/mbind to bind to a NUMA node and use a big mmap() mapping
- just use a lot of memory
and it's not limited to NUMA either.
AFAIK user namespaces aren't really isolating in the sense that you need a
cgroup on top to further control software you don't trust (or want to limit
for other reasons)
And in this case the particular problem is that tmpfs really can't track
what process "owns" a file, even if O_TMPFILE was specified. So you can quite
trivially run out of memory in a regular Linux distro by filling up the /tmp
(if tmpfs, of course), if you have write perms for /tmp, which by default you do.
The only alarming bit (to me) is that cgroups don't work in this case either.
The most ad hoc solution I have would be to possibly limit the tmpfs size to
memory.max. Adding the memcg folks for more comments.
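To make the idea concrete, here is a rough userspace sketch of looking up the memory.max value that a tmpfs size cap could be derived from. It assumes a unified cgroup v2 hierarchy mounted at /sys/fs/cgroup; the helper names and the /mnt/scratch mount point are hypothetical, and only the limit set directly on the process's own cgroup is read (limits on parent cgroups are not considered).

```shell
#!/bin/sh
# Extract the cgroup v2 path from /proc/<pid>/cgroup-style input on stdin.
# On a unified hierarchy the relevant line has the form "0::<path>".
cgroup_v2_path() {
    sed -n 's/^0:://p' | head -n 1
}

# Print memory.max for the current process's own cgroup.
# "max" means no limit is set at this level; the root cgroup has no such file.
current_memory_max() {
    path=$(cgroup_v2_path < /proc/self/cgroup)
    cat "/sys/fs/cgroup${path}/memory.max"
}

# Hypothetical use as a tmpfs size cap (size= is an existing tmpfs option;
# deriving it from memory.max is only the idea floated above):
#   limit=$(current_memory_max)
#   [ "$limit" != max ] && mount -t tmpfs -o "size=${limit}" tmpfs /mnt/scratch
```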
--
Pedro
* Re: OOM kill of privileged processes when exhausting a single NUMA node
From: Zi Yan @ 2025-06-26 23:27 UTC
To: Pedro Falcato
Cc: Felix Abecassis, linux-mm, John Hubbard, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
On 26 Jun 2025, at 19:21, Pedro Falcato wrote:
> On Thu, Jun 26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
>> Hello linux-mm team,
>>
>> I have found an interesting behavior in the Linux kernel: an unprivileged user
>> with access to user namespaces can cause privileged processes to be killed due
>> to an OOM situation on a single NUMA node, even if the system has plenty of
>> memory available on other NUMA nodes.
>>
>> This might lead to a local denial of service in some situations, so please
>> review and let me know if the current behavior is expected.
>>
>> The steps are simple:
>> 1. Use a Linux system with multiple NUMA nodes
>> 2. Enable unprivileged user namespaces (often distro dependent)
>> 3. As an unprivileged user, create a user namespace + mount namespace
>> and mount a tmpfs bound to NUMA node 1
>> 4. Attempt to fill the tmpfs with more data than it can possibly store
>> 5. The OOM killer will kill a significant amount of system daemons
>> (UID 0).
>>
>
> I somewhat agree that this is somewhat unintended tmpfs behavior, but you can
> (probably) pull this off in other ways:
>
> - use set_mempolicy()/mbind to bind to a NUMA node and use a big mmap() mapping
> - just use a lot of memory
In those cases the OOM killer will kill the app that is using a lot of memory,
but with tmpfs, as you mention below, it is not able to find the right victim
process to kill.
>
> and it's not limited to NUMA either.
>
> AFAIK user namespaces aren't really isolating in the sense that you need a
> cgroup on top to further control software you don't trust (or want to limit
> for other reasons)
>
>
> And in this case the particular problem is that tmpfs really can't track
> what process "owns" a file, even if O_TMPFILE was specified. So you can quite
> trivially run out of memory in a regular Linux distro by filling up the /tmp
> (if tmpfs, of course), if you have write perms for /tmp, which by default you do.
>
>
> The only alarming bit (to me) is that cgroups don't work in this case as well.
> The most adhoc solution I have would be to possibly limit the tmpfs size to
> memory.max. Adding the memcg folks for more comments.
>
> --
> Pedro
>
Best Regards,
Yan, Zi
* Re: OOM kill of privileged processes when exhausting a single NUMA node
From: Felix Abecassis @ 2025-06-27 3:15 UTC
To: Pedro Falcato
Cc: linux-mm@kvack.org, Zi Yan, John Hubbard, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
On Fri, Jun 27, 2025 at 12:21:57AM +0100, Pedro Falcato wrote:
> On Thu, Jun 26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
> > Hello linux-mm team,
> >
> > I have found an interesting behavior in the Linux kernel: an unprivileged user
> > with access to user namespaces can cause privileged processes to be killed due
> > to an OOM situation on a single NUMA node, even if the system has plenty of
> > memory available on other NUMA nodes.
> >
> > This might lead to a local denial of service in some situations, so please
> > review and let me know if the current behavior is expected.
> >
> > The steps are simple:
> > 1. Use a Linux system with multiple NUMA nodes
> > 2. Enable unprivileged user namespaces (often distro dependent)
> > 3. As an unprivileged user, create a user namespace + mount namespace
> > and mount a tmpfs bound to NUMA node 1
> > 4. Attempt to fill the tmpfs with more data than it can possibly store
> > 5. The OOM killer will kill a significant amount of system daemons
> > (UID 0).
> >
>
> I somewhat agree that this is somewhat unintended tmpfs behavior, but you can
> (probably) pull this off in other ways:
>
> - use set_mempolicy()/mbind to bind to a NUMA node and use a big mmap() mapping
> - just use a lot of memory
>
> and it's not limited to NUMA either.
>
> AFAIK user namespaces aren't really isolating in the sense that you need a
> cgroup on top to further control software you don't trust (or want to limit
> for other reasons)
>
Yes, but inside a user namespace you are able to mount a tmpfs bound to a
single NUMA node and that is the key to triggering the bug I described. You
would not have the same problem when trying to fill /tmp (assuming a tmpfs),
because it's not bound to a single NUMA node and thus you would be limited by
the memory cgroup (in addition, the default is size=50% for a tmpfs).
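One way to check whether a given tmpfs carries a NUMA policy is to look for an mpol= entry in its mount options. This is a minimal sketch that parses a /proc/mounts-style table; the function name list_tmpfs_mpol is hypothetical. Note that /proc/mounts only reflects the reader's own mount namespace, so a mount created inside another namespace has to be inspected through /proc/<pid>/mounts of a process living in it.

```shell
#!/bin/sh
# Print "<mount point> <policy>" for every tmpfs entry on stdin
# (in /proc/mounts format) whose options include mpol=.
list_tmpfs_mpol() {
    awk '$3 == "tmpfs" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^mpol=/)
                print $2, substr(opts[i], 6)   # strip the "mpol=" prefix
    }'
}

# Typical use:
#   list_tmpfs_mpol < /proc/mounts
#   list_tmpfs_mpol < /proc/<pid>/mounts
```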
>
> And in this case the particular problem is that tmpfs really can't track
> what process "owns" a file, even if O_TMPFILE was specified. So you can quite
> trivially run out of memory in a regular Linux distro by filling up the /tmp
> (if tmpfs, of course), if you have write perms for /tmp, which by default you do.
>
>
> The only alarming bit (to me) is that cgroups don't work in this case as well.
> The most adhoc solution I have would be to possibly limit the tmpfs size to
> memory.max. Adding the memcg folks for more comments.
>
The cgroup memory limit cannot help here because we are far from the limit. The
process is limited to 2GB and attempts to write ~1.5GB to the tmpfs. After
writing 1GB, it triggers the OOM kill of privileged processes, despite the
fact that the system still has ~7.5GB of free memory on NUMA node 0.
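The arithmetic behind these numbers can be checked with a few lines of shell. The figures come from the reproduction in the original report (dd with bs=64K count=25000, a 1024 MB node 1, memory.max=2G); treating the numastat "MB" values as MiB is an assumption.

```shell
#!/bin/sh
bs_bytes=$((64 * 1024))        # dd bs=64K
count=25000                    # dd count=25000
write_mib=$((bs_bytes * count / 1024 / 1024))   # rounded down to whole MiB
node1_mib=1024                 # MemTotal of node 1 in the numastat output
cgroup_mib=$((2 * 1024))       # memory.max = 2G

echo "dd writes up to ${write_mib} MiB"
if [ "$write_mib" -gt "$node1_mib" ]; then
    echo "exceeds node 1 by $((write_mib - node1_mib)) MiB"
fi
if [ "$write_mib" -lt "$cgroup_mib" ]; then
    echo "stays $((cgroup_mib - write_mib)) MiB under the cgroup limit"
fi
```

The write is larger than node 1 but smaller than the cgroup limit, which is why the node is exhausted before memory.max is ever reached.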
> --
> Pedro
>
> > The possible mitigations I currently know of are: create a swap space, disable
> > unprivileged user namespaces, or set sysctl vm.oom_kill_allocating_task=1.
> >
> > To be 100% clear, this does not require elevated privileges, and we are only
> > using a fraction of the total system memory.
> >
> > Below is an example on a Ubuntu 25.04 VM under qemu where I hotplugged a new
> > NUMA node with 1GB of memory, I also place the current process under a 2GB
> > memory cgroup to show that it's not an effective mitigation.
> >
> > $ uname -a
> > Linux ubuntu 6.14.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Wed May 21 15:01:51 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
> >
> > $ id -u
> > 1000
> >
> > # Enable unprivileged user namespaces (this is an Ubuntu feature)
> > $ sudo sysctl kernel.apparmor_restrict_unprivileged_userns=0
> >
> > $ sudo sh -c 'echo 2G > /sys/fs/cgroup/user.slice/user-1000.slice/memory.max'
> >
> > $ numastat -mzc
> >
> > Per-node system memory usage (in MBs):
> > Token Unaccepted not in hash table.
> > Token Unaccepted not in hash table.
> > Node 0 Node 1 Total
> > ------ ------ -----
> > MemTotal 7940 1024 8964
> > MemFree 7533 1024 8557
> > MemUsed 407 0 407
> > Active 176 0 176
> > Inactive 44 0 44
> > Active(anon) 42 0 42
> > Active(file) 134 0 134
> > Inactive(file) 44 0 44
> > Unevictable 26 0 26
> > Mlocked 26 0 26
> > Dirty 0 0 0
> > FilePages 186 0 186
> > Mapped 57 0 57
> > AnonPages 59 0 59
> > Shmem 1 0 1
> > KernelStack 2 0 2
> > PageTables 2 0 2
> > Slab 84 0 84
> > SReclaimable 17 0 17
> > SUnreclaim 68 0 68
> > KReclaimable 17 0 17
> >
> > $ unshare -U -r -m sh -xc 'mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm ; dd if=/dev/zero of=/dev/shm/file bs=64K count=25000'
> > + mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm
> > + dd if=/dev/zero of=/dev/shm/file bs=64K count=25000
> > [ 294.046130] Out of memory: Killed process 1074 (systemd) total-vm:21968kB, anon-rss:2048kB, file-rss:10164kB, shmem-rss:0kB, UID:1000 pgtables:88kB oom_score_adj:100
> > [ 294.052224] Out of memory: Killed process 1076 ((sd-pam)) total-vm:21992kB, anon-rss:1772kB, file-rss:1832kB, shmem-rss:0kB, UID:1000 pgtables:76kB oom_score_adj:100
> > [ 294.058446] Out of memory: Killed process 821 (unattended-upgr) total-vm:121388kB, anon-rss:13272kB, file-rss:16004kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:0
> > [ 294.064551] Out of memory: Killed process 423 (systemd-resolve) total-vm:23200kB, anon-rss:2560kB, file-rss:11504kB, shmem-rss:0kB, UID:990 pgtables:88kB oom_score_adj:0
> > [ 294.070491] Out of memory: Killed process 789 (udisksd) total-vm:470572kB, anon-rss:1920kB, file-rss:11840kB, shmem-rss:0kB, UID:0 pgtables:136kB oom_score_adj:0
> > [ 294.076371] Out of memory: Killed process 848 (ModemManager) total-vm:391392kB, anon-rss:1792kB, file-rss:10516kB, shmem-rss:0kB, UID:0 pgtables:124kB oom_score_adj:0
> > [ 294.082350] Out of memory: Killed process 733 (systemd-network) total-vm:20804kB, anon-rss:1296kB, file-rss:10068kB, shmem-rss:0kB, UID:998 pgtables:76kB oom_score_adj:0
> > [ 294.088273] Out of memory: Killed process 1141 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> > [ 294.094350] Out of memory: Killed process 788 (systemd-logind) total-vm:18896kB, anon-rss:896kB, file-rss:7968kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> > [ 294.100461] Out of memory: Killed process 1151 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:7732kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> > [ 294.106462] Out of memory: Killed process 1154 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8036kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> > [ 294.112592] Out of memory: Killed process 1155 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> > [ 294.118725] Out of memory: Killed process 1161 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:998 pgtables:84kB oom_score_adj:0
> > [ 294.124827] Out of memory: Killed process 1165 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8484kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
> > [ 294.131138] Out of memory: Killed process 1169 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8604kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> > [ 294.137548] Out of memory: Killed process 1177 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8592kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> > [ 294.144659] Out of memory: Killed process 1187 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8800kB, shmem-rss:0kB, UID:998 pgtables:80kB oom_score_adj:0
> > [ 294.151118] Out of memory: Killed process 1179 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:7972kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
> > [ 294.157569] Out of memory: Killed process 1194 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8596kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> > [ 294.163877] Out of memory: Killed process 417 (systemd-timesyn) total-vm:91608kB, anon-rss:896kB, file-rss:7132kB, shmem-rss:0kB, UID:996 pgtables:88kB oom_score_adj:0
> > [ 294.170240] Out of memory: Killed process 783 (polkitd) total-vm:306832kB, anon-rss:640kB, file-rss:7264kB, shmem-rss:0kB, UID:988 pgtables:96kB oom_score_adj:0
> > [ 294.176668] Out of memory: Killed process 1200 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7776kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> > [ 294.183107] Out of memory: Killed process 1205 (9) total-vm:20136kB, anon-rss:1152kB, file-rss:6584kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> > [ 294.189627] Out of memory: Killed process 1210 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7844kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> > [ 294.196227] Out of memory: Killed process 1209 ((d-logind)) total-vm:20140kB, anon-rss:1280kB, file-rss:7284kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> > [ 294.202956] Out of memory: Killed process 1212 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8568kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> > [ 294.209719] Out of memory: Killed process 1223 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> > [ 294.216356] Out of memory: Killed process 851 (rsyslogd) total-vm:220676kB, anon-rss:1280kB, file-rss:4292kB, shmem-rss:0kB, UID:101 pgtables:80kB oom_score_adj:0
> > [ 294.223146] Out of memory: Killed process 1220 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:8044kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
> > [ 294.229888] Out of memory: Killed process 1234 ((systemd)) total-vm:21992kB, anon-rss:1664kB, file-rss:8852kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:100
> > [ 294.236624] Out of memory: Killed process 952 (login) total-vm:11220kB, anon-rss:768kB, file-rss:4616kB, shmem-rss:0kB, UID:0 pgtables:64kB oom_score_adj:0
> > [ 294.243266] Out of memory: Killed process 940 (cron) total-vm:7512kB, anon-rss:256kB, file-rss:2760kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
> > [ 294.249871] Out of memory: Killed process 956 (agetty) total-vm:8516kB, anon-rss:128kB, file-rss:2492kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
* Re: OOM kill of privileged processes when exhausting a single NUMA node
2025-06-26 23:21 ` Pedro Falcato
2025-06-26 23:27 ` Zi Yan
2025-06-27 3:15 ` Felix Abecassis
@ 2025-06-27 8:17 ` Michal Hocko
2 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2025-06-27 8:17 UTC (permalink / raw)
To: Pedro Falcato
Cc: Felix Abecassis, linux-mm@kvack.org, Zi Yan, John Hubbard,
Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song
On Fri 27-06-25 00:21:57, Pedro Falcato wrote:
> On Thu, Jun 26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
> > Hello linux-mm team,
> >
> > I have found an interesting behavior in the Linux kernel: an unprivileged user
> > with access to user namespaces can cause privileged processes to be killed due
> > to an OOM situation on a single NUMA node, even if the system has plenty of
> > memory available on other NUMA nodes.
> >
> > This might lead to a local denial of service in some situations, so please
> > review and let me know if the current behavior is expected.
> >
> > The steps are simple:
> > 1. Use a Linux system with multiple NUMA nodes
> > 2. Enable unprivileged user namespaces (often distro dependent)
> > 3. As an unprivileged user, create a user namespace + mount namespace
> > and mount a tmpfs bound to NUMA node 1
> > 4. Attempt to fill the tmpfs with more data than it can possibly store
> > 5. The OOM killer will kill a significant amount of system daemons
> > (UID 0).
This is really something that our OOM handling is not able to deal with
because we cannot simply remove persistent (even if boot time scoped)
data. Even if we managed to kill a task that has consumed an excessive
amount of tmpfs data, the data would be left behind with the current
implementation. Changing the behavior would require defining disposable
tmpfs mounts and making userspace aware of them. Otherwise we would be
causing active data corruption bugs.
> I somewhat agree that this is somewhat unintended tmpfs behavior, but you can
> (probably) pull this off in other ways:
Well, it is a filesystem and as such we do not allow data corruption.
In the same way, we do not simply remove data on ENOSPC. This
filesystem just happens to be backed by memory rather than real
storage.
> - use set_mempolicy()/mbind to bind to a NUMA node and use a big mmap() mapping
> - just use a lot of memory
>
> and it's not limited to NUMA either.
Right, there are ways to deplete memory, and therefore it is generally
recommended to contain untrusted users with memory cgroups and make sure
the untrusted user cannot exhaust any specific resource. NUMA topology
makes that more complicated because it adds to the resource constraints,
as the reported example shows (hard limit larger than a single NUMA node
while the tmpfs is configured to consume the full NUMA node).
My experience with unprivileged user namespaces is limited but I would
say that you need some policy built on top if you want to allow
arbitrary tmpfs mounts.
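
As a starting point for such a policy, the knobs named in this thread can be
inspected before changing anything. This is only a sketch: show_knob is a
hypothetical helper, its optional second argument exists purely so the lookup
root can be overridden, and user.max_user_namespaces is an additional upstream
sysctl not mentioned in the report. Knobs missing on a given kernel are
reported rather than treated as errors.

```sh
# Print the current value of a sysctl-style knob, or note that it is absent.
# $1: knob name in dotted form, $2: optional root directory (default /proc/sys).
show_knob() {
    root="${2:-/proc/sys}"
    # Dotted sysctl names map to paths under the root: vm.foo -> vm/foo
    path="$root/$(printf '%s' "$1" | tr . /)"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$path")"
    else
        printf '%s: not available on this kernel\n' "$1"
    fi
}

show_knob vm.oom_kill_allocating_task
show_knob user.max_user_namespaces
show_knob kernel.apparmor_restrict_unprivileged_userns
```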
--
Michal Hocko
SUSE Labs