Date: Fri, 27 Jun 2025 10:17:47 +0200
From: Michal Hocko
To: Pedro Falcato
Cc: Felix Abecassis, "linux-mm@kvack.org", Zi Yan, John Hubbard, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song
Subject: Re: OOM kill of privileged processes when exhausting a single NUMA node

On Fri 27-06-25 00:21:57, Pedro Falcato wrote:
> On Thu, Jun 26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
> > Hello linux-mm team,
> >
> > I have found an interesting behavior in the Linux kernel: an unprivileged user
> > with access to user namespaces can cause privileged processes to be killed due
> > to an OOM situation on a single NUMA node, even if the system has plenty of
> > memory available on other NUMA nodes.
> >
> > This might lead to a local denial of service in some situations, so please
> > review and let me know if the current behavior is expected.
> >
> > The steps are simple:
> > 1. Use a Linux system with multiple NUMA nodes
> > 2. Enable unprivileged user namespaces (often distro dependent)
> > 3. As an unprivileged user, create a user namespace + mount namespace
> >    and mount a tmpfs bound to NUMA node 1
> > 4. Attempt to fill the tmpfs with more data than it can possibly store
> > 5. The OOM killer will kill a significant amount of system daemons
> >    (UID 0).

This is really something that our OOM handling cannot deal with, because we cannot simply remove persistent (even if boot-time scoped) data. Even if we managed to kill the task that has consumed an excessive amount of tmpfs data, the data would be left behind with the current implementation. Changing the behavior would require defining disposable tmpfs mounts and making any userspace aware of that fact.
Otherwise we would be causing active data corruption bugs.

> I somewhat agree that this is somewhat unintended tmpfs behavior, but you can
> (probably) pull this off in other ways:

Well, it is a filesystem, and as such we do not allow data corruption, in the same way that we do not simply remove data on ENOSPC. This filesystem just happens to be backed by memory rather than by real storage.

> - use set_mempolicy()/mbind to bind to a NUMA node and use a big mmap() mapping
> - just use a lot of memory
>
> and it's not limited to NUMA either.

Right, there are ways to deplete memory, and therefore it is generally recommended to contain untrusted users with memory cgroups and make sure an untrusted user cannot exhaust any specific resource. NUMA topology makes that more complicated because it adds to the resource constraints, as pointed out in the report (a memcg hard limit larger than a single NUMA node while the tmpfs is configured to consume the full NUMA node).

My experience with unprivileged user namespaces is limited, but I would say that you need some policy built on top if you want to allow arbitrary tmpfs mounts.

-- 
Michal Hocko
SUSE Labs