Date: Thu, 16 Apr 2026 08:54:53 +0200
From: Michal Hocko
To: Minchan Kim
Cc: akpm@linux-foundation.org, david@kernel.org, brauner@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, surenb@google.com,
 timmurray@google.com
Subject: Re: [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support
References: <20260413223948.556351-1-minchan@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed 15-04-26 16:26:34, Minchan Kim wrote:
> On Wed, Apr 15, 2026 at 09:38:05AM +0200, Michal Hocko wrote:
> > On Tue 14-04-26 13:00:16, Minchan Kim wrote:
> > > On Tue, Apr 14, 2026 at 08:57:57AM +0200, Michal Hocko wrote:
> > > > On Mon 13-04-26 15:39:45, Minchan Kim wrote:
> > > > > This patch series introduces optimizations to expedite memory reclamation
> > > > > in process_mrelease() and provides a secure, race-free "auto-kill"
> > > > > mechanism for efficient container shutdown and OOM handling.
> > > > >
> > > > > Currently, process_mrelease() unmaps pages but leaves clean file folios
> > > > > on the LRU list, relying on standard memory reclaim to eventually free
> > > > > them.
> > > > > Furthermore, requiring userspace to send a SIGKILL prior to
> > > > > invoking process_mrelease() introduces scheduling race conditions where
> > > > > the victim task may enter the exit path prematurely, bypassing expedited
> > > > > reclamation hooks.
> > > > >
> > > > > This series addresses these limitations in three logical steps.
> > > > >
> > > > > Patch #1: mm: process_mrelease: expedite clean file folio reclaim via mmu_gather
> > > > > Integrates clean file folio eviction directly into the low-level TLB
> > > > > batching (mmu_gather) infrastructure. Symmetrically truncates clean file
> > > > > folios alongside anonymous pages during the unmap loop.
> > > >
> > > > Why do we need to care about clean page cache? Is this a form of
> > > > drop_caches?
> > >
> > > The goal is to ensure the memory is actually freed by the time
> > > process_mrelease returns. Currently, process_mrelease unmaps pages, but
> > > page caches remain on the LRU, leaving them to be reclaimed later
> > > by kswapd or direct reclaim.
> >
> > Correct. This was the initial design decision because there is not much
> > you can assume about page cache pages which are very often shared. Even
> > if they are not mapped by all users.
>
> Fair point. However, that's the trade-off:
>
> Leaving unmapped caches to be reclaimed asynchronously keeps system memory
> pressure high for too long. In Android, this delay forces the LMKD to
> unnecessarily kill additional innocent background apps before the memory
> from the original victim is recovered.

OK, this is really not clear to me. How come you end up triggering LMKD
(or any OOM handling) when there is a considerable amount of clean page
cache?

[...]
> > > The race occurs when the victim process starts its own exit path (after
> > > SIGKILL) before the caller can invoke process_mrelease. If the victim
> > > reaches the exit path first, the caller might lose the window to apply
> > > these expedited reclamation optimizations.
> >
> > Isn't this the problem you are trying to solve then? You are special
> > casing process_mrelease while you really want to expedite the process
> > memory clean up.
> >
> > The same situation happens with the global OOM and your approach doesn't
> > really close the race anyway. You send SIGKILL first and the victim can
> > hit the exit path right after that before you start processing the rest.
> > That is not fundamentally different from doing that in two syscalls,
> > race window is just smaller.
>
> No, this approach completely close the race.
>
> When it invokes do_send_sig_info(SIGKILL) with the KILL_MRELEASE code,
> the kernel sets the MMF_UNSTABLE flag on the victim's mm_struct in the signal
> delivery path (kernel/signal.c) *before* the task begins processing the signal.

OK, I have missed this part. I haven't really looked into specific
patches at this stage. I am still trying to understand the motivation
and your reasoning. So effectively you want to get SIGOOMKILL more or
less.

> When the victim gets scheduled and wakes up to process the fatal signal,
> the MMF_UNSTABLE flag is already set.
>
> This guarantees that the victim's own exit path (do_exit -> exit_mmap) will
> utilize the expedited reclamation optimizations automatically, regardless of
> whether the reaper or the victim gets scheduled first.
>
> For the OOM, we can use the same idea.
> >
> > All that being said, I do not think those special hacks for
> > process_mrelease is the right approach. I very much agree that the
> > address space tear down for a dying process could be improved and we
> > should be focusing on that part.
>
> I think process_mrelease is crucial here because relying on the exit path is
> non-deterministic.

I suspect you are missing my point. I am arguing that those special
hacks in the address space release path shouldn't be process_mrelease
specific. I do recognize the value of a synchronous tear down. I am
also in favor of something like SIGOOMKILL.
process_mrelease might even be the right syscall for that purpose.
-- 
Michal Hocko
SUSE Labs