Date: Wed, 15 Apr 2026 09:38:05 +0200
From: Michal Hocko
To: Minchan Kim
Cc: akpm@linux-foundation.org, david@kernel.org, brauner@kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, surenb@google.com,
    timmurray@google.com
Subject: Re: [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support
References: <20260413223948.556351-1-minchan@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 14-04-26 13:00:16, Minchan Kim wrote:
> On Tue, Apr 14, 2026 at 08:57:57AM +0200, Michal Hocko wrote:
> > On Mon 13-04-26 15:39:45, Minchan Kim wrote:
> > > This patch series introduces optimizations to expedite memory reclamation
> > > in process_mrelease() and provides a secure, race-free "auto-kill"
> > > mechanism for efficient container shutdown and OOM handling.
> > >
> > > Currently, process_mrelease() unmaps pages but leaves clean file folios
> > > on the LRU list, relying on standard memory reclaim to eventually free
> > > them. Furthermore, requiring userspace to send a SIGKILL prior to
> > > invoking process_mrelease() introduces scheduling race conditions where
> > > the victim task may enter the exit path prematurely, bypassing expedited
> > > reclamation hooks.
> > >
> > > This series addresses these limitations in three logical steps.
> > >
> > > Patch #1: mm: process_mrelease: expedite clean file folio reclaim via mmu_gather
> > > Integrates clean file folio eviction directly into the low-level TLB
> > > batching (mmu_gather) infrastructure. Symmetrically truncates clean file
> > > folios alongside anonymous pages during the unmap loop.
> >
> > Why do we need to care about clean page cache? Is this a form of
> > drop_caches?
>
> The goal is to ensure the memory is actually freed by the time
> process_mrelease returns. Currently, process_mrelease unmaps pages, but
> page caches remain on the LRU, leaving them to be reclaimed later
> by kswapd or direct reclaim.

Correct. This was the initial design decision because there is not much
you can assume about page cache pages, which are very often shared, even
if they are not mapped by all users.

> This delay defeats the purpose of
> "expedited" release. It’s not a global drop_caches, but rather a
> targeted eviction for the victim process to make its memory immediately
> available for other urgent allocations.

Clean page cache reclaim should be quite effective. Why doesn't kswapd
keep up in that regard? Or is this more of a per-memcg problem where
there is no background reclaim and you are hitting direct reclaim to
clean up those pages?

> > > Patch #2: mm: process_mrelease: skip LRU movement for exclusive file folios
> > > Skips costly LRU marking (folio_mark_accessed) for exclusive file-backed
> > > folios undergoing process_mrelease reclaim. Perf profiling reveals that
> > > LRU movement accounts for ~55% of overhead during unmap.
> >
> > OK, but why is this not desirable behavior for mrelease?
>
> In Android, lmkd kills background apps under memory pressure and then calls
> process_mrelease. If the memory release is slow due to LRU overhead (~55% as noted),
> it cannot keep up with the allocation speed of the foreground app.
> This delay often leads to "over-killing" - killing more background apps
> than necessary because the system hasn't yet "seen" the memory freed
> from the first kill.

OK, I see. More on that below.

> > > Patch #3: mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
> > > Adds an auto-kill flag supporting atomic teardown. Utilizes a dedicated
> > > signal code (KILL_MRELEASE) to guarantee MMF_UNSTABLE is marked in the
> > > signal delivery path, preventing scheduling races.
> >
> > Could you explain why those races are a real problem?
>
> The race occurs when the victim process starts its own exit path (after
> SIGKILL) before the caller can invoke process_mrelease. If the victim
> reaches the exit path first, the caller might lose the window to apply
> these expedited reclamation optimizations.

Isn't this the problem you are trying to solve then? You are
special-casing process_mrelease while you really want to expedite the
process memory cleanup. The same situation happens with the global OOM
and your approach doesn't really close the race anyway. You send SIGKILL
first and the victim can hit the exit path right after that, before you
start processing the rest. That is not fundamentally different from
doing that in two syscalls; the race window is just smaller.

All that being said, I do not think those special hacks for
process_mrelease are the right approach. I very much agree that the
address space teardown for a dying process could be improved and we
should be focusing on that part.
-- 
Michal Hocko
SUSE Labs