From: Minchan Kim
To: akpm@linux-foundation.org
Cc: hca@linux.ibm.com, linux-s390@vger.kernel.org, david@kernel.org, mhocko@suse.com, brauner@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, surenb@google.com, timmurray@google.com, Minchan Kim, Minchan Kim
Subject: [PATCH v1 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Tue, 21 Apr 2026 16:02:39 -0700
Message-ID: <20260421230239.172582-4-minchan@kernel.org>
In-Reply-To: <20260421230239.172582-1-minchan@kernel.org>
References: <20260421230239.172582-1-minchan@kernel.org>

Currently, process_mrelease() requires userspace to send SIGKILL to the target before the call. Because the two steps are separate, there is a scheduling race window: the victim may receive the signal and enter the exit path before the reaper gets to invoke process_mrelease().

When the victim enters the exit path (do_exit -> exit_mm), it clears its task->mm immediately. process_mrelease() then fails with -ESRCH, and the actual address space teardown (exit_mmap) is deferred until the mm's user count (mm_users) drops to zero.
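For illustration, the kill-then-reap sequence described above might look roughly like this from a userspace low-memory killer. This is a minimal sketch, not code from this series: kill_then_reap() is a hypothetical helper, the calls go through raw syscall() because libc wrappers may be missing, and the fallback syscall numbers are an assumption that holds for architectures using the common syscall table (alpha differs).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback syscall numbers (common table); assumption for old headers. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/*
 * Hypothetical helper: the two-step sequence required before this patch.
 * Returns 0 on success or a negative errno value.
 */
static int kill_then_reap(pid_t pid)
{
	assert(pid > 0);

	int pidfd = syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -errno;

	int ret = 0;
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0) {
		ret = -errno;
		goto out;
	}

	/*
	 * Race window: the victim may already be in do_exit() -> exit_mm()
	 * and have cleared task->mm, so this call can fail with -ESRCH.
	 */
	if (syscall(SYS_process_mrelease, pidfd, 0) < 0)
		ret = -errno;
out:
	close(pidfd);
	return ret;
}
```

Which outcome a caller sees (0 or -ESRCH) depends on how quickly the victim is scheduled after the signal, which is exactly the non-determinism this patch targets.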
In Android, other holders of mm references (e.g., async I/O, reading /proc//cmdline, or various other remote VM accesses) frequently delay this teardown indefinitely, defeating the purpose of expedited reclamation. Memory pressure stays high in the meantime, forcing the system to kill additional, innocent background apps before the first victim's memory is recovered.

This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag, which enables an integrated auto-kill mode. When the flag is specified, process_mrelease() itself sends SIGKILL to the target process.

To close the race deterministically, we take an mm reference via mmget() and set the MMF_UNSTABLE flag *before* sending the SIGKILL. mmgrab() only pins the mm_struct itself (mm_count), whereas mmget() pins the address space (mm_users); keeping mm_users > 0 prevents the victim from calling exit_mmap() in its own exit path. The memory is therefore reclaimed synchronously and deterministically by the reaper in the context of process_mrelease(), avoiding delays caused by non-deterministic scheduling of the victim task.
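With the new flag, the kill and the reap collapse into a single call. Below is a minimal sketch under stated assumptions: reap_kill_pid() is a hypothetical helper, PROCESS_MRELEASE_REAP_KILL is defined locally with the (1 << 0) value proposed in this patch in case the installed uapi headers lack it, and falling back to the two-step sequence on -EINVAL is one possible policy, not something the patch prescribes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback syscall numbers (common table); assumption for old headers. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/* Value proposed by this patch; released uapi headers may not define it. */
#ifndef PROCESS_MRELEASE_REAP_KILL
#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
#endif

/*
 * Hypothetical helper: kill and reap @pid in one process_mrelease() call,
 * falling back to the two-step sequence on -EINVAL. Returns 0 or -errno.
 */
static int reap_kill_pid(pid_t pid)
{
	assert(pid > 0);

	int pidfd = syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -errno;

	int ret = 0;
	if (syscall(SYS_process_mrelease, pidfd, PROCESS_MRELEASE_REAP_KILL) == 0)
		goto out;

	ret = -errno;
	if (ret != -EINVAL)
		goto out;

	/*
	 * -EINVAL: either the kernel predates the flag (it rejects any
	 * nonzero flags) or the target is not eligible for reaping.
	 */
	ret = 0;
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0 ||
	    syscall(SYS_process_mrelease, pidfd, 0) < 0)
		ret = -errno;
out:
	close(pidfd);
	return ret;
}
```

With this patch applied, -EINVAL is ambiguous: it is returned both by kernels that reject unknown flags and when task_will_free_mem() decides the target cannot be reaped (for example, a core dump in progress or an mm shared with a process that is not exiting), so a caller may want to treat the fallback result with care.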
Signed-off-by: Minchan Kim
---
 include/uapi/linux/mman.h |  4 +++
 mm/oom_kill.c             | 56 +++++++++++++++++++++++++++------------
 2 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index e89d00528f2f..4266976b45ad 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -56,4 +56,8 @@ struct cachestat {
 	__u64 nr_recently_evicted;
 };
 
+/* Flags for process_mrelease */
+#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
+#define PROCESS_MRELEASE_VALID_FLAGS (PROCESS_MRELEASE_REAP_KILL)
+
 #endif /* _UAPI_LINUX_MMAN_H */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5c6c95c169ee..730ba0d19b53 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -20,6 +20,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -850,7 +851,7 @@ bool oom_killer_disable(signed long timeout)
 	return true;
 }
 
-static inline bool __task_will_free_mem(struct task_struct *task)
+static inline bool __task_will_free_mem(struct task_struct *task, bool ignore_exit)
 {
 	struct signal_struct *sig = task->signal;
 
@@ -862,6 +863,9 @@ static inline bool __task_will_free_mem(struct task_struct *task)
 	if (sig->core_state)
 		return false;
 
+	if (ignore_exit)
+		return true;
+
 	if (sig->flags & SIGNAL_GROUP_EXIT)
 		return true;
 
@@ -878,7 +882,7 @@ static inline bool __task_will_free_mem(struct task_struct *task)
  * Caller has to make sure that task->mm is stable (hold task_lock or
  * it operates on the current).
  */
-static bool task_will_free_mem(struct task_struct *task)
+static bool task_will_free_mem(struct task_struct *task, bool ignore_exit)
 {
 	struct mm_struct *mm = task->mm;
 	struct task_struct *p;
@@ -892,7 +896,7 @@ static bool task_will_free_mem(struct task_struct *task)
 	if (!mm)
 		return false;
 
-	if (!__task_will_free_mem(task))
+	if (!__task_will_free_mem(task, ignore_exit))
 		return false;
 
 	/*
@@ -916,7 +920,7 @@ static bool task_will_free_mem(struct task_struct *task)
 			continue;
 		if (same_thread_group(task, p))
 			continue;
-		ret = __task_will_free_mem(p);
+		ret = __task_will_free_mem(p, false);
 		if (!ret)
 			break;
 	}
@@ -1034,7 +1038,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	 * so it can die quickly
 	 */
 	task_lock(victim);
-	if (task_will_free_mem(victim)) {
+	if (task_will_free_mem(victim, false)) {
 		mark_oom_victim(victim);
 		queue_oom_reaper(victim);
 		task_unlock(victim);
@@ -1135,7 +1139,7 @@ bool out_of_memory(struct oom_control *oc)
 	 * select it. The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
 	 */
-	if (task_will_free_mem(current)) {
+	if (task_will_free_mem(current, false)) {
 		mark_oom_victim(current);
 		queue_oom_reaper(current);
 		return true;
@@ -1217,8 +1221,9 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	unsigned int f_flags;
 	bool reap = false;
 	long ret = 0;
+	bool reap_kill;
 
-	if (flags)
+	if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
 		return -EINVAL;
 
 	task = pidfd_get_task(pidfd, &f_flags);
@@ -1236,19 +1241,33 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	}
 
 	mm = p->mm;
-	mmgrab(mm);
 
-	if (task_will_free_mem(p))
-		reap = true;
-	else {
-		/* Error only if the work has not been done already */
-		if (!mm_flags_test(MMF_OOM_SKIP, mm))
+	reap_kill = !!(flags & PROCESS_MRELEASE_REAP_KILL);
+	reap = task_will_free_mem(p, reap_kill);
+	if (!reap) {
+		if (reap_kill || !mm_flags_test(MMF_OOM_SKIP, mm))
 			ret = -EINVAL;
+
+		task_unlock(p);
+		goto put_task;
 	}
-	task_unlock(p);
 
-	if (!reap)
-		goto drop_mm;
+	if (reap_kill) {
+		/*
+		 * We use mmget() instead of mmgrab() to keep mm_users > 0,
+		 * preventing the victim from calling exit_mmap() in its
+		 * own exit path. This ensures that the memory is reclaimed
+		 * synchronously and deterministically by the reaper.
+		 */
+		mmget(mm);
+		task_unlock(p);
+		ret = kill_pid(task_tgid(task), SIGKILL, 0);
+		if (ret)
+			goto drop_mm;
+	} else {
+		mmgrab(mm);
+		task_unlock(p);
+	}
 
 	if (mmap_read_lock_killable(mm)) {
 		ret = -EINTR;
@@ -1263,7 +1282,10 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	mmap_read_unlock(mm);
 
 drop_mm:
-	mmdrop(mm);
+	if (reap_kill)
+		mmput(mm);
+	else
+		mmdrop(mm);
 put_task:
 	put_task_struct(task);
 	return ret;
-- 
2.54.0.rc1.555.g9c883467ad-goog