Date: Thu, 23 Apr 2026 16:43:48 -0700
From: Minchan Kim
To: Michal Hocko
Cc: Christian Brauner, akpm@linux-foundation.org, david@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, surenb@google.com,
	timmurray@google.com
Subject: Re: [RFC 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
References: <20260413223948.556351-1-minchan@kernel.org>
	<20260413223948.556351-4-minchan@kernel.org>
	<20260416-planktont-abwinken-b9499483b939@brauner>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 23, 2026 at 09:17:39AM +0200, Michal Hocko wrote:
> On Mon 20-04-26 14:47:04, Minchan Kim wrote:
> > On Fri, Apr 17, 2026 at 09:04:31AM +0200, Michal Hocko wrote:
> > > On Thu 16-04-26 23:30:09, Minchan Kim wrote:
> > > > If I send the SIGKILL first to satisfy the process_mrelease() requirement,
> > > > we immediately run into the scheduling race condition where the victim can
> > > > enter the exit path before the reaper can set the flag.
> > >
> > > Why don't you just grab the mm before you send the signal and then continue
> > > with reaping? You just want to avoid a race where the victim manages to
> > > process fatal signal, start its exit path and mrelease path losing that
> > > race so you rely on the exit path, right?
> >
> > The problem is that process_mrelease() operates on a task obtained from a pidfd.
> >
> > Once the victim process receives the SIGKILL and enters the exit path (exit_mm),
> > the kernel sets task->mm to NULL.
> >
> > Even if we could somehow hold a reference to the mm_struct beforehand,
> > process_mrelease() would still fail because mm_struct via task returns NULL
> > after exit_mm() has been called.
> >
> > Therefore, we cannot simply "grab the mm" before sending the signal and expect
> > process_mrelease() to work after the victim starts exiting.
>
> I do not follow. Why cannot you simply do this

I misunderstood your point. Do you mean this?

https://lore.kernel.org/linux-mm/20260421230239.172582-4-minchan@kernel.org/

There are more details to figure out.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5c6c95c169ee..b80a96f5460a 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1241,9 +1241,14 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  	if (task_will_free_mem(p))
>  		reap = true;
>  	else {
> +		if (flags & PROCESS_MRELEASE_REAP_KILL) {
> +		} else {
> +			/* send SIGKILL */
> +			reap = true;
>  		/* Error only if the work has not been done already */
> -		if (!mm_flags_test(MMF_OOM_SKIP, mm))
> -			ret = -EINVAL;
> +			if (!mm_flags_test(MMF_OOM_SKIP, mm))
> +				ret = -EINVAL;
> +		}
>  	}
>  	task_unlock(p);
>
>
> --
> Michal Hocko
> SUSE Labs
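
For readers following the ordering argument, here is a minimal userspace toy model of the two sequences discussed above: looking the mm up through the task after the victim has already run its exit path, versus pinning the mm first and only then letting the victim exit. It is a sketch under loud assumptions and not kernel code: `toy_mm`, `toy_task` and every `toy_*` function are hypothetical names invented for illustration, a single counter stands in for the kernel's separate mm_users/mm_count reference counts, and the "signal" is modelled as the victim immediately clearing its mm pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's mm_struct/task_struct. */
struct toy_mm {
	int refcount;	/* one counter standing in for the kernel's mm refcounts */
	bool reaped;	/* set once the "reaper" has processed this mm */
};

struct toy_task {
	struct toy_mm *mm;	/* cleared by the exit path, like task->mm */
};

/* Pin the mm so the pointer stays usable after the task drops it. */
static struct toy_mm *toy_grab_mm(struct toy_task *t)
{
	struct toy_mm *mm = t->mm;

	if (mm)
		mm->refcount++;
	return mm;
}

/* Victim's exit path: detach the mm from the task and drop the task's ref. */
static void toy_exit_mm(struct toy_task *t)
{
	struct toy_mm *mm = t->mm;

	t->mm = NULL;
	if (mm)
		mm->refcount--;
}

/* Order A: the victim exits first, then the reaper looks up task->mm. */
static int toy_kill_then_lookup(struct toy_task *t)
{
	struct toy_mm *mm;

	toy_exit_mm(t);		/* victim wins the race */
	mm = toy_grab_mm(t);	/* task->mm is already NULL here */
	if (!mm)
		return -1;	/* generic failure; nothing left to look up */
	mm->reaped = true;
	mm->refcount--;
	return 0;
}

/* Order B: the reaper pins the mm first, then the victim exits. */
static int toy_grab_then_kill(struct toy_task *t)
{
	struct toy_mm *mm = toy_grab_mm(t);

	if (!mm)
		return -1;
	toy_exit_mm(t);		/* task->mm becomes NULL, pinned pointer survives */
	mm->reaped = true;	/* reap through the pinned pointer */
	mm->refcount--;		/* drop the reaper's pin */
	return 0;
}
```

In this model, order A fails because the only route to the mm is through the task pointer, while order B succeeds because the reaper never goes back through the task after the pin is taken. Whether the real process_mrelease() path can be arranged this way is exactly the open question in the thread.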