Date: Wed, 14 Jan 2026 18:06:56 +0100
From: Michal Hocko
To: Mathieu Desnoyers
Cc: Andrew Morton, linux-kernel@vger.kernel.org, "Paul E. McKenney",
    Steven Rostedt, Masami Hiramatsu, Dennis Zhou, Tejun Heo,
    Christoph Lameter, Martin Liu, David Rientjes,
    christian.koenig@amd.com, Shakeel Butt, SeongJae Park,
    Johannes Weiner, Sweet Tea Dorminy, Lorenzo Stoakes,
    "Liam R . Howlett", Mike Rapoport, Suren Baghdasaryan,
    Vlastimil Babka, Christian Brauner, Wei Yang, David Hildenbrand,
    Miaohe Lin, Al Viro, linux-mm@kvack.org,
    linux-trace-kernel@vger.kernel.org, Yu Zhao, Roman Gushchin,
    Mateusz Guzik, Matthew Wilcox, Baolin Wang, Aboorva Devarajan
Subject: Re: [PATCH v16 3/3] mm: Reduce latency of OOM killer task selection with 2-pass algorithm
References: <20260114145915.49926-1-mathieu.desnoyers@efficios.com>
    <20260114145915.49926-4-mathieu.desnoyers@efficios.com>
In-Reply-To: <20260114145915.49926-4-mathieu.desnoyers@efficios.com>

On Wed 14-01-26 09:59:15, Mathieu Desnoyers wrote:
> Use the hierarchical tree counter approximation (hpcc) to implement the
> OOM killer task selection with a 2-pass algorithm. The first pass
> selects the process that has the highest badness points approximation,
> and the second pass compares each process using the current max badness
> points approximation.
>
> The second pass uses an approximate comparison to eliminate all processes
> which are below the current max badness points approximation accuracy
> range.
>
> Summing the per-CPU counters to calculate the precise badness of tasks
> is only required for tasks with an approximate badness within the
> accuracy range of the current max points value.
>
> Limit to 16 the maximum number of badness sums allowed for an OOM killer
> task selection before falling back to the approximated comparison. This
> ensures bounded execution time for scenarios where many tasks have
> badness within the accuracy of the maximum badness approximation.
>
> Testing the execution time of select_bad_process() with a single
> tail -f /dev/zero:
>
> AMD EPYC 9654 96-Core (2 sockets)
> Within a KVM, configured with 256 logical cpus.
>
>                                   | precise sum | hpcc     |
> ----------------------------------|-------------|----------|
> nr_processes=40                   |    0.5 ms   |  0.3 ms  |
> nr_processes=10000                |   80.0 ms   |  7.9 ms  |
>
> Tested with the following script:

I am confused by these numbers. Are you saying that two passes over all
tasks, evaluating all of them, are 10 times faster than a single pass
with an exact sum of the pcp counters?

>
> #!/bin/sh
>
> for a in $(seq 1 10); do (tail /dev/zero &); done
> sleep 5
> for a in $(seq 1 10); do (tail /dev/zero &); done
> sleep 2
> for a in $(seq 1 10); do (tail /dev/zero &); done
> echo "Waiting for tasks to finish"
> wait
>
> Results: OOM kill order on a 128GB memory system
> ================================================

I find this section confusing as well. Is that a before/after
comparison? If yes, it would be great to call out the behavior before
and after explicitly.

My overall impression is that the implementation is really involved, and
at this moment I do not really see a big benefit from all the
complexity. It would help to explicitly mention what the overall
imprecision of the oom victim selection is with the new data structure
(maybe this is good enough[*]).
What if we go with exact precision with the new data structure, compared
to the original pcp counters?

[*] please keep in mind that oom victim selection is by no means an
exact science; we try to pick a task that is likely to free up some
memory to unlock the system from memory depletion. We want that to be a
big memory consumer to reduce the number of tasks to kill, and we want
to roughly apply oom_score_adj.

-- 
Michal Hocko
SUSE Labs
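
For readers following the thread, the two-pass selection described in the quoted changelog can be modelled in userspace roughly as below. This is only a sketch under stated assumptions, not the patch's implementation: `Task`, `approx`, `precise`, `select_victim` and the `accuracy` parameter are hypothetical names, and the model assumes the exact badness always lies within `approx +/- accuracy`. Only the cap of 16 precise sums is taken from the changelog.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MAX_PRECISE_SUMS = 16  # cap quoted in the patch description


@dataclass
class Task:
    name: str
    approx: int                 # cheap approximate badness (hypothetical field)
    precise: Callable[[], int]  # expensive exact per-CPU sum (hypothetical)


def select_victim(tasks: List[Task], accuracy: int) -> Tuple[Optional[Task], int]:
    """Return (victim, number of precise sums performed).

    Assumption: exact badness lies within approx +/- accuracy.
    """
    if not tasks:
        return None, 0
    # Pass 1: find the highest approximate badness.
    max_approx = max(t.approx for t in tasks)
    # A task below this cutoff cannot beat the pass-1 leader even if both
    # approximations are off by the full accuracy range.
    cutoff = max_approx - 2 * accuracy

    best: Optional[Task] = None
    best_points = 0
    sums = 0
    # Pass 2: only candidates close to the maximum get the expensive sum.
    for t in tasks:
        if t.approx < cutoff:
            continue
        if sums < MAX_PRECISE_SUMS:
            points = t.precise()
            sums += 1
        else:
            points = t.approx  # budget exhausted: fall back to approximation
        if best is None or points > best_points:
            best, best_points = t, points
    return best, sums


tasks = [
    Task("a", approx=100, precise=lambda: 95),
    Task("b", approx=98, precise=lambda: 105),
    Task("c", approx=10, precise=lambda: 12),
]
victim, sums = select_victim(tasks, accuracy=5)
print(victim.name, sums)
```

In this model the selection can only differ from an exact single pass once the budget of precise sums is exhausted, because from then on exact and approximate values are compared against each other. That is the imprecision the review asks to have quantified.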