From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=+lVK=PP=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7FF35C433FE
	for <linux-mm@archiver.kernel.org>; Wed, 27 Oct 2021 10:13:53 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 00260610A4
	for <linux-mm@archiver.kernel.org>; Wed, 27 Oct 2021 10:13:52 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 00260610A4
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org
Received: by kanga.kvack.org (Postfix)
	id 668F2940008; Wed, 27 Oct 2021 06:13:52 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 5EF22940007; Wed, 27 Oct 2021 06:13:52 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 4B71D940008; Wed, 27 Oct 2021 06:13:52 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0004.hostedemail.com [216.40.44.4])
	by kanga.kvack.org (Postfix) with ESMTP id 36BC4940007
	for <linux-mm@kvack.org>; Wed, 27 Oct 2021 06:13:52 -0400 (EDT)
Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay02.hostedemail.com (Postfix) with ESMTP id B681932343
	for <linux-mm@kvack.org>; Wed, 27 Oct 2021 10:13:51 +0000 (UTC)
X-FDA: 78741806262.06.2E046BE
Received: from outbound-smtp22.blacknight.com (outbound-smtp22.blacknight.com [81.17.249.190])
	by imf21.hostedemail.com (Postfix) with ESMTP id B28B8D036A47
	for <linux-mm@kvack.org>; Wed, 27 Oct 2021 10:13:47 +0000 (UTC)
Received: from mail.blacknight.com (pemlinmail06.blacknight.ie [81.17.255.152])
	by outbound-smtp22.blacknight.com (Postfix) with ESMTPS id 6DE19BAB2A
	for <linux-mm@kvack.org>; Wed, 27 Oct 2021 11:13:49 +0100 (IST)
Received: (qmail 13566 invoked from network); 27 Oct 2021 10:13:49 -0000
Received: from unknown (HELO techsingularity.net) (mgorman@techsingularity.net@[84.203.17.29])
  by 81.17.254.9 with ESMTPSA (AES256-SHA encrypted, authenticated); 27 Oct 2021 10:13:48 -0000
Date: Wed, 27 Oct 2021 11:13:46 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: NeilBrown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>, Linux-MM <linux-mm@kvack.org>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 0/8] Remove dependency on congestion_wait in mm/
Message-ID: <20211027101346.GQ3959@techsingularity.net>
References: <20211019090108.25501-1-mgorman@techsingularity.net>
 <163486531001.17149.13533181049212473096@noble.neil.brown.name>
 <20211022083927.GI3959@techsingularity.net>
 <163490199006.17149.17259708448207042563@noble.neil.brown.name>
 <20211022131732.GK3959@techsingularity.net>
 <163529540259.8576.9186192891154927096@noble.neil.brown.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <163529540259.8576.9186192891154927096@noble.neil.brown.name>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Rspamd-Server: rspam05
X-Rspamd-Queue-Id: B28B8D036A47
X-Stat-Signature: 31ny1u9jzccgnxqp4zxdrjfweyq6yqbx
Authentication-Results: imf21.hostedemail.com;
	dkim=none;
	spf=pass (imf21.hostedemail.com: domain of mgorman@techsingularity.net designates 81.17.249.190 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net;
	dmarc=none
X-HE-Tag: 1635329627-772509
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Wed, Oct 27, 2021 at 11:43:22AM +1100, NeilBrown wrote:
> On Sat, 23 Oct 2021, Mel Gorman wrote:
> > On Fri, Oct 22, 2021 at 10:26:30PM +1100, NeilBrown wrote:
> > > On Fri, 22 Oct 2021, Mel Gorman wrote:
> > > > On Fri, Oct 22, 2021 at 12:15:10PM +1100, NeilBrown wrote:
> > > > 
> > > > > In general, I still don't like the use of wake_up_all(), though it won't
> > > > > cause incorrect behaviour.
> > > > > 
> > > > 
> > > > Removing wake_up_all would be tricky.
> > > 
> > > I think there is a misunderstanding.  Removing wake_up_all() is as
> > > simple as
> > >    s/wake_up_all/wake_up/
> > > 
> > > If you used prepare_to_wait_exclusive(), then wake_up() would only wake
> > > one waiter, while wake_up_all() would wake all of them.
> > > As you use prepare_to_wait(), wake_up() will wake all waiters - as will
> > > wake_up_all(). 
> > > 
> > 
> > Ok, yes, there was a misunderstanding. I thought you were suggesting a
> > move to exclusive wakeups. I felt that the wake_up_all was explicit in
> > terms of intent and that I really meant for all tasks to wake instead of
> > one at a time.
> 
> Fair enough.  Thanks for changing it :-)
> 
> But this prompts me to wonder if exclusive wakeups would be a good idea
> - which is a useful springboard to try to understand the code better.
> 
> For VMSCAN_THROTTLE_ISOLATED they probably are.
> One pattern for reliable exclusive wakeups is for any thread that
> received a wake-up to then consider sending a wake up.
> 
> Two places receive VMSCAN_THROTTLE_ISOLATED wakeups and both then call
> too_many_isolated() which - on success - sends another wakeup - before
> the caller has had a chance to isolate anything.  If, instead, the
> wakeup was sent sometime later, after pages were isolated by before the
> caller (isoloate_migratepages_block() or shrink_inactive_list())
> returned, then we would get an orderly progression of threads running
> through that code.
> 

That should work as the throttling condition is straight-forward. It
might even reduce a race condition where waking all throttled tasks all
then trigger the same "too many isolated" condition.

> For VMSCAN_THROTTLE_WRITEBACK is a little less straight forward.
> There are two different places that wait for the wakeup, and a wake_up
> is sent to all waiters after a time proportional to the number of
> waiters. It might make sense to wake one thread per time unit?

I'd avoid time as a wakeup condition other than the timeout which is
there to guarantee forward progress. I assume you mean "one thread per
SWAP_CLUSTER_MAX completions".

> That might work well for do_writepages - every SWAP_CLUSTER_MAX writes
> triggers one wakeup.
> I'm less sure that it would work for shrink_node().  Maybe the
> shrink_node() waiters could be non-exclusive so they get woken as soon a
> SWAP_CLUSTER_MAX writes complete, while do_writepages are exclusive and
> get woken one at a time.
> 

It should work for either with the slight caveat that the last waiter
may not see SWAP_CLUSTER_MAX completions.

> For VMSCAN_THROTTLE_NOPROGRESS .... I don't understand.
> If one zone isn't making "enough" progress, we throttle before moving on
> to the next zone.  So we delay processing of the next zone, and only
> indirectly delay re-processing of the current congested zone.
> Maybe it make sense, but I don't see it yet.  I note that the commit
> message says "it's messy".  I can't argue with that!
> 

Yes, we delay the processing of the next zone when a given zone cannot
make progress. The thinking is that circumstances that cause one zone to
fail to make progress could spill over to other zones in the absense of
any throttling. Where it might cause problems is where the preferred zone
is very small. If a bug showed up like that, a potential fix would be to
avoid throttling if the preferred zone is very small relative to the total
amount of memory local to the node or total memory (preferably local node).

> I'll follow up with patches to clarify what I am thinking about the
> first two.  I'm not proposing the patches, just presenting them as part
> of improving my understanding.
> 

If I'm cc'd, I'll review and if I think they're promising, I'll run them
through the same tests and machines.

-- 
Mel Gorman
SUSE Labs