From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7FF35C433FE for ; Wed, 27 Oct 2021 10:13:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 00260610A4 for ; Wed, 27 Oct 2021 10:13:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 00260610A4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 668F2940008; Wed, 27 Oct 2021 06:13:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5EF22940007; Wed, 27 Oct 2021 06:13:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B71D940008; Wed, 27 Oct 2021 06:13:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0004.hostedemail.com [216.40.44.4]) by kanga.kvack.org (Postfix) with ESMTP id 36BC4940007 for ; Wed, 27 Oct 2021 06:13:52 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id B681932343 for ; Wed, 27 Oct 2021 10:13:51 +0000 (UTC) X-FDA: 78741806262.06.2E046BE Received: from outbound-smtp22.blacknight.com (outbound-smtp22.blacknight.com [81.17.249.190]) by imf21.hostedemail.com (Postfix) with ESMTP id B28B8D036A47 for ; Wed, 27 Oct 2021 10:13:47 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail06.blacknight.ie [81.17.255.152]) by outbound-smtp22.blacknight.com (Postfix) with ESMTPS id 6DE19BAB2A for ; Wed, 27 Oct 2021 11:13:49 +0100 (IST) Received: (qmail 13566 invoked from network); 27 Oct 2021 10:13:49 -0000 Received: from unknown (HELO techsingularity.net) (mgorman@techsingularity.net@[84.203.17.29]) by 81.17.254.9 with ESMTPSA (AES256-SHA encrypted, authenticated); 27 Oct 2021 10:13:48 -0000 Date: Wed, 27 Oct 2021 11:13:46 +0100 From: Mel Gorman To: NeilBrown Cc: Andrew Morton , Theodore Ts'o , Andreas Dilger , "Darrick J . Wong" , Matthew Wilcox , Michal Hocko , Dave Chinner , Rik van Riel , Vlastimil Babka , Johannes Weiner , Jonathan Corbet , Linux-MM , Linux-fsdevel , LKML Subject: Re: [PATCH v4 0/8] Remove dependency on congestion_wait in mm/ Message-ID: <20211027101346.GQ3959@techsingularity.net> References: <20211019090108.25501-1-mgorman@techsingularity.net> <163486531001.17149.13533181049212473096@noble.neil.brown.name> <20211022083927.GI3959@techsingularity.net> <163490199006.17149.17259708448207042563@noble.neil.brown.name> <20211022131732.GK3959@techsingularity.net> <163529540259.8576.9186192891154927096@noble.neil.brown.name> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <163529540259.8576.9186192891154927096@noble.neil.brown.name> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: B28B8D036A47 X-Stat-Signature: 31ny1u9jzccgnxqp4zxdrjfweyq6yqbx Authentication-Results: imf21.hostedemail.com; dkim=none; spf=pass (imf21.hostedemail.com: domain of mgorman@techsingularity.net designates 81.17.249.190 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net; dmarc=none X-HE-Tag: 1635329627-772509 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Oct 27, 2021 at 11:43:22AM +1100, NeilBrown wrote: > On Sat, 23 Oct 2021, Mel Gorman wrote: > > On Fri, Oct 22, 2021 at 10:26:30PM +1100, NeilBrown wrote: > > > On Fri, 22 Oct 2021, Mel Gorman wrote: > > > > On Fri, Oct 22, 2021 at 12:15:10PM +1100, NeilBrown wrote: > > > > > > > > > In general, I still don't like the use of wake_up_all(), though it won't > > > > > cause incorrect behaviour. > > > > > > > > > > > > > Removing wake_up_all would be tricky. > > > > > > I think there is a misunderstanding. Removing wake_up_all() is as > > > simple as > > > s/wake_up_all/wake_up/ > > > > > > If you used prepare_to_wait_exclusive(), then wake_up() would only wake > > > one waiter, while wake_up_all() would wake all of them. > > > As you use prepare_to_wait(), wake_up() will wake all waiters - as will > > > wake_up_all(). > > > > > > > Ok, yes, there was a misunderstanding. I thought you were suggesting a > > move to exclusive wakeups. I felt that the wake_up_all was explicit in > > terms of intent and that I really meant for all tasks to wake instead of > > one at a time. > > Fair enough. Thanks for changing it :-) > > But this prompts me to wonder if exclusive wakeups would be a good idea > - which is a useful springboard to try to understand the code better. > > For VMSCAN_THROTTLE_ISOLATED they probably are. > One pattern for reliable exclusive wakeups is for any thread that > received a wake-up to then consider sending a wake up. > > Two places receive VMSCAN_THROTTLE_ISOLATED wakeups and both then call > too_many_isolated() which - on success - sends another wakeup - before > the caller has had a chance to isolate anything. If, instead, the > wakeup was sent sometime later, after pages were isolated by before the > caller (isoloate_migratepages_block() or shrink_inactive_list()) > returned, then we would get an orderly progression of threads running > through that code. > That should work as the throttling condition is straight-forward. It might even reduce a race condition where waking all throttled tasks all then trigger the same "too many isolated" condition. > For VMSCAN_THROTTLE_WRITEBACK is a little less straight forward. > There are two different places that wait for the wakeup, and a wake_up > is sent to all waiters after a time proportional to the number of > waiters. It might make sense to wake one thread per time unit? I'd avoid time as a wakeup condition other than the timeout which is there to guarantee forward progress. I assume you mean "one thread per SWAP_CLUSTER_MAX completions". > That might work well for do_writepages - every SWAP_CLUSTER_MAX writes > triggers one wakeup. > I'm less sure that it would work for shrink_node(). Maybe the > shrink_node() waiters could be non-exclusive so they get woken as soon a > SWAP_CLUSTER_MAX writes complete, while do_writepages are exclusive and > get woken one at a time. > It should work for either with the slight caveat that the last waiter may not see SWAP_CLUSTER_MAX completions. > For VMSCAN_THROTTLE_NOPROGRESS .... I don't understand. > If one zone isn't making "enough" progress, we throttle before moving on > to the next zone. So we delay processing of the next zone, and only > indirectly delay re-processing of the current congested zone. > Maybe it make sense, but I don't see it yet. I note that the commit > message says "it's messy". I can't argue with that! > Yes, we delay the processing of the next zone when a given zone cannot make progress. The thinking is that circumstances that cause one zone to fail to make progress could spill over to other zones in the absense of any throttling. Where it might cause problems is where the preferred zone is very small. If a bug showed up like that, a potential fix would be to avoid throttling if the preferred zone is very small relative to the total amount of memory local to the node or total memory (preferably local node). > I'll follow up with patches to clarify what I am thinking about the > first two. I'm not proposing the patches, just presenting them as part > of improving my understanding. > If I'm cc'd, I'll review and if I think they're promising, I'll run them through the same tests and machines. -- Mel Gorman SUSE Labs