Re: [PATCH v4 0/8] Remove dependency on congestion_wait in mm/

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@techsingularity.net>
To: NeilBrown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>, Linux-MM <linux-mm@kvack.org>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 0/8] Remove dependency on congestion_wait in mm/
Date: Wed, 27 Oct 2021 11:13:46 +0100	[thread overview]
Message-ID: <20211027101346.GQ3959@techsingularity.net> (raw)
In-Reply-To: <163529540259.8576.9186192891154927096@noble.neil.brown.name>

On Wed, Oct 27, 2021 at 11:43:22AM +1100, NeilBrown wrote:
> On Sat, 23 Oct 2021, Mel Gorman wrote:
> > On Fri, Oct 22, 2021 at 10:26:30PM +1100, NeilBrown wrote:
> > > On Fri, 22 Oct 2021, Mel Gorman wrote:
> > > > On Fri, Oct 22, 2021 at 12:15:10PM +1100, NeilBrown wrote:
> > > > 
> > > > > In general, I still don't like the use of wake_up_all(), though it won't
> > > > > cause incorrect behaviour.
> > > > > 
> > > > 
> > > > Removing wake_up_all would be tricky.
> > > 
> > > I think there is a misunderstanding.  Removing wake_up_all() is as
> > > simple as
> > >    s/wake_up_all/wake_up/
> > > 
> > > If you used prepare_to_wait_exclusive(), then wake_up() would only wake
> > > one waiter, while wake_up_all() would wake all of them.
> > > As you use prepare_to_wait(), wake_up() will wake all waiters - as will
> > > wake_up_all(). 
> > > 
> > 
> > Ok, yes, there was a misunderstanding. I thought you were suggesting a
> > move to exclusive wakeups. I felt that the wake_up_all was explicit in
> > terms of intent and that I really meant for all tasks to wake instead of
> > one at a time.
> 
> Fair enough.  Thanks for changing it :-)
> 
> But this prompts me to wonder if exclusive wakeups would be a good idea
> - which is a useful springboard to try to understand the code better.
> 
> For VMSCAN_THROTTLE_ISOLATED they probably are.
> One pattern for reliable exclusive wakeups is for any thread that
> received a wake-up to then consider sending a wake up.
> 
> Two places receive VMSCAN_THROTTLE_ISOLATED wakeups and both then call
> too_many_isolated() which - on success - sends another wakeup - before
> the caller has had a chance to isolate anything.  If, instead, the
> wakeup was sent sometime later, after pages were isolated by before the
> caller (isoloate_migratepages_block() or shrink_inactive_list())
> returned, then we would get an orderly progression of threads running
> through that code.
> 

That should work as the throttling condition is straight-forward. It
might even reduce a race condition where waking all throttled tasks all
then trigger the same "too many isolated" condition.

> For VMSCAN_THROTTLE_WRITEBACK is a little less straight forward.
> There are two different places that wait for the wakeup, and a wake_up
> is sent to all waiters after a time proportional to the number of
> waiters. It might make sense to wake one thread per time unit?

I'd avoid time as a wakeup condition other than the timeout which is
there to guarantee forward progress. I assume you mean "one thread per
SWAP_CLUSTER_MAX completions".

> That might work well for do_writepages - every SWAP_CLUSTER_MAX writes
> triggers one wakeup.
> I'm less sure that it would work for shrink_node().  Maybe the
> shrink_node() waiters could be non-exclusive so they get woken as soon a
> SWAP_CLUSTER_MAX writes complete, while do_writepages are exclusive and
> get woken one at a time.
> 

It should work for either with the slight caveat that the last waiter
may not see SWAP_CLUSTER_MAX completions.

> For VMSCAN_THROTTLE_NOPROGRESS .... I don't understand.
> If one zone isn't making "enough" progress, we throttle before moving on
> to the next zone.  So we delay processing of the next zone, and only
> indirectly delay re-processing of the current congested zone.
> Maybe it make sense, but I don't see it yet.  I note that the commit
> message says "it's messy".  I can't argue with that!
> 

Yes, we delay the processing of the next zone when a given zone cannot
make progress. The thinking is that circumstances that cause one zone to
fail to make progress could spill over to other zones in the absense of
any throttling. Where it might cause problems is where the preferred zone
is very small. If a bug showed up like that, a potential fix would be to
avoid throttling if the preferred zone is very small relative to the total
amount of memory local to the node or total memory (preferably local node).

> I'll follow up with patches to clarify what I am thinking about the
> first two.  I'm not proposing the patches, just presenting them as part
> of improving my understanding.
> 

If I'm cc'd, I'll review and if I think they're promising, I'll run them
through the same tests and machines.

-- 
Mel Gorman
SUSE Labs

     prev parent reply	other threads:[~2021-10-27 10:13 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-19  9:01 [PATCH v4 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-19  9:01 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman
2021-10-19  9:01 ` [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated Mel Gorman
2021-10-19 17:12   ` Yang Shi
2021-10-19  9:01 ` [PATCH 3/8] mm/vmscan: Throttle reclaim when no progress is being made Mel Gorman
2021-10-19  9:01 ` [PATCH 4/8] mm/writeback: Throttle based on page writeback instead of congestion Mel Gorman
2021-10-19  9:01 ` [PATCH 5/8] mm/page_alloc: Remove the throttling logic from the page allocator Mel Gorman
2021-10-19  9:20   ` Vlastimil Babka
2021-10-19  9:01 ` [PATCH 6/8] mm/vmscan: Centralise timeout values for reclaim_throttle Mel Gorman
2021-10-22  1:06   ` NeilBrown
2021-10-22  8:12     ` Mel Gorman
2021-10-19  9:01 ` [PATCH 7/8] mm/vmscan: Increase the timeout if page reclaim is not making progress Mel Gorman
2021-10-22  1:07   ` NeilBrown
2021-10-22  8:14     ` Mel Gorman
2021-10-19  9:01 ` [PATCH 8/8] mm/vmscan: Delay waking of tasks throttled on NOPROGRESS Mel Gorman
2021-10-19 22:00 ` [PATCH v4 0/8] Remove dependency on congestion_wait in mm/ Andrew Morton
2021-10-20  8:44   ` Mel Gorman
2021-10-22  1:15 ` NeilBrown
2021-10-22  8:39   ` Mel Gorman
2021-10-22 11:26     ` NeilBrown
2021-10-22 13:17       ` Mel Gorman
2021-10-27  0:43         ` NeilBrown
2021-10-27 10:13           ` Mel Gorman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211027101346.GQ3959@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=neilb@suse.de \
    --cc=riel@surriel.com \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).