From: Brendan Jackman <jackmanb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	 Mel Gorman <mgorman@techsingularity.net>,
	Zi Yan <ziy@nvidia.com>, <linux-mm@kvack.org>,
	 <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/5] mm: page_alloc: defrag_mode
Date: Sun, 23 Mar 2025 19:04:29 +0100
Message-ID: <D8NUEJHT150J.17YZMGLU54JG7@google.com>
In-Reply-To: <20250323034657.GD1894930@cmpxchg.org>

On Sun Mar 23, 2025 at 4:46 AM CET, Johannes Weiner wrote:
> On Sat, Mar 22, 2025 at 09:34:09PM -0400, Johannes Weiner wrote:
> > On Sat, Mar 22, 2025 at 08:58:27PM -0400, Johannes Weiner wrote:
> > > On Sat, Mar 22, 2025 at 04:05:52PM +0100, Brendan Jackman wrote:
> > > > On Thu Mar 13, 2025 at 10:05 PM CET, Johannes Weiner wrote:
> > > > > +	/* Reclaim/compaction failed to prevent the fallback */
> > > > > +	if (defrag_mode) {
> > > > > +		alloc_flags &= ALLOC_NOFRAGMENT;
> > > > > +		goto retry;
> > > > > +	}
> > > > 
> > > > I can't see where ALLOC_NOFRAGMENT gets cleared, is it supposed to be
> > > > here (i.e. should this be ~ALLOC_NOFRAGMENT)?
> > 
> > Please ignore my previous email, this is actually a much more severe
> > issue than I thought at first. The screwed up clearing is bad, but
> > this will also not check the flag before retrying, which means the
> > thread will retry reclaim/compaction and never reach OOM.
> > 
> > This code has weeks of load testing, with workloads fine-tuned to
> > *avoid* OOM. A blatant OOM test shows this problem immediately.
> > 
> > A simple fix, but I'll put it through the wringer before sending it.
>
> Ok, here is the patch. I verified this with intentional OOMing 100
> times in a loop; this would previously lock up on first try in
> defrag_mode, but kills and recovers reliably with this applied.
>
> I also re-ran the full THP benchmarks, to verify that erroneous
> looping here did not accidentally contribute to fragmentation
> avoidance and thus THP success & latency rates. They were in fact not;
> the improvements claimed for defrag_mode are unchanged with this fix:

Sounds good :)
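
For anyone following the thread, here is a minimal user-space sketch of
the two problems discussed above: the mask that keeps ALLOC_NOFRAGMENT
instead of clearing it, and the retry that is not gated on the flag.
This is not the patch referred to above. The flag values, ALLOC_OTHER,
and the helper names are hypothetical and only illustrate the idea.

```c
/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define ALLOC_NOFRAGMENT 0x100u
#define ALLOC_OTHER      0x001u	/* hypothetical stand-in for any other flag */

/* Form quoted above: keeps only the NOFRAGMENT bit and drops the rest. */
static unsigned int clear_buggy(unsigned int flags)
{
	return flags & ALLOC_NOFRAGMENT;
}

/* Form with the complement: clears NOFRAGMENT, leaves other bits alone. */
static unsigned int clear_fixed(unsigned int flags)
{
	return flags & ~ALLOC_NOFRAGMENT;
}

/*
 * Toy slow path in which every allocation attempt fails. The retry is
 * taken only while NOFRAGMENT is still set, so the loop is bounded to
 * one extra attempt. Returns the number of attempts before giving up.
 */
static int attempts_until_giveup(unsigned int flags)
{
	int attempts = 0;

retry:
	attempts++;
	if (flags & ALLOC_NOFRAGMENT) {
		flags = clear_fixed(flags);
		goto retry;
	}
	return attempts;
}
```

With both flags set on entry, the toy path makes two attempts and then
falls through, which in the real allocator is where the OOM path would
become reachable. Without the flag check, the retry would be taken
unconditionally.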

Off topic, but could you share some details about the
tests/benchmarks you're running here? Do you have any links e.g. to
the scripts you're using to run them?


Thread overview: 32+ messages
2025-03-13 21:05 [PATCH 0/5] mm: reliable huge page allocator Johannes Weiner
2025-03-13 21:05 ` [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers Johannes Weiner
2025-03-14 15:08   ` Zi Yan
2025-03-16  4:28   ` Hugh Dickins
2025-03-17 18:18     ` Johannes Weiner
2025-03-21  6:21   ` kernel test robot
2025-03-21 13:55     ` Johannes Weiner
2025-04-10 15:19   ` Vlastimil Babka
2025-04-10 20:17     ` Johannes Weiner
2025-04-11  7:32       ` Vlastimil Babka
2025-03-13 21:05 ` [PATCH 2/5] mm: page_alloc: trace type pollution from compaction capturing Johannes Weiner
2025-03-14 18:36   ` Zi Yan
2025-03-13 21:05 ` [PATCH 3/5] mm: page_alloc: defrag_mode Johannes Weiner
2025-03-14 18:54   ` Zi Yan
2025-03-14 20:50     ` Johannes Weiner
2025-03-14 22:54       ` Zi Yan
2025-03-22 15:05   ` Brendan Jackman
2025-03-23  0:58     ` Johannes Weiner
2025-03-23  1:34       ` Johannes Weiner
2025-03-23  3:46         ` Johannes Weiner
2025-03-23 18:04           ` Brendan Jackman [this message]
2025-03-31 15:55             ` Johannes Weiner
2025-03-13 21:05 ` [PATCH 4/5] mm: page_alloc: defrag_mode kswapd/kcompactd assistance Johannes Weiner
2025-03-13 21:05 ` [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks Johannes Weiner
2025-03-14 21:05   ` Johannes Weiner
2025-04-11  8:19   ` Vlastimil Babka
2025-04-11 15:39     ` Johannes Weiner
2025-04-11 16:51       ` Vlastimil Babka
2025-04-11 18:21         ` Johannes Weiner
2025-04-13  2:20           ` Johannes Weiner
2025-04-15  7:31             ` Vlastimil Babka
2025-04-15  7:44             ` Vlastimil Babka
