DAMON development mailing list
 help / color / mirror / Atom feed
From: SeongJae Park <sj@kernel.org>
To: sashiko-bot@kernel.org
Cc: SeongJae Park <sj@kernel.org>,
	"Ravi Jonnalagadda" <ravis.opensrc@gmail.com>,
	damon@lists.linux.dev
Subject: Re: [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk
Date: Sun, 17 May 2026 16:46:14 -0700	[thread overview]
Message-ID: <20260517234616.89496-1-sj@kernel.org> (raw)
In-Reply-To: <20260516235545.18B71C2BCB7@smtp.kernel.org>

On Sat, 16 May 2026 23:55:44 +0000 sashiko-bot@kernel.org wrote:

> Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
> - [High] The amortized time budget check can be permanently bypassed due to variable PFN strides and loop continues.
> - [High] Early termination causes permanent starvation for the tail end of large DAMON regions.
> - [High] The time budget targets the wrong phase, failing to cap the actual latency spike of the migration.
> - [High] PFN advancement after folio conversion permanently misaligns the scan.
> --
> 
> commit 87860cbf6b4be5f161140aa9a1f9b7d0461955ba
> Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> 
> mm/damon/paddr: add time budget to migration page walk
> 
> This patch adds a 100ms wall-clock time budget to damon_pa_migrate() to
> prevent long migration walks from causing high scheduling latencies. The
> time check is amortized by checking ktime_get() every 4096 pages.
> 
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index e844c990987b9..a2565287bc10f 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> [ ... ]
> > @@ -283,6 +293,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
> >  			}
> >  		}
> >  
> > +		/* Time budget: keep kdamond responsive on long migration walks. */
> > +		if (!(pfn & DAMON_PA_MIGRATE_TIME_CHECK_MASK) &&
> > +		    ktime_after(ktime_get(), deadline))
> > +			break;
> 
> Can this time check be permanently bypassed?
> 
> If pfn advances by variable strides, such as when addr += folio_size(folio)
> adds a multi-page stride that jumps over the 4096-page boundary, the exact
> match !(pfn & DAMON_PA_MIGRATE_TIME_CHECK_MASK) is missed, skipping the
> check for the entire next 4096-page cycle.
> 
> Additionally, if pfn does land exactly on a multiple of 4096, but the
> pageblock is skipped via the continue statement just above this block:
> 
>         if (!page) {
>             addr += pageblock_nr_pages * PAGE_SIZE;
>             continue;
>         }
> 
> the loop advances and leaps over the time check.
> 
> If the loop terminates early here, is there a mechanism to save the PFN
> where the scan aborted?
> 
> Since the DAMON core is unaware of this internal early return, the next time
> the scheme applies to this region, it appears it will restart the walk from
> the very beginning of the region.
> 
> Could this result in a permanent livelock for large regions, where memory
> beyond the 100ms scan horizon is never reached or migrated?
> 
> >  		folio = damon_get_folio(pfn);
> >  		if (damon_pa_invalid_damos_folio(folio, s)) {
> [ ... ]
> >  		if (!folio_isolate_lru(folio))
> >  			goto put_folio;
> >  		list_add(&folio->lru, &folio_list);
> >  put_folio:
> >  		addr += folio_size(folio);
> 
> This wasn't introduced by this patch, but does the address advancement
> correctly handle tail pages?
> 
> Since the region start addr might not be folio-aligned, if it falls within
> a tail page of a large folio, does adding the total folio_size(folio)
> overshoot the end of the current folio and skip the head page of the next
> folio?

As I previously replied [1], I will work on this.

> 
> >  		folio_put(folio);
> >  	}
> >  	applied = damon_migrate_pages(&folio_list, s->target_nid);
> 
> Does the time budget cap the actual latency spike of the migration?
> 
> The 100ms budget limits the fast folio isolation phase. The loop can isolate
> gigabytes of memory within 100ms. Once the deadline is reached, all
> collected folios are passed to damon_migrate_pages().
> 
> Could migrating this massive batch, which requires memory allocation and
> page copying, take seconds to complete in a single blocking call and fail
> to keep the migration responsive?

I will revisit other Sashiko's detailed comments once we finish the high level
discussion and decide to pursue this direction.

[1] https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org

> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260516210357.2247-1-ravis.opensrc@gmail.com?part=5


Thanks,
SJ

  reply	other threads:[~2026-05-17 23:46 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
2026-05-16 22:29   ` sashiko-bot
2026-05-17 18:21     ` SeongJae Park
2026-05-17 18:16   ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
2026-05-16 22:55   ` sashiko-bot
2026-05-17 18:48     ` SeongJae Park
2026-05-17 18:36   ` SeongJae Park
2026-05-18  5:22     ` Ravi Jonnalagadda
2026-05-19  0:38       ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
2026-05-17 18:47   ` SeongJae Park
2026-05-20 18:37     ` Ravi Jonnalagadda
2026-05-21  0:36       ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
2026-05-16 23:36   ` sashiko-bot
2026-05-17 23:41     ` SeongJae Park
2026-05-17 23:37   ` SeongJae Park
2026-05-18  5:38     ` Ravi Jonnalagadda
2026-05-19  1:14       ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
2026-05-16 23:55   ` sashiko-bot
2026-05-17 23:46     ` SeongJae Park [this message]
2026-05-17 23:43   ` SeongJae Park
2026-05-18  5:54     ` Ravi Jonnalagadda
2026-05-19  1:27       ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260517234616.89496-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=ravis.opensrc@gmail.com \
    --cc=sashiko-bot@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox