From: sashiko-bot@kernel.org
To: "Ravi Jonnalagadda" <ravis.opensrc@gmail.com>
Cc: damon@lists.linux.dev
Subject: Re: [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk
Date: Sat, 16 May 2026 23:55:44 +0000	[thread overview]
Message-ID: <20260516235545.18B71C2BCB7@smtp.kernel.org> (raw)
In-Reply-To: <20260516210357.2247-6-ravis.opensrc@gmail.com>

Thank you for your contribution! Sashiko AI review found 4 potential issues to consider:
- [High] The amortized time budget check can be permanently bypassed due to variable PFN strides and loop continues.
- [High] Early termination causes permanent starvation for the tail end of large DAMON regions.
- [High] The time budget targets the wrong phase, failing to cap the actual latency spike of the migration.
- [High] PFN advancement after folio conversion permanently misaligns the scan.
--

commit 87860cbf6b4be5f161140aa9a1f9b7d0461955ba
Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>

mm/damon/paddr: add time budget to migration page walk

This patch adds a 100ms wall-clock time budget to damon_pa_migrate() to
prevent long migration walks from causing high scheduling latencies. The
time check is amortized by checking ktime_get() every 4096 pages.

> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index e844c990987b9..a2565287bc10f 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
[ ... ]
> @@ -283,6 +293,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
>  			}
>  		}
>  
> +		/* Time budget: keep kdamond responsive on long migration walks. */
> +		if (!(pfn & DAMON_PA_MIGRATE_TIME_CHECK_MASK) &&
> +		    ktime_after(ktime_get(), deadline))
> +			break;

Can this time check be permanently bypassed?

If pfn advances by variable strides, for example when addr += folio_size(folio)
adds a multi-page stride that steps over a 4096-page boundary, the exact-match
test !(pfn & DAMON_PA_MIGRATE_TIME_CHECK_MASK) never fires, and the check is
skipped for the entire next 4096-page cycle.

Additionally, even when pfn does land exactly on a multiple of 4096, the
pageblock can be skipped via the continue statement just above this block:

        if (!page) {
            addr += pageblock_nr_pages * PAGE_SIZE;
            continue;
        }

the loop advances and leaps over the time check.

If the loop terminates early here, is there a mechanism to save the PFN
where the scan aborted?

Since the DAMON core is unaware of this internal early return, the next time
the scheme applies to this region the walk will apparently restart from the
very beginning of the region.

Could this result in permanent starvation for large regions, where memory
beyond the 100ms scan horizon is never reached or migrated?

>  		folio = damon_get_folio(pfn);
>  		if (damon_pa_invalid_damos_folio(folio, s)) {
[ ... ]
>  		if (!folio_isolate_lru(folio))
>  			goto put_folio;
>  		list_add(&folio->lru, &folio_list);
>  put_folio:
>  		addr += folio_size(folio);

This wasn't introduced by this patch, but does the address advancement
correctly handle tail pages?

Since the region start addr might not be folio-aligned, if it falls within
a tail page of a large folio, does adding the total folio_size(folio)
overshoot the end of the current folio and skip the head page of the next
folio?

>  		folio_put(folio);
>  	}
>  	applied = damon_migrate_pages(&folio_list, s->target_nid);

Does the time budget cap the actual latency spike of the migration?

The 100ms budget limits the fast folio isolation phase. The loop can isolate
gigabytes of memory within 100ms. Once the deadline is reached, all
collected folios are passed to damon_migrate_pages().

Could migrating this massive batch, which requires memory allocation and
page copying, take seconds to complete in a single blocking call, defeating
the patch's stated goal of keeping kdamond responsive?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260516210357.2247-1-ravis.opensrc@gmail.com?part=5

Thread overview:
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
2026-05-16 22:29   ` sashiko-bot
2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
2026-05-16 22:55   ` sashiko-bot
2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
2026-05-16 23:36   ` sashiko-bot
2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
2026-05-16 23:55   ` sashiko-bot [this message]
