From: Harry Yoo <harry.yoo@oracle.com>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: AneeshKumar.KizhakeVeetil@arm.com, Hasan.Maruf@amd.com,
Michael.Day@amd.com, akpm@linux-foundation.org, bharata@amd.com,
dave.hansen@intel.com, david@redhat.com,
dongjoo.linux.dev@gmail.com, feng.tang@intel.com,
gourry@gourry.net, hannes@cmpxchg.org, honggyu.kim@sk.com,
hughd@google.com, jhubbard@nvidia.com, jon.grimm@amd.com,
k.shutemov@gmail.com, kbusch@meta.com, kmanaouil.dev@gmail.com,
leesuyeon0506@gmail.com, leillc@google.com,
liam.howlett@oracle.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, mgorman@techsingularity.net,
mingo@redhat.com, nadav.amit@gmail.com, nphamcs@gmail.com,
peterz@infradead.org, riel@surriel.com, rientjes@google.com,
rppt@kernel.org, santosh.shukla@amd.com, shivankg@amd.com,
shy828301@gmail.com, sj@kernel.org, vbabka@suse.cz,
weixugc@google.com, willy@infradead.org,
ying.huang@linux.alibaba.com, ziy@nvidia.com,
Jonathan.Cameron@huawei.com, dave@stgolabs.net,
yuanchu@google.com, kinseyho@google.com, hdanton@sina.com
Subject: Re: [RFC PATCH V2 03/13] mm: Scan the mm and create a migration list
Date: Thu, 26 Jun 2025 08:05:30 +0900 [thread overview]
Message-ID: <aFyAuvE-dksvHqlG@hyeyoo> (raw)
In-Reply-To: <aFxzEFhTDOyL4y0e@hyeyoo>
On Thu, Jun 26, 2025 at 07:07:12AM +0900, Harry Yoo wrote:
> On Tue, Jun 24, 2025 at 05:56:07AM +0000, Raghavendra K T wrote:
> > Since we already have the list of mm_struct in the system, add a module to
> > scan each mm that walks VMAs of each mm_struct and scan all the pages
> > associated with that.
> >
> > In the scan path: Check for the recently acccessed pages (folios) belonging
> > to slowtier nodes. Add all those folios to a list.
> >
> > Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
> > ---
>
> Hi, just taking a quick look...
>
> > mm/kscand.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 318 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/kscand.c b/mm/kscand.c
> > index d5b0d3041b0f..0edec1b7730d 100644
> > --- a/mm/kscand.c
> > +++ b/mm/kscand.c
> > @@ -42,6 +55,8 @@ static struct kmem_cache *kscand_slot_cache __read_mostly;
> > @@ -84,11 +122,275 @@ static void kscand_wait_work(void)
> > scan_sleep_jiffies);
> > }
> >
> > +static inline bool is_valid_folio(struct folio *folio)
> > +{
> > + if (!folio || folio_test_unevictable(folio) || !folio_mapped(folio) ||
> > + folio_is_zone_device(folio) || folio_maybe_mapped_shared(folio))
> > + return false;
> > +
> > + return true;
> > +}
>
> What makes it undesirable to migrate shared folios?
>
> > +static bool folio_idle_clear_pte_refs_one(struct folio *folio,
> > + struct vm_area_struct *vma,
> > + unsigned long addr,
> > + pte_t *ptep)
> > +{
> > + bool referenced = false;
> > + struct mm_struct *mm = vma->vm_mm;
> > + pmd_t *pmd = pmd_off(mm, addr);
> > +
> > + if (ptep) {
> > + if (ptep_clear_young_notify(vma, addr, ptep))
> > + referenced = true;
> > + } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > + if (!pmd_present(*pmd))
> > + WARN_ON_ONCE(1);
> > + if (pmdp_clear_young_notify(vma, addr, pmd))
> > + referenced = true;
> > + } else {
> > + WARN_ON_ONCE(1);
> > + }
>
> This does not look good.
>
> I think pmd entry handling should be handled in
> mm_walk_ops.pmd_entry callback?
>
> > +
> > + if (referenced) {
> > + folio_clear_idle(folio);
> > + folio_set_young(folio);
> > + }
> > +
> > + return true;
> > +}
> > +
> > +static void page_idle_clear_pte_refs(struct page *page, pte_t *pte, struct mm_walk *walk)
> > +{
> > + bool need_lock;
> > + struct folio *folio = page_folio(page);
> > + unsigned long address;
> > +
> > + if (!folio_mapped(folio) || !folio_raw_mapping(folio))
> > + return;
> > +
> > + need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> > + if (need_lock && !folio_trylock(folio))
> > + return;
>
> Why acquire folio lock here?
>
> And I'm not even sure if it's safe to acquire it?
> The locking order is folio_lock -> pte_lock
>
> page walk should have already acquired pte_lock before calling
> ->pte_entry() callback.
Oops, it's trylock. Nevermind.
Needed more coffee in the morning :)
> > + address = vma_address(walk->vma, page_pgoff(folio, page), compound_nr(page));
> > + VM_BUG_ON_VMA(address == -EFAULT, walk->vma);
> > + folio_idle_clear_pte_refs_one(folio, walk->vma, address, pte);
> > +
> > + if (need_lock)
> > + folio_unlock(folio);
> > +}
> > +
> > +static const struct mm_walk_ops hot_vma_set_idle_ops = {
> > + .pte_entry = hot_vma_idle_pte_entry,
> > + .walk_lock = PGWALK_RDLOCK,
> > +};
> > +
> > +static void kscand_walk_page_vma(struct vm_area_struct *vma, struct kscand_scanctrl *scanctrl)
> > +{
> > + if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> > + is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> > + return;
> > + }
> > + if (!vma->vm_mm ||
> > + (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ)))
> > + return;
>
> Why not walk writable file VMAs?
>
> > + if (!vma_is_accessible(vma))
> > + return;
> > +
> > + walk_page_vma(vma, &hot_vma_set_idle_ops, scanctrl);
> > +}
>
> --
> Cheers,
> Harry / Hyeonggon
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-06-25 23:06 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-24 5:56 [RFC PATCH V2 00/13] mm: slowtier page promotion based on PTE A bit Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 01/13] mm: Add kscand kthread for PTE A bit scan Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 02/13] mm: Maintain mm_struct list in the system Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 03/13] mm: Scan the mm and create a migration list Raghavendra K T
2025-06-25 22:07 ` Harry Yoo
2025-06-25 23:05 ` Harry Yoo [this message]
2025-06-26 6:27 ` Raghavendra K T
2025-07-08 11:17 ` Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 04/13] mm: Create a separate kthread for migration Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 05/13] mm/migration: Migrate accessed folios to toptier node Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 06/13] mm: Add throttling of mm scanning using scan_period Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 07/13] mm: Add throttling of mm scanning using scan_size Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 08/13] mm: Add initial scan delay Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 09/13] mm: Add a heuristic to calculate target node Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 10/13] sysfs: Add sysfs support to tune scanning Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 11/13] vmstat: Add vmstat counters Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 12/13] trace/kscand: Add tracing of scanning and migration Raghavendra K T
2025-06-24 7:09 ` Masami Hiramatsu
2025-06-24 7:50 ` Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 13/13] prctl: Introduce new prctl to control scanning Raghavendra K T
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aFyAuvE-dksvHqlG@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=AneeshKumar.KizhakeVeetil@arm.com \
--cc=Hasan.Maruf@amd.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=Michael.Day@amd.com \
--cc=akpm@linux-foundation.org \
--cc=bharata@amd.com \
--cc=dave.hansen@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dongjoo.linux.dev@gmail.com \
--cc=feng.tang@intel.com \
--cc=gourry@gourry.net \
--cc=hannes@cmpxchg.org \
--cc=hdanton@sina.com \
--cc=honggyu.kim@sk.com \
--cc=hughd@google.com \
--cc=jhubbard@nvidia.com \
--cc=jon.grimm@amd.com \
--cc=k.shutemov@gmail.com \
--cc=kbusch@meta.com \
--cc=kinseyho@google.com \
--cc=kmanaouil.dev@gmail.com \
--cc=leesuyeon0506@gmail.com \
--cc=leillc@google.com \
--cc=liam.howlett@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@redhat.com \
--cc=nadav.amit@gmail.com \
--cc=nphamcs@gmail.com \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@amd.com \
--cc=riel@surriel.com \
--cc=rientjes@google.com \
--cc=rppt@kernel.org \
--cc=santosh.shukla@amd.com \
--cc=shivankg@amd.com \
--cc=shy828301@gmail.com \
--cc=sj@kernel.org \
--cc=vbabka@suse.cz \
--cc=weixugc@google.com \
--cc=willy@infradead.org \
--cc=ying.huang@linux.alibaba.com \
--cc=yuanchu@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).