From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: SeongJae Park <sj@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jann Horn <jannh@google.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
Pedro Falcato <pfalcato@suse.de>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v3 05/37] mm/{mprotect,memory}: (no upstream-aimed hack) implement MM_CP_DAMON
Date: Mon, 8 Dec 2025 12:19:41 +0100
Message-ID: <3b7ff190-4efe-47d0-82fb-68135a031b0f@kernel.org>
In-Reply-To: <20251208062943.68824-6-sj@kernel.org>

On 12/8/25 07:29, SeongJae Park wrote:
> Note that this is not upstreamable as-is. This is only for helping
> discussion of other changes of its series.
>
> DAMON is using Accessed bits of page table entries as the major source
> of the access information. It lacks some additional information such as
> which CPU was making the access. Page faults could be another source of
> information for such additional information.
>
> Implement another change_protection() flag for such use cases, namely
> MM_CP_DAMON. DAMON will install PAGE_NONE protections using the flag.
> To avoid interfering with NUMA_BALANCING, which is also using PAGE_NONE
> protection, pass the faults to DAMON only when NUMA_BALANCING is
> disabled.
>
> Again, this is not upstreamable as-is. There were comments about this
> on the previous version, and I was unable to take time on addressing
> those. As a result, this version is not addressing any of those
> previous comments. I'm sending this, though, to help discussions on
> patches of its series, except this one. Please forgive me adding this
> to your inbox without addressing your comments, and ignore. I will
> establish another discussion for this part later.
>
> Signed-off-by: SeongJae Park <sj@kernel.org>
> ---
> include/linux/mm.h | 1 +
> mm/memory.c | 60 ++++++++++++++++++++++++++++++++++++++++++++--
> mm/mprotect.c | 5 ++++
> 3 files changed, 64 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 553cf9f438f1..2cba5a0196da 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2848,6 +2848,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
> #define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */
> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
> MM_CP_UFFD_WP_RESOLVE)
> +#define MM_CP_DAMON (1UL << 4)
>
> bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> pte_t pte);
> diff --git a/mm/memory.c b/mm/memory.c
> index 6675e87eb7dd..5dc85adb1e59 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -78,6 +78,7 @@
> #include <linux/sched/sysctl.h>
> #include <linux/pgalloc.h>
> #include <linux/uaccess.h>
> +#include <linux/damon.h>
>
> #include <trace/events/kmem.h>
>
> @@ -6172,6 +6173,54 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
> return VM_FAULT_FALLBACK;
> }
>
> +/*
> + * NOTE: This is only poc purpose "hack" that will not be upstreamed as is.
> + * More discussions between all stakeholders including maintainers of MM core,
> + * NUMA balancing, and DAMON should be made to make this upstreamable.
> + * (https://lore.kernel.org/20251128193947.80866-1-sj@kernel.org)
> + *
> + * This function is called from page fault handler, for page faults on
> + * P{TE,MD}-protected but vma-accessible pages. DAMON is making the fake
> + * protection for access sampling purpose. This function simply clears the
> + * protection and report this access to DAMON, by calling
> + * damon_report_page_fault().
> + *
> + * The protection clear code is copied from NUMA fault handling code for PTE.
> + * Again, this is only poc purpose "hack" to show what information DAMON wants
> + * from page fault events, rather than an upstream-aimed version.
> + */
> +static vm_fault_t do_damon_page(struct vm_fault *vmf, bool huge_pmd)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + struct folio *folio;
> + pte_t pte, old_pte;
> + bool writable = false, ignore_writable = false;
> + bool pte_write_upgrade = vma_wants_manual_pte_write_upgrade(vma);
> +
> + spin_lock(vmf->ptl);
> + old_pte = ptep_get(vmf->pte);
> + if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
> + pte_unmap_unlock(vmf->pte, vmf->ptl);
> + return 0;
> + }
> + pte = pte_modify(old_pte, vma->vm_page_prot);
> + writable = pte_write(pte);
> + if (!writable && pte_write_upgrade &&
> + can_change_pte_writable(vma, vmf->address, pte))
> + writable = true;
> + folio = vm_normal_folio(vma, vmf->address, pte);
> + if (folio && folio_test_large(folio))
> + numa_rebuild_large_mapping(vmf, vma, folio, pte,
> + ignore_writable, pte_write_upgrade);
> + else
> + numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
> + writable);
> + pte_unmap_unlock(vmf->pte, vmf->ptl);
> +
> + damon_report_page_fault(vmf, huge_pmd);
> + return 0;
> +}
> +
> /*
> * These routines also need to handle stuff like marking pages dirty
> * and/or accessed for architectures that don't do it in hardware (most
> @@ -6236,8 +6285,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> if (!pte_present(vmf->orig_pte))
> return do_swap_page(vmf);
>
> - if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
> + if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) {
> + if (sysctl_numa_balancing_mode == NUMA_BALANCING_DISABLED)
> + return do_damon_page(vmf, false);
> return do_numa_page(vmf);
> + }
>
> spin_lock(vmf->ptl);
> entry = vmf->orig_pte;
> @@ -6363,8 +6415,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> return 0;
> }
> if (pmd_trans_huge(vmf.orig_pmd)) {
> - if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
> + if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) {
> + if (sysctl_numa_balancing_mode ==
> + NUMA_BALANCING_DISABLED)
> + return do_damon_page(&vmf, true);
> return do_huge_pmd_numa_page(&vmf);
> + }

I recall that we had a similar discussion already. Ah, it was around
some Arm MTE tag storage reuse [1].

The idea was to let do_*_numa_page() handle the restoring so we don't
end up with such duplicated code.

[1] https://lore.kernel.org/all/20240125164256.4147-1-alexandru.elisei@arm.com/
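
To illustrate the shape of that idea, here is a rough user-space model, not
kernel code. Every name in it (model_pte, model_fault, protnone_owner,
model_restore_mapping(), model_do_protnone_fault()) is hypothetical and
exists only for this sketch. It assumes a single restore step shared by all
users of the protnone trick, with the consumer-specific work (NUMA hinting
or DAMON reporting) done only after the mapping has been restored.

```c
/* User-space model only: none of these types or helpers are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: who installed the fake PAGE_NONE protection. */
enum protnone_owner { OWNER_NUMA, OWNER_DAMON };

/* Hypothetical stand-in for a page table entry. */
struct model_pte {
	bool protnone;	/* models a PAGE_NONE-protected entry */
	bool writable;
};

/* Hypothetical stand-in for struct vm_fault plus some counters. */
struct model_fault {
	struct model_pte *pte;
	bool vma_writable;	/* what the VMA protection would allow */
	int numa_hints;		/* faults consumed by the NUMA path */
	int damon_reports;	/* faults reported to DAMON */
};

/*
 * Shared restore step: the only place that undoes the fake protection.
 * Returns false if the entry was already restored, e.g. by a racing fault.
 */
static bool model_restore_mapping(struct model_fault *f)
{
	assert(f != NULL && f->pte != NULL);

	if (!f->pte->protnone)
		return false;
	f->pte->protnone = false;
	f->pte->writable = f->vma_writable;
	return true;
}

/*
 * Single entry point standing in for do_*_numa_page(): restore first,
 * then hand the event to whichever consumer owns the protection.
 */
static int model_do_protnone_fault(struct model_fault *f,
				   enum protnone_owner owner)
{
	if (!model_restore_mapping(f))
		return 0;

	if (owner == OWNER_DAMON)
		f->damon_reports++;
	else
		f->numa_hints++;
	return 0;
}
```

In this model, a second consumer adds only a branch after the restore step
rather than a copy of it, which is the property the suggestion above is
after. How the owner would be determined in the real fault path is left
open here.
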
--
Cheers
David