From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Alistair Popple <apopple@nvidia.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v1 1/2] mm/huge_memory: don't mark refcounted pages special in vmf_insert_folio_pmd()
Date: Fri, 6 Jun 2025 10:20:13 +0200
Message-ID: <aEKkvdSAplmukcXz@localhost.localdomain>
In-Reply-To: <20250603211634.2925015-2-david@redhat.com>
On Tue, Jun 03, 2025 at 11:16:33PM +0200, David Hildenbrand wrote:
> Marking PMDs that map a "normal" refcounted folios as special is
> against our rules documented for vm_normal_page().
>
> Fortunately, there are not that many pmd_special() check that can be
> mislead, and most vm_normal_page_pmd()/vm_normal_folio_pmd() users that
> would get this wrong right now are rather harmless: e.g., none so far
> bases decisions whether to grab a folio reference on that decision.
>
> Well, and GUP-fast will fallback to GUP-slow. All in all, so far no big
> implications as it seems.
>
> Getting this right will get more important as we use
> folio_normal_page_pmd() in more places.
>
> Fix it by just inlining the relevant code, making the whole
> pmd_none() handling cleaner. We can now use folio_mk_pmd().
>
> While at it, make sure that a pmd that is not-none is actually present
> before comparing PFNs.
>
> Fixes: 6c88f72691f8 ("mm/huge_memory: add vmf_insert_folio_pmd()")
> Signed-off-by: David Hildenbrand <david@redhat.com>
Hi David,
> ---
> mm/huge_memory.c | 39 ++++++++++++++++++++++++++++++++-------
> 1 file changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d3e66136e41a3..f9e23dfea76f8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1474,9 +1474,10 @@ vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio,
> struct vm_area_struct *vma = vmf->vma;
> unsigned long addr = vmf->address & PMD_MASK;
> struct mm_struct *mm = vma->vm_mm;
> + pmd_t *pmd = vmf->pmd;
> spinlock_t *ptl;
> pgtable_t pgtable = NULL;
> - int error;
> + pmd_t entry;
>
> if (addr < vma->vm_start || addr >= vma->vm_end)
> return VM_FAULT_SIGBUS;
> @@ -1490,17 +1491,41 @@ vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio,
> return VM_FAULT_OOM;
> }
>
> - ptl = pmd_lock(mm, vmf->pmd);
> - if (pmd_none(*vmf->pmd)) {
> + ptl = pmd_lock(mm, pmd);
> + if (pmd_none(*pmd)) {
> folio_get(folio);
> folio_add_file_rmap_pmd(folio, &folio->page, vma);
> add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
> +
> + entry = folio_mk_pmd(folio, vma->vm_page_prot);
> + if (write) {
> + entry = pmd_mkyoung(pmd_mkdirty(entry));
> + entry = maybe_pmd_mkwrite(entry, vma);
> + }
> + set_pmd_at(mm, addr, pmd, entry);
> + update_mmu_cache_pmd(vma, addr, pmd);
> +
> + if (pgtable) {
> + pgtable_trans_huge_deposit(mm, pmd, pgtable);
> + mm_inc_nr_ptes(mm);
> + pgtable = NULL;
> + }
> + } else if (pmd_present(*pmd) && write) {
> + /*
> + * We only allow for upgrading write permissions if the
> + * same folio is already mapped.
> + */
> + if (pmd_pfn(*pmd) == folio_pfn(folio)) {
> + entry = pmd_mkyoung(*pmd);
> + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> + if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
> + update_mmu_cache_pmd(vma, addr, pmd);
> + } else {
> + WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
> + }
So, this is pretty much insert_pfn_pmd() without pmd_mkdevmap()/pmd_mkspecial().
I guess vmf_insert_folio_pmd() doesn't have to be concerned with devmaps
either, right?

Looks good to me, just a nit: would it not be better to pass a boolean
to insert_pfn_pmd() that lets it know whether it "can" create
devmap/special entries?
--
Oscar Salvador
SUSE Labs