From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Alistair Popple <apopple@nvidia.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Dan Williams <dan.j.williams@intel.com>,
Oscar Salvador <osalvador@suse.de>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: [PATCH v3 2/3] mm/huge_memory: don't mark refcounted folios special in vmf_insert_folio_pmd()
Date: Fri, 13 Jun 2025 11:27:01 +0200
Message-ID: <20250613092702.1943533-3-david@redhat.com>
In-Reply-To: <20250613092702.1943533-1-david@redhat.com>

Marking PMDs that map "normal" refcounted folios as special is against
our rules documented for vm_normal_page(): normal (refcounted) folios
shall never have the page table mapping marked as special.

Fortunately, there are not that many pmd_special() checks that can be
misled, and most vm_normal_page_pmd()/vm_normal_folio_pmd() users that
would get this wrong right now are rather harmless: e.g., none so far
bases the decision whether to grab a folio reference on the result.
Also, GUP-fast will fall back to GUP-slow. All in all, there seem to be
no big implications so far.

Getting this right will become more important as we use
folio_normal_page_pmd() in more places.

Fix it by teaching insert_pfn_pmd() to properly handle both folios and
pfns: move the refcount/mapcount/etc. handling in there, rename it to
insert_pmd(), and distinguish between both cases using a new simple
"struct folio_or_pfn" structure.

Use folio_mk_pmd() to create a pmd for a folio cleanly.

Fixes: 6c88f72691f8 ("mm/huge_memory: add vmf_insert_folio_pmd()")
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
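Note for readers: the folio-vs-pfn dispatch described in the commit
message can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not kernel code: "struct folio",
"pfn_t" and the helpers fop_to_pfn() and fop_may_be_special() are
hypothetical stand-ins invented for this sketch; only the shape of
"struct folio_or_pfn" is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel types. */
struct folio { unsigned long first_pfn; };
typedef struct { unsigned long val; } pfn_t;

/* Same shape as the structure introduced by the patch. */
struct folio_or_pfn {
	union {
		struct folio *folio;
		pfn_t pfn;
	};
	bool is_folio;
};

/*
 * Hypothetical helper mirroring the pfn selection done in insert_pmd():
 * the tag decides which union member is valid.
 */
static unsigned long fop_to_pfn(struct folio_or_pfn fop)
{
	if (fop.is_folio) {
		assert(fop.folio != NULL);
		return fop.folio->first_pfn;
	}
	return fop.pfn.val;
}

/*
 * Hypothetical helper: a mapping of a refcounted folio must never be
 * marked special; only the raw-pfn case may be (when it is not devmap).
 */
static bool fop_may_be_special(struct folio_or_pfn fop)
{
	return !fop.is_folio;
}
```

A pfn caller can write "struct folio_or_pfn fop = { .pfn = pfn };" and
leave is_folio out, because C zero-initializes members that a
designated initializer list does not name. That is why only the folio
caller in the patch has to set ".is_folio = true" explicitly.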
mm/huge_memory.c | 59 ++++++++++++++++++++++++++++++++----------------
1 file changed, 40 insertions(+), 19 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 49b98082c5401..d1e3e253c714a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1372,9 +1372,17 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
return __do_huge_pmd_anonymous_page(vmf);
}
-static int insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
- pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write,
- pgtable_t pgtable)
+struct folio_or_pfn {
+ union {
+ struct folio *folio;
+ pfn_t pfn;
+ };
+ bool is_folio;
+};
+
+static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
+ pmd_t *pmd, struct folio_or_pfn fop, pgprot_t prot,
+ bool write, pgtable_t pgtable)
{
struct mm_struct *mm = vma->vm_mm;
pmd_t entry;
@@ -1382,8 +1390,11 @@ static int insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
lockdep_assert_held(pmd_lockptr(mm, pmd));
if (!pmd_none(*pmd)) {
+ const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
+ pfn_t_to_pfn(fop.pfn);
+
if (write) {
- if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
+ if (pmd_pfn(*pmd) != pfn) {
WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
return -EEXIST;
}
@@ -1396,11 +1407,20 @@ static int insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
return -EEXIST;
}
- entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
- if (pfn_t_devmap(pfn))
- entry = pmd_mkdevmap(entry);
- else
- entry = pmd_mkspecial(entry);
+ if (fop.is_folio) {
+ entry = folio_mk_pmd(fop.folio, vma->vm_page_prot);
+
+ folio_get(fop.folio);
+ folio_add_file_rmap_pmd(fop.folio, &fop.folio->page, vma);
+ add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
+ } else {
+ entry = pmd_mkhuge(pfn_t_pmd(fop.pfn, prot));
+
+ if (pfn_t_devmap(fop.pfn))
+ entry = pmd_mkdevmap(entry);
+ else
+ entry = pmd_mkspecial(entry);
+ }
if (write) {
entry = pmd_mkyoung(pmd_mkdirty(entry));
entry = maybe_pmd_mkwrite(entry, vma);
@@ -1431,6 +1451,9 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
unsigned long addr = vmf->address & PMD_MASK;
struct vm_area_struct *vma = vmf->vma;
pgprot_t pgprot = vma->vm_page_prot;
+ struct folio_or_pfn fop = {
+ .pfn = pfn,
+ };
pgtable_t pgtable = NULL;
spinlock_t *ptl;
int error;
@@ -1458,8 +1481,8 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
pfnmap_setup_cachemode_pfn(pfn_t_to_pfn(pfn), &pgprot);
ptl = pmd_lock(vma->vm_mm, vmf->pmd);
- error = insert_pfn_pmd(vma, addr, vmf->pmd, pfn, pgprot, write,
- pgtable);
+ error = insert_pmd(vma, addr, vmf->pmd, fop, pgprot, write,
+ pgtable);
spin_unlock(ptl);
if (error && pgtable)
pte_free(vma->vm_mm, pgtable);
@@ -1474,6 +1497,10 @@ vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio,
struct vm_area_struct *vma = vmf->vma;
unsigned long addr = vmf->address & PMD_MASK;
struct mm_struct *mm = vma->vm_mm;
+ struct folio_or_pfn fop = {
+ .folio = folio,
+ .is_folio = true,
+ };
spinlock_t *ptl;
pgtable_t pgtable = NULL;
int error;
@@ -1491,14 +1518,8 @@ vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio,
}
ptl = pmd_lock(mm, vmf->pmd);
- if (pmd_none(*vmf->pmd)) {
- folio_get(folio);
- folio_add_file_rmap_pmd(folio, &folio->page, vma);
- add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
- }
- error = insert_pfn_pmd(vma, addr, vmf->pmd,
- pfn_to_pfn_t(folio_pfn(folio)), vma->vm_page_prot,
- write, pgtable);
+ error = insert_pmd(vma, addr, vmf->pmd, fop, vma->vm_page_prot,
+ write, pgtable);
spin_unlock(ptl);
if (error && pgtable)
pte_free(mm, pgtable);
--
2.49.0