From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Gavin Shan <gshan@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Alistair Popple <apopple@nvidia.com>,
kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Sean Christopherson <seanjc@google.com>,
peterx@redhat.com, Oscar Salvador <osalvador@suse.de>,
Jason Gunthorpe <jgg@nvidia.com>, Borislav Petkov <bp@alien8.de>,
Zi Yan <ziy@nvidia.com>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Yan Zhao <yan.y.zhao@intel.com>, Will Deacon <will@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Alex Williamson <alex.williamson@redhat.com>
Subject: [PATCH v2 09/19] mm: New follow_pfnmap API
Date: Mon, 26 Aug 2024 16:43:43 -0400 [thread overview]
Message-ID: <20240826204353.2228736-10-peterx@redhat.com> (raw)
In-Reply-To: <20240826204353.2228736-1-peterx@redhat.com>
Introduce a pair of APIs to follow pfn mappings to get entry information.
It's very similar to what follow_pte() does before, but different in that
it recognizes huge pfn mappings.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/linux/mm.h | 31 ++++++++++
mm/memory.c | 150 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 181 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d900f15b7650..161d496bfd18 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2373,6 +2373,37 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);
+struct follow_pfnmap_args {
+ /**
+ * Inputs:
+ * @vma: Pointer to @vm_area_struct struct
+ * @address: the virtual address to walk
+ */
+ struct vm_area_struct *vma;
+ unsigned long address;
+ /**
+ * Internals:
+ *
+ * The caller shouldn't touch any of these.
+ */
+ spinlock_t *lock;
+ pte_t *ptep;
+ /**
+ * Outputs:
+ *
+ * @pfn: the PFN of the address
+ * @pgprot: the pgprot_t of the mapping
+ * @writable: whether the mapping is writable
+ * @special: whether the mapping is a special mapping (real PFN maps)
+ */
+ unsigned long pfn;
+ pgprot_t pgprot;
+ bool writable;
+ bool special;
+};
+int follow_pfnmap_start(struct follow_pfnmap_args *args);
+void follow_pfnmap_end(struct follow_pfnmap_args *args);
+
extern void truncate_pagecache(struct inode *inode, loff_t new);
extern void truncate_setsize(struct inode *inode, loff_t newsize);
void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
diff --git a/mm/memory.c b/mm/memory.c
index 93c0c25433d0..0b136c398257 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6173,6 +6173,156 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
}
EXPORT_SYMBOL_GPL(follow_pte);
+static inline void pfnmap_args_setup(struct follow_pfnmap_args *args,
+ spinlock_t *lock, pte_t *ptep,
+ pgprot_t pgprot, unsigned long pfn_base,
+ unsigned long addr_mask, bool writable,
+ bool special)
+{
+ args->lock = lock;
+ args->ptep = ptep;
+ args->pfn = pfn_base + ((args->address & ~addr_mask) >> PAGE_SHIFT);
+ args->pgprot = pgprot;
+ args->writable = writable;
+ args->special = special;
+}
+
+static inline void pfnmap_lockdep_assert(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_LOCKDEP
+ struct address_space *mapping = vma->vm_file->f_mapping;
+
+ if (mapping)
+ lockdep_assert(lockdep_is_held(&vma->vm_file->f_mapping->i_mmap_rwsem) ||
+ lockdep_is_held(&vma->vm_mm->mmap_lock));
+ else
+ lockdep_assert(lockdep_is_held(&vma->vm_mm->mmap_lock));
+#endif
+}
+
+/**
+ * follow_pfnmap_start() - Look up a pfn mapping at a user virtual address
+ * @args: Pointer to struct @follow_pfnmap_args
+ *
+ * The caller needs to setup args->vma and args->address to point to the
+ * virtual address as the target of such lookup. On a successful return,
+ * the results will be put into other output fields.
+ *
+ * After the caller finished using the fields, the caller must invoke
+ * another follow_pfnmap_end() to proper releases the locks and resources
+ * of such look up request.
+ *
+ * During the start() and end() calls, the results in @args will be valid
+ * as proper locks will be held. After the end() is called, all the fields
+ * in @follow_pfnmap_args will be invalid to be further accessed. Further
+ * use of such information after end() may require proper synchronizations
+ * by the caller with page table updates, otherwise it can create a
+ * security bug.
+ *
+ * If the PTE maps a refcounted page, callers are responsible to protect
+ * against invalidation with MMU notifiers; otherwise access to the PFN at
+ * a later point in time can trigger use-after-free.
+ *
+ * Only IO mappings and raw PFN mappings are allowed. The mmap semaphore
+ * should be taken for read, and the mmap semaphore cannot be released
+ * before the end() is invoked.
+ *
+ * This function must not be used to modify PTE content.
+ *
+ * Return: zero on success, negative otherwise.
+ */
+int follow_pfnmap_start(struct follow_pfnmap_args *args)
+{
+ struct vm_area_struct *vma = args->vma;
+ unsigned long address = args->address;
+ struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *lock;
+ pgd_t *pgdp;
+ p4d_t *p4dp, p4d;
+ pud_t *pudp, pud;
+ pmd_t *pmdp, pmd;
+ pte_t *ptep, pte;
+
+ pfnmap_lockdep_assert(vma);
+
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+ goto out;
+
+ if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+ goto out;
+retry:
+ pgdp = pgd_offset(mm, address);
+ if (pgd_none(*pgdp) || unlikely(pgd_bad(*pgdp)))
+ goto out;
+
+ p4dp = p4d_offset(pgdp, address);
+ p4d = READ_ONCE(*p4dp);
+ if (p4d_none(p4d) || unlikely(p4d_bad(p4d)))
+ goto out;
+
+ pudp = pud_offset(p4dp, address);
+ pud = READ_ONCE(*pudp);
+ if (pud_none(pud))
+ goto out;
+ if (pud_leaf(pud)) {
+ lock = pud_lock(mm, pudp);
+ if (!unlikely(pud_leaf(pud))) {
+ spin_unlock(lock);
+ goto retry;
+ }
+ pfnmap_args_setup(args, lock, NULL, pud_pgprot(pud),
+ pud_pfn(pud), PUD_MASK, pud_write(pud),
+ pud_special(pud));
+ return 0;
+ }
+
+ pmdp = pmd_offset(pudp, address);
+ pmd = pmdp_get_lockless(pmdp);
+ if (pmd_leaf(pmd)) {
+ lock = pmd_lock(mm, pmdp);
+ if (!unlikely(pmd_leaf(pmd))) {
+ spin_unlock(lock);
+ goto retry;
+ }
+ pfnmap_args_setup(args, lock, NULL, pmd_pgprot(pmd),
+ pmd_pfn(pmd), PMD_MASK, pmd_write(pmd),
+ pmd_special(pmd));
+ return 0;
+ }
+
+ ptep = pte_offset_map_lock(mm, pmdp, address, &lock);
+ if (!ptep)
+ goto out;
+ pte = ptep_get(ptep);
+ if (!pte_present(pte))
+ goto unlock;
+ pfnmap_args_setup(args, lock, ptep, pte_pgprot(pte),
+ pte_pfn(pte), PAGE_MASK, pte_write(pte),
+ pte_special(pte));
+ return 0;
+unlock:
+ pte_unmap_unlock(ptep, lock);
+out:
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(follow_pfnmap_start);
+
+/**
+ * follow_pfnmap_end(): End a follow_pfnmap_start() process
+ * @args: Pointer to struct @follow_pfnmap_args
+ *
+ * Must be used in pair of follow_pfnmap_start(). See the start() function
+ * above for more information.
+ */
+void follow_pfnmap_end(struct follow_pfnmap_args *args)
+{
+ if (args->lock)
+ spin_unlock(args->lock);
+ if (args->ptep)
+ pte_unmap(args->ptep);
+}
+EXPORT_SYMBOL_GPL(follow_pfnmap_end);
+
#ifdef CONFIG_HAVE_IOREMAP_PROT
/**
* generic_access_phys - generic implementation for iomem mmap access
--
2.45.0
next prev parent reply other threads:[~2024-08-26 20:44 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-26 20:43 [PATCH v2 00/19] mm: Support huge pfnmaps Peter Xu
2024-08-26 20:43 ` [PATCH v2 01/19] mm: Introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud Peter Xu
2024-08-26 20:43 ` [PATCH v2 02/19] mm: Drop is_huge_zero_pud() Peter Xu
2024-08-26 20:43 ` [PATCH v2 03/19] mm: Mark special bits for huge pfn mappings when inject Peter Xu
2024-08-28 15:31 ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 04/19] mm: Allow THP orders for PFNMAPs Peter Xu
2024-08-28 15:31 ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 05/19] mm/gup: Detect huge pfnmap entries in gup-fast Peter Xu
2024-08-26 20:43 ` [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start() Peter Xu
2024-08-28 7:44 ` David Hildenbrand
2024-08-28 14:24 ` Peter Xu
2024-08-28 15:30 ` David Hildenbrand
2024-08-28 19:45 ` Peter Xu
2024-08-28 23:46 ` Jason Gunthorpe
2024-08-29 6:35 ` David Hildenbrand
2024-08-29 18:45 ` Peter Xu
2024-08-29 15:10 ` David Hildenbrand
2024-08-29 18:49 ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries Peter Xu
2024-08-29 15:10 ` David Hildenbrand
2024-08-29 18:26 ` Peter Xu
2024-08-29 19:44 ` David Hildenbrand
2024-08-29 20:01 ` Peter Xu
2024-09-02 7:58 ` Yan Zhao
2024-09-03 21:23 ` Peter Xu
2024-09-09 22:25 ` Andrew Morton
2024-09-09 22:43 ` Peter Xu
2024-09-09 23:15 ` Andrew Morton
2024-09-10 0:08 ` Peter Xu
2024-09-10 2:52 ` Yan Zhao
2024-09-10 12:16 ` Peter Xu
2024-09-11 2:16 ` Yan Zhao
2024-09-11 14:34 ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 08/19] mm: Always define pxx_pgprot() Peter Xu
2024-08-26 20:43 ` Peter Xu [this message]
2024-08-26 20:43 ` [PATCH v2 10/19] KVM: Use follow_pfnmap API Peter Xu
2024-08-26 20:43 ` [PATCH v2 11/19] s390/pci_mmio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 12/19] mm/x86/pat: Use the new " Peter Xu
2024-08-26 20:43 ` [PATCH v2 13/19] vfio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 14/19] acrn: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 15/19] mm/access_process_vm: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 16/19] mm: Remove follow_pte() Peter Xu
2024-09-01 4:33 ` Yu Zhao
2024-09-01 13:39 ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 17/19] mm/x86: Support large pfn mappings Peter Xu
2024-08-26 20:43 ` [PATCH v2 18/19] mm/arm64: " Peter Xu
2025-03-19 22:22 ` Keith Busch
2025-03-19 22:46 ` Peter Xu
2025-03-19 22:53 ` Keith Busch
2024-08-26 20:43 ` [PATCH v2 19/19] vfio/pci: Implement huge_fault support Peter Xu
2024-08-27 22:36 ` [PATCH v2 00/19] mm: Support huge pfnmaps Jiaqi Yan
2024-08-27 22:57 ` Peter Xu
2024-08-28 0:42 ` Jiaqi Yan
2024-08-28 0:46 ` Jiaqi Yan
2024-08-28 14:24 ` Jason Gunthorpe
2024-08-28 16:10 ` Jiaqi Yan
2024-08-28 23:49 ` Jason Gunthorpe
2024-08-29 19:21 ` Jiaqi Yan
2024-09-04 15:52 ` Jason Gunthorpe
2024-09-04 16:38 ` Jiaqi Yan
2024-09-04 16:43 ` Jason Gunthorpe
2024-09-04 16:58 ` Jiaqi Yan
2024-09-04 17:00 ` Jason Gunthorpe
2024-09-04 17:07 ` Jiaqi Yan
2024-09-09 3:56 ` Ankit Agrawal
2024-08-28 14:41 ` Peter Xu
2024-08-28 16:23 ` Jiaqi Yan
2024-09-09 4:03 ` Ankit Agrawal
2024-09-09 15:03 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240826204353.2228736-10-peterx@redhat.com \
--to=peterx@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=apopple@nvidia.com \
--cc=axelrasmussen@google.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=gshan@redhat.com \
--cc=jgg@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=osalvador@suse.de \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=wangkefeng.wang@huawei.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=yan.y.zhao@intel.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).