From: Jason Gunthorpe <jgg@nvidia.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <willy@infradead.org>, Guo Ren <guoren@kernel.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	"David S . Miller" <davem@davemloft.net>,
	Andreas Larsson <andreas@gaisler.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Nicolas Pitre <nico@fluxnic.net>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
	Baoquan He <bhe@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	Dave Young <dyoung@redhat.com>, Tony Luck <tony.luck@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Dave Martin <Dave.Martin@arm.com>,
	James Morse <james.morse@arm.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	Jann Horn <jannh@google.com>, Pedro Falcato <pfalcato@suse.de>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-csky@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-s390@vger.kernel.org,
	sparclinux@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-mm@kvack.org,
	ntfs3@lists.linux.dev, kexec@lists.infradead.org,
	kasan-dev@googlegroups.com, iommu@lists.linux.dev,
	Kevin Tian <kevin.tian@intel.com>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH v3 06/13] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
Date: Tue, 16 Sep 2025 14:07:23 -0300
Message-ID: <20250916170723.GO1086830@nvidia.com>
In-Reply-To: <7c050219963aade148332365f8d2223f267dd89a.1758031792.git.lorenzo.stoakes@oracle.com>

On Tue, Sep 16, 2025 at 03:11:52PM +0100, Lorenzo Stoakes wrote:
> We need the ability to split PFN remap between updating the VMA and
> performing the actual remap, in order to do away with the legacy
> f_op->mmap hook.
> 
> To do so, update the PFN remap code to provide shared logic, and also make
> remap_pfn_range_notrack() static, as its one user, io_mapping_map_user()
> was removed in commit 9a4f90e24661 ("mm: remove mm/io-mapping.c").
> 
> Then, introduce remap_pfn_range_prepare(), which accepts VMA descriptor
> and PFN parameters, and remap_pfn_range_complete() which accepts the same
> parameters as remap_pfn_range().
> 
> remap_pfn_range_prepare() will set the cow vma->vm_pgoff if necessary, so
> it must be supplied with a correct PFN to do so.  If the caller must hold
> locks to be able to do this, those locks should be held across the
> operation, and mmap_abort() should be provided to revoke the lock should
> an error arise.

It looks like the patches have changed to remove mmap_abort so this
paragraph can probably be dropped.

>  static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long addr,
> -		unsigned long pfn, unsigned long size, pgprot_t prot)
> +		unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
>  {
>  	pgd_t *pgd;
>  	unsigned long next;
> @@ -2912,32 +2931,17 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
>  	if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
>  		return -EINVAL;
>  
> -	/*
> -	 * Physically remapped pages are special. Tell the
> -	 * rest of the world about it:
> -	 *   VM_IO tells people not to look at these pages
> -	 *	(accesses can have side effects).
> -	 *   VM_PFNMAP tells the core MM that the base pages are just
> -	 *	raw PFN mappings, and do not have a "struct page" associated
> -	 *	with them.
> -	 *   VM_DONTEXPAND
> -	 *      Disable vma merging and expanding with mremap().
> -	 *   VM_DONTDUMP
> -	 *      Omit vma from core dump, even when VM_IO turned off.
> -	 *
> -	 * There's a horrible special case to handle copy-on-write
> -	 * behaviour that some programs depend on. We mark the "original"
> -	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
> -	 * See vm_normal_page() for details.
> -	 */
> -	if (is_cow_mapping(vma->vm_flags)) {
> -		if (addr != vma->vm_start || end != vma->vm_end)
> -			return -EINVAL;
> -		vma->vm_pgoff = pfn;
> +	if (set_vma) {
> +		err = get_remap_pgoff(vma->vm_flags, addr, end,
> +				      vma->vm_start, vma->vm_end,
> +				      pfn, &vma->vm_pgoff);
> +		if (err)
> +			return err;
> +		vm_flags_set(vma, VM_REMAP_FLAGS);
> +	} else {
> +		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
>  	}
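
For readers following the quoted hunk, here is a minimal userspace sketch of
what the factored-out check appears to do, inferred only from the removed
lines above. The demo_ names and the flag values are assumptions made for
illustration; the real get_remap_pgoff() and is_cow_mapping() live in the
kernel and may differ in detail.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits; the kernel defines the real VM_* values. */
#define DEMO_VM_SHARED   0x08UL
#define DEMO_VM_MAYWRITE 0x20UL

/* A mapping is treated as COW when it may be written but is not shared. */
static int demo_is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (DEMO_VM_SHARED | DEMO_VM_MAYWRITE)) ==
	       DEMO_VM_MAYWRITE;
}

/*
 * Model of the "prepare" check: a COW mapping must cover the whole VMA,
 * and in that case the PFN is recorded in the page offset. Other mappings
 * pass through untouched.
 */
static int demo_get_remap_pgoff(unsigned long vm_flags,
				unsigned long addr, unsigned long end,
				unsigned long vm_start, unsigned long vm_end,
				unsigned long pfn, unsigned long *vm_pgoff)
{
	if (demo_is_cow_mapping(vm_flags)) {
		if (addr != vm_start || end != vm_end)
			return -EINVAL;
		*vm_pgoff = pfn;
	}
	return 0;
}

/* Small self-check exercising the full-range COW case. */
static void demo_check_cow_rules(void)
{
	unsigned long pgoff = 0;

	assert(demo_get_remap_pgoff(DEMO_VM_MAYWRITE, 0x1000, 0x3000,
				    0x1000, 0x3000, 0x42, &pgoff) == 0);
	assert(pgoff == 0x42);
}
```

The point of the sketch is that this step only needs the VMA bounds, flags and
PFN, so it can run before any page tables are touched.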

It looks like you can avoid the changes to add set_vma by making
remap_pfn_range_internal() only do the complete portion and giving
the legacy calls a helper to do prepare in line:

int remap_pfn_range_prepare_vma(..)
{
	int err;

	err = get_remap_pgoff(vma->vm_flags, addr, end,
			      vma->vm_start, vma->vm_end,
			      pfn, &vma->vm_pgoff);
	if (err)
		return err;
	vm_flags_set(vma, VM_REMAP_FLAGS);
	return 0;
}

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
		    unsigned long pfn, unsigned long size, pgprot_t prot)
{
	int err;

	err = remap_pfn_range_prepare_vma(vma, addr, pfn, size);
	if (err)
		return err;

	if (IS_ENABLED(__HAVE_PFNMAP_TRACKING))
		return remap_pfn_range_track(vma, addr, pfn, size, prot);
	return remap_pfn_range_notrack(vma, addr, pfn, size, prot);
}

(fix pgtable_types.h to #define to 1 so IS_ENABLED works)
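
To illustrate why the value matters, here is a standalone sketch of the
preprocessor trick, simplified from the pattern in include/linux/kconfig.h.
The DEMO_ names are hypothetical, and the sketch covers only the built-in
check, not the _MODULE half of the real IS_ENABLED().

```c
#include <assert.h>

/* Only a macro that expands to exactly 1 finds this placeholder. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val

/* Extra levels force the argument to be expanded before token pasting. */
#define DEMO_IS_DEFINED(x)   DEMO_IS_DEFINED_2(x)
#define DEMO_IS_DEFINED_2(v) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##v)
/*
 * If the paste produced "0,", the literal 1 shifts into second position.
 * Otherwise the pasted junk swallows it and the 0 is selected instead.
 */
#define DEMO_IS_DEFINED_3(arg1_or_junk) \
	DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define DEMO_FLAG_EMPTY		/* defined, but with no value */
#define DEMO_FLAG_ONE 1		/* defined to 1 */
/* DEMO_FLAG_UNSET is intentionally left undefined. */

/* Compile-time constant, so a dead branch can be discarded. */
static int demo_tracking_enabled(void)
{
	return DEMO_IS_DEFINED(DEMO_FLAG_ONE);
}

static void demo_check_flags(void)
{
	assert(DEMO_IS_DEFINED(DEMO_FLAG_ONE) == 1);
	assert(DEMO_IS_DEFINED(DEMO_FLAG_EMPTY) == 0);
	assert(DEMO_IS_DEFINED(DEMO_FLAG_UNSET) == 0);
}
```

A bare "#define FOO" therefore reads as disabled with this style of check,
which is why the define needs the explicit 1.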

But the logic here is all fine.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

