From: Peter Xu <peterx@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>,
Alex Williamson <alex.williamson@redhat.com>,
Zi Yan <ziy@nvidia.com>, Alex Mastro <amastro@fb.com>,
David Hildenbrand <david@redhat.com>,
Nico Pache <npache@redhat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Tue, 24 Jun 2025 20:48:45 -0400
Message-ID: <aFtHbXFO1ZpAsnV8@x1.local>
In-Reply-To: <20250624234032.GC167785@nvidia.com>

On Tue, Jun 24, 2025 at 08:40:32PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 24, 2025 at 04:37:26PM -0400, Peter Xu wrote:
> > On Thu, Jun 19, 2025 at 03:40:41PM -0300, Jason Gunthorpe wrote:
> > > Even with this new version you have to decide to return PUD_SIZE or
> > > bar_size in pci and your same reasoning that PUD_SIZE make sense
> > > applies (though I would probably return bar_size and just let the core
> > > code cap it to PUD_SIZE)
> >
> > Yes.
> >
> > Today I went back to look at this, I was trying to introduce this for
> > file_operations:
> >
> > int (*get_mapping_order)(struct file *, unsigned long, size_t);
> >
> > It looks almost good, except that it so far has no way to return the
> > physical address for further calculation on the alignment.
> >
> > For THP, VA is always calculated against pgoff not physical address on the
> > alignment. I think it's OK for THP, because every 2M THP folio will be
> > naturally 2M aligned on the physical address, so it fits when e.g. pgoff=0
> > in the calculation of thp_get_unmapped_area_vmflags().
> >
> > Logically it should even also work for vfio-pci, as long as VFIO keeps
> > using the lower 40 bits of the device_fd to represent the bar offset,
> > meanwhile it'll also require PCIe spec asking the PCI bars to be mapped
> > aligned with bar sizes.
> >
> > But from an API POV, get_mapping_order() logically should return something
> > for further calculation of the alignment to get the VA. pgoff here may not
> > always be the right thing to use to align to the VA: after all, pgtable
> > mapping is about VA -> PA, the only reasonable and reliable way is to align
> > VA to the PA to be mapped, and as an API we shouldn't assume pgoff is
> > always aligned to PA address space.
>
> My feeling, and the reason I used the phrase "pgoff aligned address",
> is that the owner of the file should already ensure that for the large
> PTEs/folios:
>
>    pgoff % 2**order == 0
>    physical % 2**order == 0

IMHO there shouldn't really be any hard requirement in mm that pgoff and
the physical address space be aligned to each other, but I confess I don't
have an example driver in the Linux tree that doesn't do that.

>
> So, things like VFIO do need to hand out high alignment pgoffs to make
> this work - which it already does.
>
> To me this just keeps things simpler. I guess if someone comes up with
> a case where they really can't get a pgoff alignment and really need a
> high order mapping then maybe we can add a new return field of some
> kind (pgoff adjustment?) but that is so weird I'd leave it to the
> future person to come and justify it.

Looking further, I also found some special-cased get_unmapped_area()
implementations that may not be trivially convertible to the new API even
for CONFIG_MMU, namely:

  - io_uring_get_unmapped_area
  - arena_get_unmapped_area (from bpf_map->ops->map_get_unmapped_area)

I'll need to take a closer look tomorrow. If any of them cannot be 100%
safely converted to the new API, I'd also think we should not introduce the
new API, but reuse get_unmapped_area() until we know a way out.

--
Peter Xu