* mapping files for iommufd
From: Steven Sistare @ 2024-08-12 18:09 UTC
To: Alex Williamson, Jason Gunthorpe; +Cc: iommu

I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
I find that backing guest memory with an ext4 file /root/guest.ram
succeeds for legacy but fails for iommufd, with EFAULT in
IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
hugetlbfs file. This is with unmodified kernels, no live update changes.

Is this expected?
(a tmpfs file works fine, as does memory-backend-memfd).

- Steve

* Re: mapping files for iommufd
From: Jason Gunthorpe @ 2024-08-12 18:40 UTC
To: Steven Sistare; +Cc: Alex Williamson, iommu

On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
> I find that backing guest memory with an ext4 file /root/guest.ram
> succeeds for legacy but fails for iommufd, with EFAULT in
> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
> hugetlbfs file. This is with unmodified kernels, no live update changes.
>
> Is this expected?

Yes - I'm surprised that VFIO is not failing though?

/*
 * Writing to file-backed mappings which require folio dirty tracking using GUP
 * is a fundamentally broken operation, as kernel write access to GUP mappings
 * do not adhere to the semantics expected by a file system.
 *
 * Consider the following scenario:-
 *
 * 1. A folio is written to via GUP which write-faults the memory, notifying
 *    the file system and dirtying the folio.
 * 2. Later, writeback is triggered, resulting in the folio being cleaned and
 *    the PTE being marked read-only.
 * 3. The GUP caller writes to the folio, as it is mapped read/write via the
 *    direct mapping.
 * 4. The GUP caller, now done with the page, unpins it and sets it dirty
 *    (though it does not have to).
 *
 * This results in both data being written to a folio without writenotify, and
 * the folio being dirtied unexpectedly (if the caller decides to do so).
 */
static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
					  unsigned long gup_flags)
{

Why doesn't that fail in the VFIO case? I didn't notice anything
special there that would make it work???

I recall we talked about allowing old stuff to keep working but I
don't see any implementation of that?

Jason

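The check quoted in the message above can be modelled in userspace roughly as
follows. This is a simplified sketch of the logic as it appears to work since
the 6.4 change, not the kernel source: the MODEL_FOLL_* bit values, struct
model_vma and both function names are hypothetical stand-ins, and the
needs_dirty_tracking field collapses what the kernel derives from the VMA and
its backing filesystem into a single boolean.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits; NOT the kernel's real FOLL_* values. */
#define MODEL_FOLL_WRITE    (1u << 0)
#define MODEL_FOLL_PIN      (1u << 1)
#define MODEL_FOLL_LONGTERM (1u << 2)

/* Hypothetical stand-in for the few VMA properties the check looks at. */
struct model_vma {
	bool anonymous;            /* true if there is no backing file */
	bool needs_dirty_tracking; /* assumed true for ext4, false for tmpfs/hugetlbfs */
};

/* Models the decision made by writable_file_mapping_allowed(). */
static bool model_writable_file_mapping_allowed(const struct model_vma *vma,
						unsigned int gup_flags)
{
	const unsigned int longterm_pin = MODEL_FOLL_PIN | MODEL_FOLL_LONGTERM;

	/* Only long-term pins are refused; anything else is allowed. */
	if ((gup_flags & longterm_pin) != longterm_pin)
		return true;

	/* A long-term pin is fine if the file needs no dirty tracking. */
	return !vma->needs_dirty_tracking;
}

/* Models the write path of check_vma_flags(): 0 on success, -EFAULT on refusal. */
static int model_check_vma_flags(const struct model_vma *vma,
				 unsigned int gup_flags)
{
	if ((gup_flags & MODEL_FOLL_WRITE) && !vma->anonymous &&
	    !model_writable_file_mapping_allowed(vma, gup_flags))
		return -EFAULT;
	return 0;
}
```

Under these assumptions a writable long-term pin of an ext4-backed mapping
yields -EFAULT, while the same request against a tmpfs-backed or anonymous
mapping succeeds, which matches the behaviour reported at the start of the
thread.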
* Re: mapping files for iommufd
From: Steven Sistare @ 2024-08-12 18:56 UTC
To: Jason Gunthorpe; +Cc: Alex Williamson, iommu

On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
>> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
>> I find that backing guest memory with an ext4 file /root/guest.ram
>> succeeds for legacy but fails for iommufd, with EFAULT in
>> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
>> hugetlbfs file. This is with unmodified kernels, no live update changes.
>>
>> Is this expected?
>
> Yes - I'm surprised that VFIO is not failing though?
>
> [...]
>
> Why doesn't that fail in the VFIO case? I didn't notice anything
> special there that would make it work???

vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
it just grabs a reference to each page.

> I recall we talked about allowing old stuff to keep working but I
> don't see any implementation of that?

Nor I. Unless "keep old stuff working" is implemented by omitting the
CONFIG definitions for iommufd.

- Steve

* Re: mapping files for iommufd
From: Alex Williamson @ 2024-08-12 21:05 UTC
To: Steven Sistare; +Cc: Jason Gunthorpe, iommu

On Mon, 12 Aug 2024 14:56:03 -0400
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
> > [...]
> >
> > Why doesn't that fail in the VFIO case? I didn't notice anything
> > special there that would make it work???
>
> vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
> it just grabs a reference to each page.

Can you be more specific, because...

vfio_pin_pages_remote
  vaddr_get_pfns
    pin_user_pages_remote
      __gup_longterm_locked
        __get_user_pages_locked
          __get_user_pages

Thanks,
Alex

* Re: mapping files for iommufd
From: Steven Sistare @ 2024-08-13 14:23 UTC
To: Alex Williamson; +Cc: Jason Gunthorpe, iommu

On 8/12/2024 5:05 PM, Alex Williamson wrote:
> On Mon, 12 Aug 2024 14:56:03 -0400
> Steven Sistare <steven.sistare@oracle.com> wrote:
>
>> On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
>>> [...]
>>>
>>> Why doesn't that fail in the VFIO case? I didn't notice anything
>>> special there that would make it work???
>>
>> vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
>> it just grabs a reference to each page.
>
> Can you be more specific, because...
>
> vfio_pin_pages_remote
>   vaddr_get_pfns
>     pin_user_pages_remote
>       __gup_longterm_locked
>         __get_user_pages_locked
>           __get_user_pages

Right.

The correct explanation is my test succeeded for an older kernel, based on 5.15.
It failed in 6.9. This change in 6.4 (which Jason quotes from above) appears to be
the reason:
  8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings

I do not question the need for that fix, but I am surprised that QEMU users have not
complained about new kernels failing for hugetlbfs plus vfio. Maybe most are still
on older stable kernels (as I was).

- Steve

* Re: mapping files for iommufd
From: Jason Gunthorpe @ 2024-08-13 14:36 UTC
To: Steven Sistare; +Cc: Alex Williamson, iommu

On Tue, Aug 13, 2024 at 10:23:10AM -0400, Steven Sistare wrote:

> The correct explanation is my test succeeded for an older kernel, based on 5.15.
> It failed in 6.9. This change in 6.4 (which Jason quotes from above) appears to be
> the reason:
>   8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings
>
> I do not question the need for that fix, but I am surprised that QEMU users have not
> complained about new kernels failing for hugetlbfs plus vfio. Maybe most are still
> on older stable kernels (as I was).

hugetlbfs is supposed to work

Normal filesystems are not

Jason

* Re: mapping files for iommufd
From: Steven Sistare @ 2024-08-13 15:13 UTC
To: Jason Gunthorpe; +Cc: Alex Williamson, iommu

On 8/13/2024 10:36 AM, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2024 at 10:23:10AM -0400, Steven Sistare wrote:
>
>> The correct explanation is my test succeeded for an older kernel, based on 5.15.
>> It failed in 6.9. This change in 6.4 (which Jason quotes from above) appears to be
>> the reason:
>>   8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings
>>
>> I do not question the need for that fix, but I am surprised that QEMU users have not
>> complained about new kernels failing for hugetlbfs plus vfio. Maybe most are still
>> on older stable kernels (as I was).
>
> hugetlbfs is supposed to work
>
> Normal filesystems are not

Thank you. I flubbed the hugetlbfs test and created a regular file.
Fixed that, and it works as expected.

- Steve

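One way to guard against the mix-up described in the last message is to ask the
kernel which filesystem a backing file actually lives on before using it as
guest memory. A minimal sketch, assuming Linux and glibc; the helper name
is_hugetlbfs() is made up for illustration, and the magic number is defined
locally with the value used for hugetlbfs in case <linux/magic.h> is not
installed.

```c
#include <sys/vfs.h>

/* Filesystem magic reported by statfs() for hugetlbfs mounts. */
#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Hypothetical helper: returns 1 if path is on hugetlbfs, 0 if it is on
 * some other filesystem, and -1 if statfs() fails (errno is left set).
 */
static int is_hugetlbfs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;

	/* Compare as unsigned so a signed f_type cannot break the match. */
	return (unsigned long)sfs.f_type == (unsigned long)HUGETLBFS_MAGIC;
}
```

A test harness could call is_hugetlbfs() on the intended guest RAM file and
refuse to continue when the result is not 1, so that a regular file created by
mistake is caught before the mapping step.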