* mapping files for iommufd
From: Steven Sistare @ 2024-08-12 18:09 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe; +Cc: iommu
I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
I find that backing guest memory with an ext4 file /root/guest.ram
succeeds for legacy but fails for iommufd, with EFAULT in
IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
hugetlbfs file. This is with unmodified kernels, no live update changes.
Is this expected?
(a tmpfs file works fine, as does memory-backend-memfd).
- Steve
* Re: mapping files for iommufd
From: Jason Gunthorpe @ 2024-08-12 18:40 UTC (permalink / raw)
To: Steven Sistare; +Cc: Alex Williamson, iommu
On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
> I find that backing guest memory with an ext4 file /root/guest.ram
> succeeds for legacy but fails for iommufd, with EFAULT in
> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
> hugetlbfs file. This is with unmodified kernels, no live update changes.
>
> Is this expected?
Yes - I'm surprised that VFIO is not failing though?
/*
* Writing to file-backed mappings which require folio dirty tracking using GUP
* is a fundamentally broken operation, as kernel write access to GUP mappings
* do not adhere to the semantics expected by a file system.
*
* Consider the following scenario:-
*
* 1. A folio is written to via GUP which write-faults the memory, notifying
* the file system and dirtying the folio.
* 2. Later, writeback is triggered, resulting in the folio being cleaned and
* the PTE being marked read-only.
* 3. The GUP caller writes to the folio, as it is mapped read/write via the
* direct mapping.
* 4. The GUP caller, now done with the page, unpins it and sets it dirty
* (though it does not have to).
*
* This results in both data being written to a folio without writenotify, and
* the folio being dirtied unexpectedly (if the caller decides to do so).
*/
static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
unsigned long gup_flags)
{
Why doesn't that fail in the VFIO case? I didn't notice anything
special there that would make it work???
I recall we talked about allowing old stuff to keep working but I
don't see any implementation of that?
Jason
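
The four-step scenario in the quoted kernel comment can be traced with a small toy model. This is only an illustrative sketch: `struct toy_folio` and every `toy_*` helper are hypothetical names invented here, not kernel structures or functions, and the model ignores everything except the dirty flag, the PTE write permission, and the pin.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical fields, not real kernel structures. */
struct toy_folio {
    bool dirty;             /* filesystem believes the folio needs writeback */
    bool pte_writable;      /* user PTE allows writes without faulting */
    bool pinned;            /* a long-term pin is held */
    int  fs_notifications;  /* write faults the filesystem was told about */
    int  unnotified_writes; /* writes that landed without any write fault */
};

/* Step 1: GUP for write faults the page in, notifies the fs, dirties and pins. */
static void toy_gup_pin_for_write(struct toy_folio *f)
{
    f->fs_notifications++;
    f->pte_writable = true;
    f->dirty = true;
    f->pinned = true;
}

/* Step 2: writeback cleans the folio and write-protects the PTE. */
static void toy_writeback(struct toy_folio *f)
{
    f->dirty = false;
    f->pte_writable = false;
}

/* Step 3: the pin holder writes through the direct mapping, bypassing the PTE. */
static void toy_write_via_pin(struct toy_folio *f)
{
    assert(f->pinned);
    f->unnotified_writes++; /* no fault occurs, so the fs is never notified */
}

/* Step 4: the pin holder unpins and may or may not mark the folio dirty. */
static void toy_unpin(struct toy_folio *f, bool set_dirty)
{
    f->pinned = false;
    if (set_dirty)
        f->dirty = true;
}

/* Run steps 1 to 4 in order and return the final state. */
static struct toy_folio toy_run_scenario(bool set_dirty_on_unpin)
{
    struct toy_folio f = {0};

    toy_gup_pin_for_write(&f);
    toy_writeback(&f);
    toy_write_via_pin(&f);
    toy_unpin(&f, set_dirty_on_unpin);
    return f;
}
```

In this model, either choice at step 4 leaves the filesystem in a state it did not expect: with `set_dirty_on_unpin` false the folio holds new data but looks clean, and with it true the folio becomes dirty without a matching write notification.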
* Re: mapping files for iommufd
From: Steven Sistare @ 2024-08-12 18:56 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Alex Williamson, iommu
On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
>> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
>> I find that backing guest memory with an ext4 file /root/guest.ram
>> succeeds for legacy but fails for iommufd, with EFAULT in
>> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
>> hugetlbfs file. This is with unmodified kernels, no live update changes.
>>
>> Is this expected?
>
> Yes - I'm surprised that VFIO is not failing though?
>
> /*
> * Writing to file-backed mappings which require folio dirty tracking using GUP
> * is a fundamentally broken operation, as kernel write access to GUP mappings
> * do not adhere to the semantics expected by a file system.
> *
> * Consider the following scenario:-
> *
> * 1. A folio is written to via GUP which write-faults the memory, notifying
> * the file system and dirtying the folio.
> * 2. Later, writeback is triggered, resulting in the folio being cleaned and
> * the PTE being marked read-only.
> * 3. The GUP caller writes to the folio, as it is mapped read/write via the
> * direct mapping.
> * 4. The GUP caller, now done with the page, unpins it and sets it dirty
> * (though it does not have to).
> *
> * This results in both data being written to a folio without writenotify, and
> * the folio being dirtied unexpectedly (if the caller decides to do so).
> */
> static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
> unsigned long gup_flags)
> {
>
> Why doesn't that fail in the VFIO case? I didn't notice anything
> special there that would make it work???
vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
it just grabs a reference to each page.
> I recall we talked about allowing old stuff to keep working but I
> don't see any implementation of that?
Nor I. Unless "keep old stuff working" is implemented by omitting the CONFIG
definitions for iommufd.
- Steve
* Re: mapping files for iommufd
From: Alex Williamson @ 2024-08-12 21:05 UTC (permalink / raw)
To: Steven Sistare; +Cc: Jason Gunthorpe, iommu
On Mon, 12 Aug 2024 14:56:03 -0400
Steven Sistare <steven.sistare@oracle.com> wrote:
> On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
> > On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
> >> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
> >> I find that backing guest memory with an ext4 file /root/guest.ram
> >> succeeds for legacy but fails for iommufd, with EFAULT in
> >> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
> >> hugetlbfs file. This is with unmodified kernels, no live update changes.
> >>
> >> Is this expected?
> >
> > Yes - I'm surprised that VFIO is not failing though?
> >
> > /*
> > * Writing to file-backed mappings which require folio dirty tracking using GUP
> > * is a fundamentally broken operation, as kernel write access to GUP mappings
> > * do not adhere to the semantics expected by a file system.
> > *
> > * Consider the following scenario:-
> > *
> > * 1. A folio is written to via GUP which write-faults the memory, notifying
> > * the file system and dirtying the folio.
> > * 2. Later, writeback is triggered, resulting in the folio being cleaned and
> > * the PTE being marked read-only.
> > * 3. The GUP caller writes to the folio, as it is mapped read/write via the
> > * direct mapping.
> > * 4. The GUP caller, now done with the page, unpins it and sets it dirty
> > * (though it does not have to).
> > *
> > * This results in both data being written to a folio without writenotify, and
> > * the folio being dirtied unexpectedly (if the caller decides to do so).
> > */
> > static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
> > unsigned long gup_flags)
> > {
> >
> > Why doesn't that fail in the VFIO case? I didn't notice anything
> > special there that would make it work???
>
> vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
> it just grabs a reference to each page.
Can you be more specific, because...
vfio_pin_pages_remote
  vaddr_get_pfns
    pin_user_pages_remote
      __gup_longterm_locked
        __get_user_pages_locked
          __get_user_pages
Thanks,
Alex
* Re: mapping files for iommufd
From: Steven Sistare @ 2024-08-13 14:23 UTC (permalink / raw)
To: Alex Williamson; +Cc: Jason Gunthorpe, iommu
On 8/12/2024 5:05 PM, Alex Williamson wrote:
> On Mon, 12 Aug 2024 14:56:03 -0400
> Steven Sistare <steven.sistare@oracle.com> wrote:
>
>> On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
>>> On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
>>>> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
>>>> I find that backing guest memory with an ext4 file /root/guest.ram
>>>> succeeds for legacy but fails for iommufd, with EFAULT in
>>>> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags. Ditto for a
>>>> hugetlbfs file. This is with unmodified kernels, no live update changes.
>>>>
>>>> Is this expected?
>>>
>>> Yes - I'm surprised that VFIO is not failing though?
>>>
>>> /*
>>> * Writing to file-backed mappings which require folio dirty tracking using GUP
>>> * is a fundamentally broken operation, as kernel write access to GUP mappings
>>> * do not adhere to the semantics expected by a file system.
>>> *
>>> * Consider the following scenario:-
>>> *
>>> * 1. A folio is written to via GUP which write-faults the memory, notifying
>>> * the file system and dirtying the folio.
>>> * 2. Later, writeback is triggered, resulting in the folio being cleaned and
>>> * the PTE being marked read-only.
>>> * 3. The GUP caller writes to the folio, as it is mapped read/write via the
>>> * direct mapping.
>>> * 4. The GUP caller, now done with the page, unpins it and sets it dirty
>>> * (though it does not have to).
>>> *
>>> * This results in both data being written to a folio without writenotify, and
>>> * the folio being dirtied unexpectedly (if the caller decides to do so).
>>> */
>>> static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
>>> unsigned long gup_flags)
>>> {
>>>
>>> Why doesn't that fail in the VFIO case? I didn't notice anything
>>> special there that would make it work???
>>
>> vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
>> it just grabs a reference to each page.
>
> Can you be more specific, because...
>
> vfio_pin_pages_remote
>   vaddr_get_pfns
>     pin_user_pages_remote
>       __gup_longterm_locked
>         __get_user_pages_locked
>           __get_user_pages
Right.
The correct explanation is my test succeeded for an older kernel, based on 5.15.
It failed in 6.9. This change in 6.4 (which Jason quotes from above) appears to be
the reason:
8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings
I do not question the need for that fix, but I am surprised that QEMU users have not
complained about new kernels failing for hugetlbfs plus vfio. Maybe most are still
on older stable kernels (as I was).
- Steve
* Re: mapping files for iommufd
From: Jason Gunthorpe @ 2024-08-13 14:36 UTC (permalink / raw)
To: Steven Sistare; +Cc: Alex Williamson, iommu
On Tue, Aug 13, 2024 at 10:23:10AM -0400, Steven Sistare wrote:
> The correct explanation is my test succeeded for an older kernel, based on 5.15.
> It failed in 6.9. This change in 6.4 (which Jason quotes from above) appears to be
> the reason:
> 8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings
>
> I do not question the need for that fix, but I am surprised that QEMU users have not
> complained about new kernels failing for hugetlbfs plus vfio. Maybe most are still
> on older stable kernels (as I was).
hugetlbfs is supposed to work
Normal filesystems are not
Jason
* Re: mapping files for iommufd
From: Steven Sistare @ 2024-08-13 15:13 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Alex Williamson, iommu
On 8/13/2024 10:36 AM, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2024 at 10:23:10AM -0400, Steven Sistare wrote:
>
>> The correct explanation is my test succeeded for an older kernel, based on 5.15.
>> It failed in 6.9. This change in 6.4 (which Jason quotes from above) appears to be
>> the reason:
>> 8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings
>>
>> I do not question the need for that fix, but I am surprised that QEMU users have not
>> complained about new kernels failing for hugetlbfs plus vfio. Maybe most are still
>> on older stable kernels (as I was).
>
> hugetlbfs is supposed to work
>
> Normal filesystems are not
Thank you. I flubbed the hugetlbfs test and created a regular file.
Fixed that, and it works as expected.
- Steve
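
Since the failed hugetlbfs test came down to the backing file being a regular file, a quick userspace check of the filesystem type can catch that mistake before starting the guest. The sketch below uses `statfs(2)` and the magic numbers from `<linux/magic.h>`. The `guest_ram_*` helper names are hypothetical, and treating "tmpfs or hugetlbfs" as the set expected to pin successfully is an assumption drawn only from what this thread reports; the kernel's actual decision is made in `writable_file_mapping_allowed`, whose body is not shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/vfs.h>      /* statfs() */
#include <linux/magic.h>  /* HUGETLBFS_MAGIC, TMPFS_MAGIC, EXT4_SUPER_MAGIC */

/* Hypothetical helper: name the filesystem families discussed in this thread. */
static const char *guest_ram_fs_name(unsigned long magic)
{
    switch (magic) {
    case HUGETLBFS_MAGIC:  return "hugetlbfs";
    case TMPFS_MAGIC:      return "tmpfs";
    case EXT4_SUPER_MAGIC: return "ext4";
    default:               return "other";
    }
}

/*
 * Assumption taken from the thread, not from kernel source: tmpfs and
 * hugetlbfs files are expected to map, files on a normal filesystem are not.
 */
static bool guest_ram_fs_expected_to_pin(unsigned long magic)
{
    return magic == HUGETLBFS_MAGIC || magic == TMPFS_MAGIC;
}

/* Returns a name for the filesystem holding `path`, or NULL if statfs() fails. */
static const char *guest_ram_path_fs_name(const char *path)
{
    struct statfs sfs;

    assert(path != NULL);
    if (statfs(path, &sfs) != 0)
        return NULL;
    return guest_ram_fs_name((unsigned long)sfs.f_type);
}
```

Calling `guest_ram_path_fs_name()` on the intended backing file (for example `/root/guest.ram` from the first message) and checking the result before launching QEMU would show whether the file really lives on a hugetlbfs mount.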