mapping files for iommufd

* mapping files for iommufd
@ 2024-08-12 18:09 Steven Sistare
  2024-08-12 18:40 ` Jason Gunthorpe
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Sistare @ 2024-08-12 18:09 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe; +Cc: iommu

I have been using QEMU to test compatibility between legacy vfio and iommufd.
I find that backing guest memory with an ext4 file /root/guest.ram
succeeds for legacy but fails for iommufd, with EFAULT in
IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags.  Ditto for a
hugetlbfs file.  This is with unmodified kernels, no live update changes.

Is this expected?

(A tmpfs file works fine, as does memory-backend-memfd.)

- Steve

* Re: mapping files for iommufd
  2024-08-12 18:09 mapping files for iommufd Steven Sistare
@ 2024-08-12 18:40 ` Jason Gunthorpe
  2024-08-12 18:56   ` Steven Sistare
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2024-08-12 18:40 UTC (permalink / raw)
  To: Steven Sistare; +Cc: Alex Williamson, iommu

On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
> I find that backing guest memory with an ext4 file /root/guest.ram
> succeeds for legacy but fails for iommufd, with EFAULT in
> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags.  Ditto for a
> hugetlbfs file.  This is with unmodified kernels, no live update changes.
> 
> Is this expected?

Yes - I'm surprised that VFIO is not failing though?

/*
 * Writing to file-backed mappings which require folio dirty tracking using GUP
 * is a fundamentally broken operation, as kernel write access to GUP mappings
 * do not adhere to the semantics expected by a file system.
 *
 * Consider the following scenario:-
 *
 * 1. A folio is written to via GUP which write-faults the memory, notifying
 *    the file system and dirtying the folio.
 * 2. Later, writeback is triggered, resulting in the folio being cleaned and
 *    the PTE being marked read-only.
 * 3. The GUP caller writes to the folio, as it is mapped read/write via the
 *    direct mapping.
 * 4. The GUP caller, now done with the page, unpins it and sets it dirty
 *    (though it does not have to).
 *
 * This results in both data being written to a folio without writenotify, and
 * the folio being dirtied unexpectedly (if the caller decides to do so).
 */
static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
					  unsigned long gup_flags)
{

Why doesn't that fail in the VFIO case? I didn't notice anything
special there that would make it work???

I recall we talked about allowing old stuff to keep working but I
don't see any implementation of that?

Jason

* Re: mapping files for iommufd
  2024-08-12 18:40 ` Jason Gunthorpe
@ 2024-08-12 18:56   ` Steven Sistare
  2024-08-12 21:05     ` Alex Williamson
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Sistare @ 2024-08-12 18:56 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Alex Williamson, iommu

On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
>> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
>> I find that backing guest memory with an ext4 file /root/guest.ram
>> succeeds for legacy but fails for iommufd, with EFAULT in
>> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags.  Ditto for a
>> hugetlbfs file.  This is with unmodified kernels, no live update changes.
>>
>> Is this expected?
> 
> Yes - I'm surprised that VFIO is not failing though?
> 
> /*
>   * Writing to file-backed mappings which require folio dirty tracking using GUP
>   * is a fundamentally broken operation, as kernel write access to GUP mappings
>   * do not adhere to the semantics expected by a file system.
>   *
>   * Consider the following scenario:-
>   *
>   * 1. A folio is written to via GUP which write-faults the memory, notifying
>   *    the file system and dirtying the folio.
>   * 2. Later, writeback is triggered, resulting in the folio being cleaned and
>   *    the PTE being marked read-only.
>   * 3. The GUP caller writes to the folio, as it is mapped read/write via the
>   *    direct mapping.
>   * 4. The GUP caller, now done with the page, unpins it and sets it dirty
>   *    (though it does not have to).
>   *
>   * This results in both data being written to a folio without writenotify, and
>   * the folio being dirtied unexpectedly (if the caller decides to do so).
>   */
> static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
> 					  unsigned long gup_flags)
> {
> 
> Why doesn't that fail in the VFIO case? I didn't notice anything
> special there that would make it work???

vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
it just grabs a reference to each page.

> I recall we talked about allowing old stuff to keep working but I
> don't see any implementation of that?

Nor I.  Unless "keep old stuff working" is implemented by omitting the CONFIG
definitions for iommufd.

- Steve

* Re: mapping files for iommufd
  2024-08-12 18:56   ` Steven Sistare
@ 2024-08-12 21:05     ` Alex Williamson
  2024-08-13 14:23       ` Steven Sistare
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Williamson @ 2024-08-12 21:05 UTC (permalink / raw)
  To: Steven Sistare; +Cc: Jason Gunthorpe, iommu

On Mon, 12 Aug 2024 14:56:03 -0400
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
> > On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:  
> >> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
> >> I find that backing guest memory with an ext4 file /root/guest.ram
> >> succeeds for legacy but fails for iommufd, with EFAULT in
> >> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags.  Ditto for a
> >> hugetlbfs file.  This is with unmodified kernels, no live update changes.
> >>
> >> Is this expected?  
> > 
> > Yes - I'm surprised that VFIO is not failing though?
> > 
> > /*
> >   * Writing to file-backed mappings which require folio dirty tracking using GUP
> >   * is a fundamentally broken operation, as kernel write access to GUP mappings
> >   * do not adhere to the semantics expected by a file system.
> >   *
> >   * Consider the following scenario:-
> >   *
> >   * 1. A folio is written to via GUP which write-faults the memory, notifying
> >   *    the file system and dirtying the folio.
> >   * 2. Later, writeback is triggered, resulting in the folio being cleaned and
> >   *    the PTE being marked read-only.
> >   * 3. The GUP caller writes to the folio, as it is mapped read/write via the
> >   *    direct mapping.
> >   * 4. The GUP caller, now done with the page, unpins it and sets it dirty
> >   *    (though it does not have to).
> >   *
> >   * This results in both data being written to a folio without writenotify, and
> >   * the folio being dirtied unexpectedly (if the caller decides to do so).
> >   */
> > static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
> > 					  unsigned long gup_flags)
> > {
> > 
> > Why doesn't that fail in the VFIO case? I didn't notice anything
> > special there that would make it work???  
> 
> vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
> it just grabs a reference to each page.

Can you be more specific, because...

vfio_pin_pages_remote
  vaddr_get_pfns
    pin_user_pages_remote
      __gup_longterm_locked
        __get_user_pages_locked
          __get_user_pages

Thanks,
Alex


* Re: mapping files for iommufd
  2024-08-12 21:05     ` Alex Williamson
@ 2024-08-13 14:23       ` Steven Sistare
  2024-08-13 14:36         ` Jason Gunthorpe
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Sistare @ 2024-08-13 14:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, iommu

On 8/12/2024 5:05 PM, Alex Williamson wrote:
> On Mon, 12 Aug 2024 14:56:03 -0400
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 8/12/2024 2:40 PM, Jason Gunthorpe wrote:
>>> On Mon, Aug 12, 2024 at 02:09:11PM -0400, Steven Sistare wrote:
>>>> I have been using QEMU to test legacy vfio vs iommufd compatibility vfio.
>>>> I find that backing guest memory with an ext4 file /root/guest.ram
>>>> succeeds for legacy but fails for iommufd, with EFAULT in
>>>> IOMMU_IOAS_MAP -> ... __get_user_pages -> check_vma_flags.  Ditto for a
>>>> hugetlbfs file.  This is with unmodified kernels, no live update changes.
>>>>
>>>> Is this expected?
>>>
>>> Yes - I'm surprised that VFIO is not failing though?
>>>
>>> /*
>>>    * Writing to file-backed mappings which require folio dirty tracking using GUP
>>>    * is a fundamentally broken operation, as kernel write access to GUP mappings
>>>    * do not adhere to the semantics expected by a file system.
>>>    *
>>>    * Consider the following scenario:-
>>>    *
>>>    * 1. A folio is written to via GUP which write-faults the memory, notifying
>>>    *    the file system and dirtying the folio.
>>>    * 2. Later, writeback is triggered, resulting in the folio being cleaned and
>>>    *    the PTE being marked read-only.
>>>    * 3. The GUP caller writes to the folio, as it is mapped read/write via the
>>>    *    direct mapping.
>>>    * 4. The GUP caller, now done with the page, unpins it and sets it dirty
>>>    *    (though it does not have to).
>>>    *
>>>    * This results in both data being written to a folio without writenotify, and
>>>    * the folio being dirtied unexpectedly (if the caller decides to do so).
>>>    */
>>> static bool writable_file_mapping_allowed(struct vm_area_struct *vma,
>>> 					  unsigned long gup_flags)
>>> {
>>>
>>> Why doesn't that fail in the VFIO case? I didn't notice anything
>>> special there that would make it work???
>>
>> vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote does not call GUP,
>> it just grabs a reference to each page.
> 
> Can you be more specific, because...
> 
> vfio_pin_pages_remote
>    vaddr_get_pfns
>      pin_user_pages_remote
>        __gup_longterm_locked
>          __get_user_pages_locked
>            __get_user_pages

Right.

The correct explanation is that my test succeeded on an older kernel, based on 5.15,
and failed on 6.9.  This change in 6.4 (which Jason quotes from above) appears to be
the reason:
   8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings

I do not question the need for that fix, but I am surprised that QEMU users have not
complained about new kernels failing for hugetlbfs plus vfio.  Maybe most are still
on older stable kernels (as I was).

- Steve

* Re: mapping files for iommufd
  2024-08-13 14:23       ` Steven Sistare
@ 2024-08-13 14:36         ` Jason Gunthorpe
  2024-08-13 15:13           ` Steven Sistare
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2024-08-13 14:36 UTC (permalink / raw)
  To: Steven Sistare; +Cc: Alex Williamson, iommu

On Tue, Aug 13, 2024 at 10:23:10AM -0400, Steven Sistare wrote:

> The correct explanation is my test succeeded for an older kernel, based on 5.15.
> It failed in 6.9.  This change in 6.4 (which Jason quotes from above) appears to be
> the reason:
>   8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings
> 
> I do not question the need for that fix, but I am surprised that QEMU users have not
> complained about new kernels failing for hugetlbfs plus vfio.  Maybe most are still
> on older stable kernels (as I was).

hugetlbfs is supposed to work

Normal filesystems are not

Jason
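
The distinction drawn in this thread can be sketched as a small user-space
model.  This is not the kernel code: struct toy_vma, its fields, the
TOY_FOLL_* values and the toy_* helpers are hypothetical stand-ins for what
mm/gup.c derives from the real VMA.  It assumes, per the messages above, that
ext4 needs folio dirty tracking while hugetlbfs and tmpfs do not, and that
only long-term write pins are refused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real FOLL_* values differ. */
#define TOY_FOLL_WRITE    0x1u
#define TOY_FOLL_PIN      0x2u
#define TOY_FOLL_LONGTERM 0x4u

/* Hypothetical, heavily reduced view of a VMA. */
struct toy_vma {
	bool anonymous;       /* no backing file */
	bool shared_writable; /* MAP_SHARED mapping with write permission */
	bool fs_tracks_dirty; /* assumed: ext4 true; tmpfs, hugetlbfs false */
};

/* Model of "does this mapping rely on write-notify to track dirty folios?" */
static bool toy_needs_dirty_tracking(const struct toy_vma *vma)
{
	return !vma->anonymous && vma->shared_writable && vma->fs_tracks_dirty;
}

/* Model of the writable-file-mapping check quoted earlier in the thread. */
static bool toy_writable_file_mapping_allowed(const struct toy_vma *vma,
					      unsigned int gup_flags)
{
	const unsigned int longterm_pin = TOY_FOLL_PIN | TOY_FOLL_LONGTERM;

	/* Anything that is not a long-term pin is let through. */
	if ((gup_flags & longterm_pin) != longterm_pin)
		return true;

	return !toy_needs_dirty_tracking(vma);
}

/* Returns 0 if the pin may proceed, -EFAULT otherwise. */
static int toy_check_vma(const struct toy_vma *vma, unsigned int gup_flags)
{
	/* Sanity check on the model's input: long-term implies a pin. */
	assert(!(gup_flags & TOY_FOLL_LONGTERM) || (gup_flags & TOY_FOLL_PIN));

	if ((gup_flags & TOY_FOLL_WRITE) && !vma->anonymous &&
	    !toy_writable_file_mapping_allowed(vma, gup_flags))
		return -EFAULT;
	return 0;
}
```

In this model a long-term write pin of a shared ext4 mapping returns -EFAULT,
matching the failure reported at the top of the thread, while the same pin on
a hugetlbfs or anonymous mapping returns 0.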

* Re: mapping files for iommufd
  2024-08-13 14:36         ` Jason Gunthorpe
@ 2024-08-13 15:13           ` Steven Sistare
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Sistare @ 2024-08-13 15:13 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Alex Williamson, iommu

On 8/13/2024 10:36 AM, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2024 at 10:23:10AM -0400, Steven Sistare wrote:
> 
>> The correct explanation is my test succeeded for an older kernel, based on 5.15.
>> It failed in 6.9.  This change in 6.4 (which Jason quotes from above) appears to be
>> the reason:
>>    8ac2684 mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings
>>
>> I do not question the need for that fix, but I am surprised that QEMU users have not
>> complained about new kernels failing for hugetlbfs plus vfio.  Maybe most are still
>> on older stable kernels (as I was).
> 
> hugetlbfs is supposed to work
> 
> Normal filesystems are not

Thank you.  I flubbed the hugetlbfs test and created a regular file.
Fixed that, and it works as expected.

- Steve
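
One way to guard against the mistake described above (a backing file that was
meant to be on hugetlbfs but is actually a regular file) is to check the
filesystem type before mapping it.  The following is a minimal sketch using
fstatfs(2); the helper names and the SKETCH_ prefix are made up for this
example, and the magic number is the HUGETLBFS_MAGIC value from
<linux/magic.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Same value as HUGETLBFS_MAGIC in <linux/magic.h>. */
#define SKETCH_HUGETLBFS_MAGIC 0x958458f6UL

/* Returns 1 if fd lives on hugetlbfs, 0 if not, -1 if fstatfs() fails. */
static int fd_is_hugetlbfs(int fd)
{
	struct statfs sfs;

	if (fstatfs(fd, &sfs) != 0)
		return -1;

	/* Compare as unsigned so the result does not depend on f_type's width. */
	return (unsigned long)sfs.f_type == SKETCH_HUGETLBFS_MAGIC;
}

/* Convenience wrapper: same result, starting from a path. */
static int path_is_hugetlbfs(const char *path)
{
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = fd_is_hugetlbfs(fd);
	close(fd);
	return ret;
}
```

Calling path_is_hugetlbfs() on the guest RAM file before starting the test
would show whether the file really sits on a hugetlbfs mount.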
