public inbox for kvm@vger.kernel.org
From: "Huang, Honglei1" <Honglei1.Huang@amd.com>
To: Demi Marie Obenour <demi@invisiblethingslab.com>
Cc: "Demi Marie Obenour" <demiobenour@gmail.com>,
	"Huang Rui" <ray.huang@amd.com>,
	"Stefano Stabellini" <stefano.stabellini@amd.com>,
	virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, "David Airlie" <airlied@redhat.com>,
	dri-devel@lists.freedesktop.org,
	"Dmitry Osipenko" <dmitry.osipenko@collabora.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Gurchetan Singh" <gurchetansingh@chromium.org>,
	"Chia-I Wu" <olvaffe@gmail.com>,
	"Akihiko Odaki" <akihiko.odaki@daynix.com>,
	"Lingshan Zhu" <Lingshan.Zhu@amd.com>,
	"Xen developer discussion" <xen-devel@lists.xenproject.org>,
	"Kernel KVM virtualization development" <kvm@vger.kernel.org>,
	"Xenia Ragiadakou" <burzalodowa@gmail.com>,
	"Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Subject: Re: [RFC PATCH 3/3] drm/virtio: implement blob userptr resource object
Date: Thu, 6 Feb 2025 18:53:55 +0800	[thread overview]
Message-ID: <7b0bf2d5-700a-4cc7-b410-a9b2e2083b5d@amd.com> (raw)
In-Reply-To: <Z5waZsddenagCYtl@itl-email>

On 2025/1/31 8:33, Demi Marie Obenour wrote:
> On Wed, Jan 29, 2025 at 03:54:59PM -0500, Demi Marie Obenour wrote:
>> On 1/8/25 12:05 PM, Simona Vetter wrote:
>>> On Fri, Dec 27, 2024 at 10:24:29AM +0800, Huang, Honglei1 wrote:
>>>>
>>>> On 2024/12/22 9:59, Demi Marie Obenour wrote:
>>>>> On 12/20/24 10:35 AM, Simona Vetter wrote:
>>>>>> On Fri, Dec 20, 2024 at 06:04:09PM +0800, Honglei Huang wrote:
>>>>>>> From: Honglei Huang <Honglei1.Huang@amd.com>
>>>>>>>
>>>>>>> A virtio-gpu userptr is based on an HMM notifier.
>>>>>>> It is used to let the host access guest userspace
>>>>>>> memory and to be notified of changes to that memory.
>>>>>>> This patch series is at a very early stage: userspace
>>>>>>> pages are currently pinned to ensure that host device
>>>>>>> memory operations are correct. The free and unmap
>>>>>>> operations for userspace memory are handled by the MMU
>>>>>>> notifier; this gives a simple, basic SVM feature for
>>>>>>> this series. The physical PFN update operation is split
>>>>>>> into two ops here. Evicted memory is no longer used but
>>>>>>> is remapped into the host again, to achieve the same
>>>>>>> effect as hmm_range_fault.
>>>>>>
>>>>>> So in my opinion there are two ways to implement userptr that make sense:
>>>>>>
>>>>>> - pinned userptr with pin_user_pages(FOLL_LONGTERM). there is no mmu
>>>>>>    notifier
>>>>>>
>>>>>> - unpinned userptr where you entirely rely on userptr and do not hold any
>>>>>>     page references or page pins at all, for full SVM integration. This
>>>>>>     should use hmm_range_fault ideally, since that's the version that
>>>>>>     doesn't ever grab any page reference pins.
>>>>>>
>>>>>> All the in-between variants are imo really bad hacks, whether they hold a
>>>>>> page reference or a temporary page pin (which seems to be what you're
>>>>>> doing here). In much older kernels there was some justification for them,
>>>>>> because strange stuff happened over fork(), but with FOLL_LONGTERM this is
>>>>>> now all sorted out. So there's really only fully pinned, or true svm left
>>>>>> as clean design choices imo.
>>>>>>
>>>>>> With that background, why does pin_user_pages(FOLL_LONGTERM) not work for
>>>>>> you?
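
For reference, a minimal sketch of what the fully pinned option could look
like is below. The struct and function names (userptr_object, userptr_pin,
...) are illustrative placeholders, not the actual patch code:

/*
 * Illustrative sketch of a fully pinned userptr path based on
 * pin_user_pages_fast(FOLL_LONGTERM).  With a long-term pin there is no
 * need for an MMU notifier, since the pages cannot be migrated away.
 */
#include <linux/mm.h>
#include <linux/slab.h>

struct userptr_object {
	struct page **pages;
	unsigned long npages;
};

static int userptr_pin(struct userptr_object *obj, unsigned long addr,
		       unsigned long npages)
{
	long pinned;

	obj->pages = kvcalloc(npages, sizeof(*obj->pages), GFP_KERNEL);
	if (!obj->pages)
		return -ENOMEM;

	/* Long-term pin of the whole user range. */
	pinned = pin_user_pages_fast(addr, npages,
				     FOLL_WRITE | FOLL_LONGTERM, obj->pages);
	if (pinned < 0) {
		kvfree(obj->pages);
		return pinned;
	}
	if (pinned != npages) {
		/* Partial pin: drop what we got and fail. */
		unpin_user_pages(obj->pages, pinned);
		kvfree(obj->pages);
		return -EFAULT;
	}
	obj->npages = npages;
	return 0;
}

static void userptr_unpin(struct userptr_object *obj)
{
	unpin_user_pages(obj->pages, obj->npages);
	kvfree(obj->pages);
}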
>>>>>
>>>>> +1 on using FOLL_LONGTERM.  Fully dynamic memory management has a huge cost
>>>>> in complexity that pinning everything avoids.  Furthermore, this avoids the
>>>>> host having to take action in response to guest memory reclaim requests.
>>>>> This avoids additional complexity (and thus attack surface) on the host side.
>>>>> Furthermore, since this is for ROCm and not for graphics, I am less concerned
>>>>> about supporting systems that require swappable GPU VRAM.
>>>>
>>>> Hi Sima and Demi,
>>>>
>>>> I totally agree that the FOLL_LONGTERM flag is needed; I will add it in
>>>> the next version.
>>>>
>>>> For the first, pinned variant, I think the MMU notifier is still needed,
>>>> because the userptr feature is generally used by the UMD like this: the
>>>> registration of a userptr is always explicitly invoked by user code,
>>>> e.g. "registerMemoryToGPU(userptrAddr, ...)", but there is no explicit
>>>> API for releasing/freeing a userptr, at least in the hsakmt/KFD stack.
>>>> The user just calls "free(userptrAddr)", and the kernel driver then
>>>> releases the userptr from its MMU notifier callback. Virtio-GPU has no
>>>> way to know that the user has freed the userptr other than the MMU
>>>> notifier, and in the UMD there is no way to tell that free() was invoked
>>>> by the user. As far as I can see, the only option is to use an MMU
>>>> notifier in the virtio-GPU driver and free the corresponding data on the
>>>> host via virtio commands.
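
To make that concrete, here is a rough sketch of the idea. struct
userptr_object and virtio_gpu_cmd_userptr_unmap() are hypothetical names
used only for this example, not the real driver code:

/*
 * Illustrative only: detect that the guest has unmapped/freed a userptr
 * range via an interval notifier and ask the host to drop its mapping.
 */
#include <linux/kernel.h>
#include <linux/mmu_notifier.h>
#include <linux/sched.h>

struct userptr_object {
	struct mmu_interval_notifier notifier;
	/* ... pinned pages, host resource id, ... */
};

/* Hypothetical helper that queues a virtio command to unmap on the host. */
void virtio_gpu_cmd_userptr_unmap(struct userptr_object *obj);

static bool userptr_invalidate(struct mmu_interval_notifier *mni,
			       const struct mmu_notifier_range *range,
			       unsigned long cur_seq)
{
	struct userptr_object *obj =
		container_of(mni, struct userptr_object, notifier);

	mmu_interval_set_seq(mni, cur_seq);

	/*
	 * free()/munmap() of the userptr range arrives here as an unmap
	 * event; tell the host so it releases its side of the mapping.
	 */
	if (range->event == MMU_NOTIFY_UNMAP)
		virtio_gpu_cmd_userptr_unmap(obj);

	return true;
}

static const struct mmu_interval_notifier_ops userptr_notifier_ops = {
	.invalidate = userptr_invalidate,
};

static int userptr_register_notifier(struct userptr_object *obj,
				     unsigned long addr, unsigned long size)
{
	return mmu_interval_notifier_insert(&obj->notifier, current->mm,
					    addr, size, &userptr_notifier_ops);
}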
>>>>
>>>> For the second way, using hmm_range_fault, there is a predictable issue
>>>> as far as I can see, at least in the hsakmt/KFD stack: memory may
>>>> migrate while the GPU/device is working. On bare metal, when memory is
>>>> migrating, the KFD driver pauses the device's compute work under
>>>> mmap_write_lock, uses hmm_range_fault to remap the migrated/evicted
>>>> memory to the GPU, and then resumes the compute work, to ensure the
>>>> data stays correct. But in the virtio-GPU driver the migration happens
>>>> in the guest kernel and the eviction MMU notifier callback fires in the
>>>> guest. A virtio command can be used to notify the host, but without
>>>> mmap_write_lock protection in the host kernel, the host will hold stale
>>>> data for a short period of time, which may lead to problems. That is
>>>> hard to fix as far as I can see.
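
For comparison, the hmm_range_fault retry pattern that the bare-metal flow
relies on looks roughly like the sketch below (simplified; the map_lock and
the GPU page-table update are placeholders). The key point is that the PFN
snapshot is only valid while mmu_interval_read_retry() confirms it, and that
guarantee is exactly what is hard to extend across the virtio boundary to
the host:

/*
 * Simplified hmm_range_fault retry loop as used on bare metal.
 */
#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

static int userptr_fault_pages(struct mmu_interval_notifier *notifier,
			       unsigned long start, unsigned long npages,
			       unsigned long *pfns, struct mutex *map_lock)
{
	struct hmm_range range = {
		.notifier	= notifier,
		.start		= start,
		.end		= start + (npages << PAGE_SHIFT),
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	struct mm_struct *mm = notifier->mm;
	int ret;

	do {
		range.notifier_seq = mmu_interval_read_begin(notifier);

		mmap_read_lock(mm);
		ret = hmm_range_fault(&range);
		mmap_read_unlock(mm);
		if (ret) {
			if (ret == -EBUSY)
				continue;	/* raced with an invalidation */
			return ret;
		}

		mutex_lock(map_lock);
		if (mmu_interval_read_retry(notifier, range.notifier_seq)) {
			/* Invalidated meanwhile: the pfns are stale, retry. */
			mutex_unlock(map_lock);
			continue;
		}
		/* ... update the GPU page tables from pfns here ... */
		mutex_unlock(map_lock);
		return 0;
	} while (true);
}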
>>>>
>>>> I will extract some of the APIs into helpers, as you requested, and I
>>>> will refactor the whole userptr implementation to use callbacks in the
>>>> page-acquisition path, so that either the pin method or hmm_range_fault
>>>> can be chosen in this series.
>>>
>>> Ok, so if this is for svm, then you need full blast hmm, or the semantics
>>> are buggy. You cannot fake svm with pin(FOLL_LONGTERM) userptr, this does
>>> not work.
>>>
>>> The other option is that hsakmt/kfd api is completely busted, and that's
>>> kinda not a kernel problem.
>>> -Sima
>>
>> On further thought, I believe the driver needs to migrate the pages to
>> device memory (really a virtio-GPU blob object) *and* take a FOLL_LONGTERM
>> pin on them.  The reason is that it isn’t possible to migrate these pages
>> back to "host" memory without unmapping them from the GPU.  For the reasons
>> I mention in [1], I believe that temporarily revoking access to virtio-GPU
>> blob objects is not feasible.  Instead, the pages must be treated as if
>> they are permanently in device memory until guest userspace unmaps them
>> from the GPU, after which they must be migrated back to host memory.
> 
> Discussion on IRC indicates that migration isn't reliable.  This is
> because Linux core memory management is largely lock-free for
> performance reasons, so there is no way to prevent temporary elevation
> of a page's reference count.  A page with an elevated reference count
> cannot be migrated.
> 
> The only alternative I can think of is for the hypervisor to perform the
> migration.  The hypervisor can revoke the guest's access to the page
> without the guest's consent or involvement.  The host can then replace
> the page with one of its own pages, which might be on the CPU or GPU.
> Further migration between the CPU and GPU is controlled by the host
> kernel-mode driver (KMD) and host kernel memory management.  The guest
> kernel driver must take a FOLL_LONGTERM pin before telling the host to
> use the pages, but that is all.
> 
> On KVM, this should be essentially automatic, as guest memory really is
> just host userspace memory.  On Xen, this requires that the backend
> domain can revoke frontend access to _any_ frontend page, or at least
> frontend pages that have been granted to the backend.  The backend will
> then need to be able to handle page faults for the frontend pages, and
> replace the frontend pages with its own pages at will.  Furthermore,
> revoking pages that the backend has installed into the frontend must
> never fail, because the backend will panic if it does fail.
> 
> Sima, is putting guest pages under host kernel control the only option?
> I thought that this could be avoided by leaving the pages on the CPU if
> migration fails, but that won't work because there will be no way to
> migrate them to the GPU later, causing performance problems that would
> be impossible to debug.  Is waiting (possibly forever) on migration to
> finish an option?  Otherwise, this might mean extra complexity in the
> Xen hypervisor, as I do not believe the primitives needed are currently
> available.  Specifically, in addition to the primitives discussed at Xen
> Project Summit 2024, the backend also needs to intercept access to, and
> replace the contents of, arbitrary frontend-controlled pages.

Hi Demi,

I agree that achieving the complete SVM feature in virtio-GPU requires
the hypervisor to be deeply involved and requires new features. It needs
a solid design; I saw the detailed reply in another thread, which is very
helpful, and I am looking forward to the response from the Xen/hypervisor
experts.

So the current virtio-GPU userptr implementation cannot support the full
SVM feature; it can only let the GPU access userspace memory, so perhaps
it is better described as just a userptr feature. I think I will finish
this small part first and then try to complete the whole SVM feature.

Regards,
Honglei


