From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"igvt-g@lists.01.org" <igvt-g@ml01.01.org>,
qemu-devel <qemu-devel@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Fri, 29 Jan 2016 16:49:46 +0800 [thread overview]
Message-ID: <56AB27AA.80602@intel.com> (raw)
In-Reply-To: <56AB12CC.5000402@intel.com>
On 01/29/2016 03:20 PM, Jike Song wrote:
> This discussion becomes a little difficult for a newbie like me :(
>
> On 01/28/2016 11:23 PM, Alex Williamson wrote:
>> On Thu, 2016-01-28 at 14:00 +0800, Jike Song wrote:
>>> On 01/28/2016 12:19 AM, Alex Williamson wrote:
>>>> On Wed, 2016-01-27 at 13:43 +0800, Jike Song wrote:
>>> {snip}
>>>
>>>>> Having had a look at eventfd, I would say yes, technically we are able to
>>>>> achieve the goal: introduce an fd with fop->{read|write} defined in KVM
>>>>> that calls into the vgpu device-model, plus an iodev registered for an
>>>>> MMIO GPA range to invoke the fop->{read|write}. I just didn't understand
>>>>> why userspace can't register an iodev directly via an API.
>>>>
>>>> Please elaborate on how it would work via iodev.
>>>>
>>>
>>> QEMU forwards the BAR0 write to the bus driver; in the bus driver, if
>>> the MEM bit is found to be enabled, register an iodev with KVM, with
>>> these ops:
>>>
>>> const struct kvm_io_device_ops trap_mmio_ops = {
>>>         .read  = kvmgt_guest_mmio_read,
>>>         .write = kvmgt_guest_mmio_write,
>>> };
>>>
>>> I may not be able to illustrate it clearly in words, but this should
>>> not be a problem; thanks to your explanation, I can understand it and
>>> adopt it for KVMGT.
>>
>> You're still crossing modules with direct callbacks, right? What's the
>> advantage versus using the file descriptor + offset approach which could
>> offer the same performance and improve KVM overall by creating a new
>> option for generically handling MMIO?
>>
>
> Yes, the method I gave above is the current way: the KVM hypervisor calls
> kvm_io_device_ops, which then goes into the vgpu device-model directly.
>
> From KVMGT's side this is almost the same as what you suggested; I don't
> think we have a problem here now. I will adopt your suggestion.
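
For reference, the registration I'm describing would look roughly like the
sketch below. This is only an illustration against the in-kernel kvm_io_bus
interface; the kvmgt_* names and the BAR0 range are placeholders, and it
assumes the symbols are reachable from the caller:

/*
 * Sketch only: register a handler for the vGPU's trapped BAR0 MMIO range.
 * trap_mmio_ops is the ops struct quoted above; <linux/kvm_host.h> assumed.
 */
static int kvmgt_register_bar0_iodev(struct kvm *kvm, struct kvm_io_device *dev,
				     gpa_t bar0_base, int bar0_len)
{
	int ret;

	kvm_iodevice_init(dev, &trap_mmio_ops);

	mutex_lock(&kvm->slots_lock);
	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, bar0_base,
				      bar0_len, dev);
	mutex_unlock(&kvm->slots_lock);

	return ret;
}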
>
>>>>> Besides, this doesn't necessarily require another thread, right?
>>>>> I guess it can be within the VCPU thread?
>>>>
>>>> I would think so too, the vcpu is blocked on the MMIO access, we should
>>>> be able to service it in that context. I hope.
>>>>
>>>
>>> Thanks for confirmation.
>>>
>>>>> And this brings up another question: apart from the vfio bus driver and
>>>>> iommu backend (and the page_track utility used for guest memory write-protection),
>>>>> is KVMGT allowed to call into kvm.ko (or modify it)? Though we are
>>>>> becoming less and less willing to do that with VFIO, it's still better
>>>>> to know before going wrong.
>>>>
>>>> kvm and vfio are separate modules, for the most part, they know nothing
>>>> about each other and have no hard dependencies between them. We do have
>>>> various accelerations we can use to avoid paths through userspace, but
>>>> these are all via APIs that are agnostic of the party on the other end.
>>>> For example, vfio signals interrupts through eventfds and has no concept
>>>> of whether that eventfd terminates in userspace or into an irqfd in KVM.
>>>> vfio supports direct access to device MMIO regions via mmaps, but vfio
>>>> has no idea if that mmap gets directly mapped into a VM address space.
>>>> Even with posted interrupts, we've introduced an irq bypass manager
>>>> allowing interrupt producers and consumers to register independently to
>>>> form a connection without directly knowing anything about the other
>>>> module. That sort of proper software layering needs to continue. It
>>>> would be wrong for a vfio bus driver to assume KVM is the user and
>>>> directly call into KVM interfaces. Thanks,
>>>>
>>>
>>> I understand and agree with your point; it's bad if the bus driver
>>> assumes KVM is the user and/or calls into KVM interfaces.
>>>
>>> However, the vgpu device-model, which in Intel's case is also part of the
>>> i915 driver, will always need to call some hypervisor-specific interfaces.
>>
>> No, think differently.
>>
>>> For example, when a guest gfx driver submits GPU commands, the device-model
>>> may want to scan them for security or other purposes:
>>>
>>> - get a GPA (from GPU page tables)
>>> - want to read 16 bytes from that GPA
>>> - call a hypervisor-specific read_gpa() method
>>>   - for Xen, the GPA belongs to a foreign domain, so it must find
>>>     a way to map & read it - beyond our scope here;
>>>   - for KVM, the GPA can be converted to an HVA, then copy_from_user (if
>>>     called from the vcpu thread) or access_remote_vm (if called from
>>>     other threads);
>>>
>>> Please note that this is not from the vfio bus driver, but from the vgpu
>>> device-model; also this is not a DMA address from GPU page tables, but a
>>> real GPA.
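
To make the KVM case concrete, the read_gpa() path I have in mind is roughly
the sketch below, essentially what kvm_read_guest() already does internally.
kvmgt_read_gpa is a placeholder name; page-crossing is ignored, and from a
non-vcpu thread access_remote_vm() on kvm->mm would replace copy_from_user():

/* Sketch: read guest memory at a GPA from the vgpu device-model, KVM case,
 * assuming we run in the vcpu thread so the guest mm is current->mm. */
static int kvmgt_read_gpa(struct kvm *kvm, gpa_t gpa, void *buf, int len)
{
	unsigned long hva;

	hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
	if (kvm_is_error_hva(hva))
		return -EFAULT;
	hva += offset_in_page(gpa);

	if (copy_from_user(buf, (void __user *)hva, len))
		return -EFAULT;

	return 0;
}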
>>
>> This is exactly why we're proposing that the vfio IOMMU interface be
>> used as a database of guest translations.
>> The type1 IOMMU model in QEMU maps all of guest memory through the IOMMU;
>> in the vGPU model, type1 is simply collecting these mappings, and they map
>> GPA to process virtual memory.
>
> GPA to HVA mappings are maintained in KVM/QEMU, via memslots.
> Do you mean making type1 duplicate the GPA <-> HVA/HPA translations from
> KVM? Even if technically this could be done, how would it be synchronized
> with the KVM hypervisor? e.g. what is expected if the guest hot-adds a memslot?
>
> What's more, GPA is entirely a virtualization term. When VFIO is used for
> device assignment, it uses the GPA as the IOVA and maps it to an HPA, that's
> true. But for KVMGT, since a vGPU doesn't have its own DMA requester ID,
> VFIO won't call the IOMMU API, but the DMA API instead. GPAs from different
> guests may be identical, while the IGD can only have one single IOMMU domain ...
>
>
>> When the GPU driver wants to get a GPA, it does so from this database.
>> If it wants to read from it, it could get the mm and read from the
>> virtual memory or pin the page for a GPA to HPA translation and read
>> from the HPA. There is no reason to poke directly through to the
>> hypervisor here. Let's design what you need into the vgpu version of
>> the type1 IOMMU instead. Thanks,
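
If I read the proposal correctly, the vgpu-aware type1 backend would expose
something like the interface below to the device-model. This is purely
hypothetical - none of these names exist today - just to show the shape of
the "database" lookup being suggested:

/* Hypothetical interface: the vgpu type1 IOMMU backend serves as the
 * GPA -> HVA/HPA database that the vgpu device-model queries. */
struct vgpu_iommu_ops {
	/* look up the host virtual address backing a guest physical address */
	int (*gpa_to_hva)(struct device *vgpu_dev, unsigned long gpa,
			  unsigned long *hva);
	/* pin the page backing the GPA and return its host physical address */
	int (*pin_gpa)(struct device *vgpu_dev, unsigned long gpa,
		       unsigned long *hpa);
	int (*unpin_gpa)(struct device *vgpu_dev, unsigned long gpa);
};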
>
> For KVM, to access a GPA, having it translated to an HVA is enough.
>
> IIUC this may be the only remaining problem between us: where should
> a GPA be translated to an HVA, in KVM or in VFIO?
>
Unfortunately it's not the only one. Another example: the device-model
may want to write-protect a gfn (RAM). If that request goes to VFIO,
how is it supposed to reach the KVM MMU?
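
To be clear about what the device-model needs here, it is roughly the
hypothetical interface below - the names are made up; the point is only
that a write-protect request must end up in the KVM MMU, and the write
fault must come back to the device-model:

/* Hypothetical: what the vgpu device-model wants from the hypervisor for
 * write-protecting guest RAM pages (e.g. shadowed GPU page tables). */
struct vgpu_gfn_track_ops {
	/* write-protect the RAM page at gfn; on a guest write, the handler
	 * is called so the device-model can emulate and re-shadow it */
	int (*track_write)(void *vgpu, unsigned long gfn,
			   void (*handler)(void *vgpu, unsigned long gpa,
					   const void *data, int len));
	int (*untrack_write)(void *vgpu, unsigned long gfn);
};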
>
--
Thanks,
Jike