From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <56A9AE69.3060604@intel.com>
Date: Thu, 28 Jan 2016 14:00:09 +0800
From: Jike Song
MIME-Version: 1.0
In-Reply-To: <1453911589.6261.5.camel@redhat.com>
References: <569C5071.6080004@intel.com> <569CA8AD.6070200@intel.com>
 <1453143919.32741.169.camel@redhat.com> <569F4C86.2070501@intel.com>
 <56A6083E.10703@intel.com> <1453757426.32741.614.camel@redhat.com>
 <56A72313.9030009@intel.com> <56A77D2D.40109@gmail.com>
 <1453826249.26652.54.camel@redhat.com> <1453844613.18049.1.camel@redhat.com>
 <1453846073.18049.3.camel@redhat.com> <1453847250.18049.5.camel@redhat.com>
 <1453848975.18049.7.camel@redhat.com> <56A821AD.5090606@intel.com>
 <1453864068.3107.3.camel@redhat.com> <56A85913.1020506@intel.com>
 <1453911589.6261.5.camel@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] VFIO based vGPU (was Re: [Announcement] 2015-Q3
 release of XenGT - a Mediated ...)
To: Alex Williamson
Cc: Yang Zhang, "Ruan, Shuai", "Tian, Kevin", Neo Jia,
 "kvm@vger.kernel.org", "igvt-g@lists.01.org", qemu-devel, Gerd Hoffmann,
 Paolo Bonzini, "Lv, Zhiyuan"

On 01/28/2016 12:19 AM, Alex Williamson wrote:
> On Wed, 2016-01-27 at 13:43 +0800, Jike Song wrote:
{snip}
>> Had a look at eventfd, I would say yes, technically we are able to
>> achieve the goal: introduce a fd, with fop->{read|write} defined in KVM,
>> call into the vgpu device-model, also an iodev registered for a MMIO GPA
>> range to invoke the fop->{read|write}. I just didn't understand why
>> userspace can't register an iodev via an API directly.
>
> Please elaborate on how it would work via iodev.
>

QEMU forwards the BAR0 write to the bus driver; in the bus driver, if we
find that the MEM bit is enabled, we register an iodev with KVM, with ops
like:

	const struct kvm_io_device_ops trap_mmio_ops = {
		.read	= kvmgt_guest_mmio_read,
		.write	= kvmgt_guest_mmio_write,
	};
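
Roughly, I imagine the registration from the bus driver could look like
the sketch below. This is only a sketch, not actual code:
kvmgt_register_bar0_iodev() and its arguments are names I made up, and it
assumes the bus driver already holds a reference to the VM's struct kvm
and knows the BAR0 GPA/size from the guest's BAR programming:

	#include <linux/kvm_host.h>
	#include <kvm/iodev.h>

	/*
	 * Sketch only: name and calling context are made up; locking and
	 * error handling are simplified.
	 */
	static int kvmgt_register_bar0_iodev(struct kvm *kvm,
					     struct kvm_io_device *iodev,
					     gpa_t gpa, int len)
	{
		int ret;

		/* Bind the device to the read/write ops above */
		kvm_iodevice_init(iodev, &trap_mmio_ops);

		/* kvm_io_bus_register_dev() expects slots_lock to be held */
		mutex_lock(&kvm->slots_lock);
		ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, gpa, len, iodev);
		mutex_unlock(&kvm->slots_lock);

		return ret;
	}

The counterpart, when a BAR0 write clears the MEM bit, would be
kvm_io_bus_unregister_dev(), again under slots_lock.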

I may not be able to illustrate it clearly in words, but this should not
be a problem; thanks to your explanation, I can understand it and adopt
it for KVMGT.

>> Besides, this doesn't necessarily require another thread, right?
>> I guess it can be within the VCPU thread?
>
> I would think so too, the vcpu is blocked on the MMIO access, we should
> be able to service it in that context.  I hope.
>

Thanks for the confirmation.

>> And this brought another question: except the vfio bus driver and
>> iommu backend (and the page_track utility used for guest memory
>> write-protection), is KVMGT allowed to call into kvm.ko (or modify it)?
>> Though we are becoming less and less willing to do that with VFIO,
>> it's still better to know that before going wrong.
>
> kvm and vfio are separate modules, for the most part, they know nothing
> about each other and have no hard dependencies between them.  We do have
> various accelerations we can use to avoid paths through userspace, but
> these are all via APIs that are agnostic of the party on the other end.
> For example, vfio signals interrupts through eventfds and has no concept
> of whether that eventfd terminates in userspace or into an irqfd in KVM.
> vfio supports direct access to device MMIO regions via mmaps, but vfio
> has no idea if that mmap gets directly mapped into a VM address space.
> Even with posted interrupts, we've introduced an irq bypass manager
> allowing interrupt producers and consumers to register independently to
> form a connection without directly knowing anything about the other
> module.  That sort of proper software layering needs to continue.  It
> would be wrong for a vfio bus driver to assume KVM is the user and
> directly call into KVM interfaces.  Thanks,
>

I understand and agree with your point: it's bad if the bus driver
assumes KVM is the user and/or calls into KVM interfaces. However, the
vgpu device-model, in Intel's case also a part of the i915 driver, will
always need to call some hypervisor-specific interfaces. For example,
when a guest gfx driver submits GPU commands, the device-model may want
to scan them, for security or other purposes:

	- get a GPA (from the GPU page tables);
	- want to read 16 bytes from that GPA;
	- call a hypervisor-specific read_gpa() method:
	  - for Xen, the GPA belongs to a foreign domain, so it must find
	    a way to map & read it - beyond our scope here;
	  - for KVM, the GPA can be converted to an HVA, then read with
	    copy_from_user() (if called from the vcpu thread) or
	    access_remote_vm() (if called from other threads).

Please note that this is not from the vfio bus driver, but from the vgpu
device-model; also this is not a DMA address from the GPU page tables,
but a real GPA. (A rough sketch of the KVM flavor of read_gpa() is in
the P.S. below.)

> Alex
>

--
Thanks,
Jike
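
P.S. To make the KVM case above concrete, here is a minimal sketch of
what a KVM-specific read_gpa() could look like. kvmgt_read_gpa() is a
made-up name, not actual KVMGT code; it assumes the access does not
cross a page boundary and that the caller holds a reference to the VM's
struct kvm:

	#include <linux/kvm_host.h>
	#include <linux/sched.h>
	#include <linux/mm.h>
	#include <linux/uaccess.h>

	/*
	 * Sketch only: error handling is minimal, and a page-crossing read
	 * would need to be split into per-page chunks.
	 */
	static int kvmgt_read_gpa(struct kvm *kvm, gpa_t gpa, void *buf, int len)
	{
		unsigned long hva;
		int idx, ret = 0;

		/* Memslot lookup must be done under kvm->srcu */
		idx = srcu_read_lock(&kvm->srcu);
		hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
		srcu_read_unlock(&kvm->srcu, idx);

		if (kvm_is_error_hva(hva))
			return -EFAULT;
		hva += offset_in_page(gpa);

		if (current->mm == kvm->mm) {
			/* vcpu thread: the guest mm is the current mm */
			if (copy_from_user(buf, (void __user *)hva, len))
				ret = -EFAULT;
		} else {
			/* other threads: access the guest mm remotely (0 == read) */
			if (access_remote_vm(kvm->mm, hva, buf, len, 0) != len)
				ret = -EFAULT;
		}

		return ret;
	}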