From: Alex Williamson <alex.williamson@redhat.com>
To: Jike Song <jike.song@intel.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	Yang Zhang <yang.zhang.wz@gmail.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	"Ruan, Shuai" <shuai.ruan@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	"igvt-g@lists.01.org" <igvt-g@ml01.01.org>,
	Neo Jia <cjia@nvidia.com>
Subject: Re: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Thu, 28 Jan 2016 08:23:06 -0700
Message-ID: <1453994586.29166.1.camel@redhat.com>
In-Reply-To: <56A9AE69.3060604@intel.com>

On Thu, 2016-01-28 at 14:00 +0800, Jike Song wrote:
> On 01/28/2016 12:19 AM, Alex Williamson wrote:
> > On Wed, 2016-01-27 at 13:43 +0800, Jike Song wrote:
> {snip}
> 
> > > I had a look at eventfd. I would say yes, technically we are able to
> > > achieve the goal: introduce an fd, with fop->{read|write} defined in KVM,
> > > calling into the vgpu device-model, plus an iodev registered for an MMIO
> > > GPA range to invoke fop->{read|write}.  I just didn't understand why
> > > userspace can't register an iodev directly via an API.
> > 
> > Please elaborate on how it would work via iodev.
> > 
> 
> QEMU forwards BAR0 writes to the bus driver; if the bus driver finds
> that the MEM bit is enabled, it registers an iodev with KVM, with these
> ops:
> 
> 	const struct kvm_io_device_ops trap_mmio_ops = {
> 		.read	= kvmgt_guest_mmio_read,
> 		.write	= kvmgt_guest_mmio_write,
> 	};
> 
> I may not be able to illustrate it clearly in words, but this should
> not be a problem: thanks to your explanation, I understand it and can
> adopt it for KVMGT.

You're still crossing modules with direct callbacks, right?  What's the
advantage over the file descriptor + offset approach, which could
offer the same performance and improve KVM overall by creating a new
option for generically handling MMIO?
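To make the "file descriptor + offset" idea concrete, here is a minimal userspace sketch. It assumes a hypothetical registration that ties a trapped guest-physical range to an (fd, offset) pair; this is not an existing KVM API. A guest access inside the range is forwarded with pread(2)/pwrite(2) on the owner's fd, so the MMIO handler never calls directly into the other module. The names `mmio_fd_region`, `mmio_fd_find`, `mmio_fd_read` and `mmio_fd_write` are illustrative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical registration: one trapped guest-physical MMIO range
 * backed by a file descriptor owned by the device model. */
struct mmio_fd_region {
    uint64_t gpa;   /* guest-physical base of the trapped range */
    uint64_t len;   /* length of the range in bytes */
    int fd;         /* fd supplied by the device owner */
    off_t offset;   /* offset within fd that corresponds to gpa */
};

/* Find the region fully containing [gpa, gpa + size), or NULL. */
static const struct mmio_fd_region *
mmio_fd_find(const struct mmio_fd_region *r, size_t n,
             uint64_t gpa, size_t size)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= r[i].gpa && size <= r[i].len &&
            gpa - r[i].gpa <= r[i].len - size)
            return &r[i];
    }
    return NULL;
}

/* Forward a guest read; returns 0 if handled, -1 otherwise. */
static int mmio_fd_read(const struct mmio_fd_region *r, size_t n,
                        uint64_t gpa, void *buf, size_t size)
{
    assert(buf != NULL);
    const struct mmio_fd_region *m = mmio_fd_find(r, n, gpa, size);
    if (m == NULL)
        return -1;  /* not trapped here: take the normal exit path */
    off_t off = m->offset + (off_t)(gpa - m->gpa);
    return pread(m->fd, buf, size, off) == (ssize_t)size ? 0 : -1;
}

/* Forward a guest write; returns 0 if handled, -1 otherwise. */
static int mmio_fd_write(const struct mmio_fd_region *r, size_t n,
                         uint64_t gpa, const void *buf, size_t size)
{
    assert(buf != NULL);
    const struct mmio_fd_region *m = mmio_fd_find(r, n, gpa, size);
    if (m == NULL)
        return -1;
    off_t off = m->offset + (off_t)(gpa - m->gpa);
    return pwrite(m->fd, buf, size, off) == (ssize_t)size ? 0 : -1;
}
```

In this model any fd that supports positioned I/O can stand in for the device owner (a temporary file works for experimenting), which is the point of the design: the trapping side only needs the fd and offset, not the identity of the module behind them.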

> > > Besides, this doesn't necessarily require another thread, right?
> > > I guess it can run within the VCPU thread?
> > 
> > I would think so too; the vcpu is blocked on the MMIO access, so we
> > should be able to service it in that context.  I hope.
> > 
> 
> Thanks for confirmation.
> 
> > > And this brings up another question: apart from the vfio bus driver and
> > > iommu backend (and the page_track utility used for guest memory write-protection),
> > > is KVMGT allowed to call into (or modify) kvm.ko?  Though we are
> > > becoming less and less willing to do that with VFIO, it's still better
> > > to know before going down the wrong path.
> > 
> > kvm and vfio are separate modules; for the most part, they know nothing
> > about each other and have no hard dependencies between them.  We do have
> > various accelerations we can use to avoid paths through userspace, but
> > these are all via APIs that are agnostic of the party on the other end.
> > For example, vfio signals interrupts through eventfds and has no concept
> > of whether that eventfd terminates in userspace or in an irqfd in KVM.
> > vfio supports direct access to device MMIO regions via mmaps, but vfio
> > has no idea if that mmap gets directly mapped into a VM address space.
> > Even with posted interrupts, we've introduced an irq bypass manager
> > allowing interrupt producers and consumers to register independently to
> > form a connection without directly knowing anything about the other
> > module.  That sort of proper software layering needs to continue.  It
> > would be wrong for a vfio bus driver to assume KVM is the user and
> > directly call into KVM interfaces.  Thanks,
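The eventfd decoupling described in the quoted paragraph can be illustrated in userspace with the real eventfd(2) API from `<sys/eventfd.h>`. The producer only holds a file descriptor and increments its counter; whoever reads the descriptor sees the accumulated count. In the kernel, the producer side would use `eventfd_signal()` and the consumer could be a KVM irqfd; `producer_signal` and `consumer_drain` are hypothetical stand-ins for those two roles, and the sketch assumes the fd is created with `EFD_NONBLOCK`.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Producer role (e.g. a device driver raising an interrupt): it only
 * holds an eventfd and cannot tell who, if anyone, consumes it. */
static int producer_signal(int efd)
{
    assert(efd >= 0);
    return eventfd_write(efd, 1);   /* adds 1 to the eventfd counter */
}

/* Consumer role (e.g. userspace, or an irqfd in the kernel): returns the
 * number of signals accumulated since the last read, or 0 if none.
 * Assumes efd was created with EFD_NONBLOCK so an empty read fails with
 * EAGAIN instead of blocking. */
static uint64_t consumer_drain(int efd)
{
    assert(efd >= 0);
    eventfd_t count = 0;
    if (eventfd_read(efd, &count) != 0)
        return 0;                   /* nothing pending */
    return count;                   /* counter is reset to 0 by the read */
}
```

Neither function refers to the other; the only contract between them is the descriptor, which mirrors how vfio can hand the same eventfd to QEMU or to KVM without knowing which one it is.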
> > 
> 
> I understand and agree with your point: it's bad if the bus driver
> assumes KVM is the user and/or calls into KVM interfaces.
> 
> However, the vgpu device-model, in Intel's case also part of the i915
> driver, will always need to call some hypervisor-specific interfaces.

No, think differently.

> For example, when a guest gfx driver submits GPU commands, the device-model
> may want to scan them for security or other purposes:
> 
> 	- get a GPA (from GPU page tables)
> 	- read 16 bytes from that GPA
> 	- call a hypervisor-specific read_gpa() method
> 		- for Xen, the GPA belongs to a foreign domain, so it must find
> 		  a way to map & read it - beyond our scope here;
> 		- for KVM, the GPA can be converted to an HVA and read with
> 		  copy_from_user (if called from the vcpu thread) or
> 		  access_remote_vm (if called from other threads);
> 
> Please note that this is not from the vfio bus driver, but from the vgpu
> device-model; also, this is not a DMA address from the GPU tables, but a
> real GPA.

This is exactly why we're proposing that the vfio IOMMU interface be
used as a database of guest translations.  The type1 IOMMU model in QEMU
maps all of guest memory through the IOMMU; in the vGPU model, type1 is
simply collecting these mappings, which map GPA to process virtual memory.
When the GPU driver wants to get a GPA, it does so from this database.
If it wants to read from it, it could get the mm and read from the
virtual memory, or pin the page for a GPA to HPA translation and read
from the HPA.  There is no reason to poke directly through to the
hypervisor here.  Let's design what you need into the vgpu version of
the type1 IOMMU instead.  Thanks,

Alex
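A minimal sketch of the "database of guest translations" idea, assuming each DMA map request records (iova, vaddr, size) with the iova equal to the GPA, as when QEMU maps all of guest memory through type1. The vendor driver then looks up a GPA and reads 16 bytes of a command buffer from it without any hypervisor-specific hook. `gpa_db`, `gpa_db_add`, `gpa_db_lookup` and `gpa_db_read` are hypothetical names. Because this model keeps "guest memory" in the same process it simply uses memcpy; a kernel implementation would instead read through the owning process's mm or pin the page to obtain an HPA, as the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPA_DB_MAX 16

/* One recorded map request (hypothetical layout). */
struct gpa_mapping {
    uint64_t gpa;    /* iova from the map request, equal to the GPA here */
    uint64_t size;   /* length of the mapping in bytes */
    void *vaddr;     /* process virtual address backing the range */
};

struct gpa_db {
    struct gpa_mapping map[GPA_DB_MAX];
    size_t count;
};

/* Record one map request; returns 0 on success, -1 if the table is full. */
static int gpa_db_add(struct gpa_db *db, uint64_t gpa, void *vaddr,
                      uint64_t size)
{
    assert(db != NULL);
    if (db->count == GPA_DB_MAX)
        return -1;
    db->map[db->count++] = (struct gpa_mapping){
        .gpa = gpa, .size = size, .vaddr = vaddr };
    return 0;
}

/* Translate [gpa, gpa + len) to a virtual address, or NULL if unmapped. */
static void *gpa_db_lookup(const struct gpa_db *db, uint64_t gpa, size_t len)
{
    assert(db != NULL);
    for (size_t i = 0; i < db->count; i++) {
        const struct gpa_mapping *m = &db->map[i];
        if (gpa >= m->gpa && len <= m->size && gpa - m->gpa <= m->size - len)
            return (char *)m->vaddr + (gpa - m->gpa);
    }
    return NULL;
}

/* Read len bytes at a GPA, e.g. to scan a guest command buffer. */
static int gpa_db_read(const struct gpa_db *db, uint64_t gpa,
                       void *buf, size_t len)
{
    const void *src = gpa_db_lookup(db, gpa, len);
    if (src == NULL)
        return -1;
    memcpy(buf, src, len);
    return 0;
}
```

A linear table keeps the sketch short; a real backend would likely use an interval tree or rbtree keyed on the iova, since guest memory can be split into many mappings.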
