From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alwillia@redhat.com>
Cc: "Ruan, Shuai" <shuai.ruan@intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
kvm@vger.kernel.org, qemu-devel <qemu-devel@nongnu.org>,
"igvt-g@lists.01.org" <igvt-g@lists.01.org>,
Gerd Hoffmann <kraxel@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Zhiyuan Lv <zhiyuan.lv@intel.com>
Subject: VFIO based vGPU (was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Mon, 18 Jan 2016 10:39:45 +0800
Message-ID: <569C5071.6080004@intel.com>

Hi Alex, let's continue with a new thread :)

Basically we agree with you: exposing vGPU via VFIO can make QEMU
share as much code as possible with pcidev (PF or VF) assignment.
And yes, different vGPU vendors can share quite a lot of the QEMU
part, which will benefit upper layers such as libvirt.

To achieve this there is quite a lot to do; I'll summarize it below.
I have dug into VFIO for a while but may still have misunderstood
things, so please correct me :)
First, let me illustrate my understanding of the current VFIO
framework used to pass a pcidev through to a guest:
               +----------------------------------+
               |             vfio qemu            |
               +-----+------------------------+---+
                     |DMA                  ^  |CFG
QEMU                 |map               IRQ|  |
---------------------|---------------------|--|------------
KERNEL  +------------|---------------------|--|----------+
        | VFIO       |                     |  |          |
        |            v                     |  v          |
        |  +-------------------+     +-----+-----------+ |
IOMMU   |  | vfio iommu driver |     | vfio bus driver | |
API  <-----+                   |     |                 | |
Layer   |  |   e.g. type1      |     |  e.g. vfio_pci  | |
        |  +-------------------+     +-----------------+ |
        +------------------------------------------------+
Here, when a particular pcidev is passed through to a KVM guest, it
is attached to the vfio_pci driver in the host, and guest memory is
mapped into the IOMMU via the type1 iommu driver.
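
For reference, here is a stripped-down sketch of that flow as driven
from userspace (the group number, BDF and sizes below are made-up
examples, and all error handling is omitted):

/* Sketch of today's type1 flow, as a QEMU-like userspace would drive it. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static void assign_pcidev(void *guest_ram, unsigned long ram_size)
{
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/26", O_RDWR);  /* IOMMU group of the pcidev */
        int device;

        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

        /* type1: pin guest RAM and program gpa -> hpa into the hardware IOMMU */
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (__u64)(unsigned long)guest_ram, /* HVA backing guest RAM */
                .iova  = 0,                               /* GPA */
                .size  = ram_size,
        };
        ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

        /* CFG/BAR accesses and IRQs then go through the per-device fd */
        device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:00:02.0");
        (void)device;
}
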
Then, the draft infrastructure of the future VFIO-based vGPU:
                +-------------------------------------+
                |              vfio qemu              |
                +----+-------------------------+------+
                     |DMA                   ^  |CFG
QEMU                 |map                IRQ|  |
---------------------|----------------------|--|------------
KERNEL               |                      |  |
        +------------|----------------------|--|----------+
        |VFIO        |                      |  |          |
        |            v                      |  v          |
        |  +--------------------+     +-----+-----------+ |
DMA     |  | vfio iommu driver  |     | vfio bus driver | |
API  <-----+                    |     |                 | |
Layer   |  |  e.g. vfio_type2   |     |  e.g. vfio_vgpu | |
        |  +--------------------+     +-----------------+ |
        |         |  ^                      |  ^          |
        +---------|--|----------------------|--|----------+
                  |  |                      |  |
                  |  |                      v  |
        +---------|--|----------+   +---------------------+
        | +-------v-----------+ |   |                     |
        | |                   | |   |                     |
        | |       KVMGT       | |   |                     |
        | |                   | |   |   host gfx driver   |
        | +-------------------+ |   |                     |
        |                       |   |                     |
        |    KVM hypervisor     |   |                     |
        +-----------------------+   +---------------------+
NOTE: vfio_type2 and vfio_vgpu are only *logically* parts of VFIO;
they may be implemented in the KVM hypervisor or in the host gfx
driver.
Here we need to implement a new VFIO IOMMU driver instead of type1;
let's call it vfio_type2 temporarily. The main difference from pcidev
assignment is that a vGPU doesn't have its own DMA requester ID, so
it has to share mappings with the host and other vGPUs:
	- the type1 iommu driver maps gpa to hpa for passthrough,
	  whereas type2 maps iova to hpa;

	- a hardware IOMMU is always needed by type1, whereas for
	  type2 a hardware IOMMU is optional;

	- type1 invokes the low-level IOMMU API (iommu_map et al.) to
	  set up the IOMMU page table directly, whereas type2 doesn't
	  (it only needs to invoke a higher-level DMA API such as
	  dma_map_page); see the sketch after this list.
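
To make the last point concrete, this is roughly what each driver
would do per mapping request (the type2 helper is hypothetical, only
to show which kernel API ends up being used):

#include <linux/iommu.h>
#include <linux/dma-mapping.h>

/* type1: owns an iommu_domain and programs the hardware IOMMU directly */
static int type1_map_one(struct iommu_domain *domain, unsigned long iova,
                         phys_addr_t hpa, size_t size, int prot)
{
        return iommu_map(domain, iova, hpa, size, prot);
}

/*
 * type2 (hypothetical): no per-vGPU iommu_domain, since the vGPU shares
 * the GPU's requester ID with the host.  It only needs a dma_addr_t the
 * GPU can use, which the DMA API provides whether or not a hardware
 * IOMMU is present.
 */
static dma_addr_t type2_map_one(struct device *gpu_dev, struct page *page,
                                size_t size)
{
        return dma_map_page(gpu_dev, page, 0, size, DMA_BIDIRECTIONAL);
}
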
We also need to implement a new 'bus' driver instead of vfio_pci;
let's call it vfio_vgpu temporarily:
	- vfio_pci is a real PCI driver; it has a probe method called
	  during device attach.  vfio_vgpu, by contrast, is a pseudo
	  driver: it won't attach to any device - the GPU is always
	  owned by the host gfx driver.  It has to do its 'probing'
	  elsewhere, but still within the host gfx driver attached to
	  the device;
	- a pcidev (PF or VF) attached to vfio_pci has a natural path
	  in sysfs, whereas a vGPU is purely a software concept:
	  vfio_vgpu needs to create/destroy vGPU instances and
	  maintain their paths in sysfs
	  (e.g. "/sys/class/vgpu/intel/vgpu0"), etc.  Something has to
	  be added in a higher layer (VFIO or DRM) to do this; see the
	  sketch after this list;
	- vfio_pci in most cases lets QEMU access pcidev hardware
	  directly, whereas vfio_vgpu provides access to virtual
	  resources emulated by another device model;
	- vfio_pci injects an IRQ into the guest only when a physical
	  IRQ is generated, whereas vfio_vgpu may inject an IRQ purely
	  for emulation purposes.  Either way, they can share the same
	  injection interface.
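
As an illustration of the create/destroy point above, the higher
layer might end up looking roughly like the sketch below.  Everything
in it - the sysfs layout, struct vgpu_instance, intel_gvt_create_vgpu()
and vfio_vgpu_register() - is hypothetical, only to show the split
between the host gfx driver and vfio_vgpu:

#include <linux/device.h>
#include <linux/err.h>

struct vgpu_instance;                                          /* hypothetical */
struct vgpu_instance *intel_gvt_create_vgpu(const char *cfg);  /* hypothetical */
void vfio_vgpu_register(struct vgpu_instance *vgpu);           /* hypothetical */

/*
 * Hypothetical class-level "create" attribute: writing a config string
 * to /sys/class/vgpu/intel/create asks the host gfx driver to
 * instantiate a vGPU and hand it to vfio_vgpu, which then exposes it as
 * /sys/class/vgpu/intel/vgpu0 and as a VFIO device for QEMU.
 */
static ssize_t create_store(struct class *class, struct class_attribute *attr,
                            const char *buf, size_t count)
{
        struct vgpu_instance *vgpu;

        vgpu = intel_gvt_create_vgpu(buf);      /* done by the host gfx driver */
        if (IS_ERR(vgpu))
                return PTR_ERR(vgpu);

        vfio_vgpu_register(vgpu);               /* expose it via VFIO */
        return count;
}

static struct class_attribute vgpu_create_attr =
        __ATTR(create, 0200, NULL, create_store);
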
Questions:

[1] Regarding VFIO No-IOMMU mode (!iommu_present), I saw it was
    reverted upstream in ae5515d66362 (Revert: "vfio: Include
    No-IOMMU mode").  In my opinion, vfio_type2 doesn't need to rely
    on it to support the No-IOMMU case; instead it needs a new
    implementation which fits both with and without an IOMMU.  Is
    this correct?
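
To show what I mean by fitting both cases: the presence of a hardware
IOMMU would only be informational for vfio_type2, since its mappings
always go through the DMA API (sketch only):

#include <linux/iommu.h>
#include <linux/pci.h>

/*
 * Sketch only: vfio_type2 never owns a per-device IOMMU domain, so a
 * hardware IOMMU changes how the DMA API is backed, not whether the
 * type2 driver can work at all.
 */
static bool type2_has_hw_iommu(void)
{
        return iommu_present(&pci_bus_type);
}
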
For things not mentioned above, we might discuss them in other
threads, or temporarily keep them in a TODO list (we might get back
to them after the big picture is agreed upon):
	- How to expose the guest framebuffer via VFIO for SPICE;

	- How to avoid double translation with the two stages
	  GTT + IOMMU, whether an identity map is possible, and if
	  so, how to make it more efficient;

	- Application acceleration: you mentioned that with VFIO, a
	  vGPU may be used by applications to get GPU acceleration.
	  It's a potential opportunity to use vGPU for container
	  usage, worth further investigation.
--
Thanks,
Jike