From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alwillia@redhat.com>
Cc: "Ruan, Shuai" <shuai.ruan@intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
kvm@vger.kernel.org, qemu-devel <qemu-devel@nongnu.org>,
"igvt-g@lists.01.org" <igvt-g@lists.01.org>,
Gerd Hoffmann <kraxel@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Zhiyuan Lv <zhiyuan.lv@intel.com>
Subject: VFIO based vGPU (was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Mon, 18 Jan 2016 10:39:45 +0800
Message-ID: <569C5071.6080004@intel.com>

Hi Alex, let's continue with a new thread :)

Basically we agree with you: exposing vGPU via VFIO can make QEMU
share as much code as possible with pcidev (PF or VF) assignment.
And yes, different vGPU vendors can share quite a lot of the QEMU
part, which will benefit upper layers such as libvirt.

To achieve this there is quite a lot to do; I'll summarize it below.
I have dug into VFIO for a while but may still have misunderstood
things, so please correct me :)
First, let me illustrate my understanding of the current VFIO
framework used to pass a pcidev through to a guest:
               +----------------------------------+
               |             vfio qemu            |
               +-----+------------------------+---+
                     |DMA                  ^  |CFG
QEMU                 |map               IRQ|  |
---------------------|---------------------|--|------------
KERNEL  +------------|---------------------|--|----------+
        | VFIO       |                     |  |          |
        |            v                     |  v          |
        |  +-------------------+     +-----+-----------+ |
IOMMU   |  | vfio iommu driver |     | vfio bus driver | |
API  <-----+                   |     |                 | |
Layer   |  |   e.g. type1      |     |  e.g. vfio_pci  | |
        |  +-------------------+     +-----------------+ |
        +------------------------------------------------+
Here, when a particular pcidev is passed through to a KVM guest, it
is attached to the vfio_pci driver in the host, and guest memory is
mapped into the IOMMU via the type1 iommu driver.
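
For reference, here is a stripped-down sketch of that flow as driven
from userspace (the group number, BDF and sizes below are made-up
examples, and all error handling is omitted):

/* Sketch of today's type1 flow, as a QEMU-like userspace would drive it. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static void assign_pcidev(void *guest_ram, unsigned long ram_size)
{
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/26", O_RDWR);  /* IOMMU group of the pcidev */
        int device;

        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

        /* type1: pin guest RAM and program gpa -> hpa into the hardware IOMMU */
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (__u64)(unsigned long)guest_ram, /* HVA backing guest RAM */
                .iova  = 0,                               /* GPA */
                .size  = ram_size,
        };
        ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

        /* CFG/BAR accesses and IRQs then go through the per-device fd */
        device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:00:02.0");
        (void)device;
}
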
Then, the draft infrastructure of the future VFIO-based vGPU:
                +-------------------------------------+
                |              vfio qemu              |
                +----+-------------------------+------+
                     |DMA                   ^  |CFG
QEMU                 |map                IRQ|  |
---------------------|----------------------|--|------------
KERNEL               |                      |  |
        +------------|----------------------|--|----------+
        |VFIO        |                      |  |          |
        |            v                      |  v          |
        |  +--------------------+     +-----+-----------+ |
DMA     |  | vfio iommu driver  |     | vfio bus driver | |
API  <-----+                    |     |                 | |
Layer   |  |  e.g. vfio_type2   |     |  e.g. vfio_vgpu | |
        |  +--------------------+     +-----------------+ |
        |         |  ^                      |  ^          |
        +---------|--|----------------------|--|----------+
                  |  |                      |  |
                  |  |                      v  |
        +---------|--|----------+   +---------------------+
        | +-------v-----------+ |   |                     |
        | |                   | |   |                     |
        | |       KVMGT       | |   |                     |
        | |                   | |   |   host gfx driver   |
        | +-------------------+ |   |                     |
        |                       |   |                     |
        |    KVM hypervisor     |   |                     |
        +-----------------------+   +---------------------+
NOTE: vfio_type2 and vfio_vgpu are only *logically* parts of VFIO;
they may be implemented in the KVM hypervisor or in the host gfx
driver.
Here we need to implement a new VFIO IOMMU driver instead of type1;
let's call it vfio_type2 temporarily. The main difference from pcidev
assignment is that a vGPU doesn't have its own DMA requester ID, so
it has to share mappings with the host and other vGPUs:
	- the type1 iommu driver maps gpa to hpa for passthrough,
	  whereas type2 maps iova to hpa;

	- a hardware IOMMU is always needed by type1, whereas for
	  type2 a hardware IOMMU is optional;

	- type1 invokes the low-level IOMMU API (iommu_map et al.) to
	  set up the IOMMU page table directly, whereas type2 doesn't
	  (it only needs to invoke a higher-level DMA API such as
	  dma_map_page); see the sketch after this list.
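
To make the last point concrete, this is roughly what each driver
would do per mapping request (the type2 helper is hypothetical, only
to show which kernel API ends up being used):

#include <linux/iommu.h>
#include <linux/dma-mapping.h>

/* type1: owns an iommu_domain and programs the hardware IOMMU directly */
static int type1_map_one(struct iommu_domain *domain, unsigned long iova,
                         phys_addr_t hpa, size_t size, int prot)
{
        return iommu_map(domain, iova, hpa, size, prot);
}

/*
 * type2 (hypothetical): no per-vGPU iommu_domain, since the vGPU shares
 * the GPU's requester ID with the host.  It only needs a dma_addr_t the
 * GPU can use, which the DMA API provides whether or not a hardware
 * IOMMU is present.
 */
static dma_addr_t type2_map_one(struct device *gpu_dev, struct page *page,
                                size_t size)
{
        return dma_map_page(gpu_dev, page, 0, size, DMA_BIDIRECTIONAL);
}
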
We also need to implement a new 'bus' driver instead of vfio_pci;
let's call it vfio_vgpu temporarily:
	- vfio_pci is a real PCI driver; it has a probe method called
	  during device attach.  vfio_vgpu, by contrast, is a pseudo
	  driver: it won't attach to any device - the GPU is always
	  owned by the host gfx driver.  It has to do its 'probing'
	  elsewhere, but still within the host gfx driver attached to
	  the device;
	- a pcidev (PF or VF) attached to vfio_pci has a natural path
	  in sysfs, whereas a vGPU is purely a software concept:
	  vfio_vgpu needs to create/destroy vGPU instances and
	  maintain their paths in sysfs
	  (e.g. "/sys/class/vgpu/intel/vgpu0"), etc.  Something has to
	  be added in a higher layer (VFIO or DRM) to do this; see the
	  sketch after this list;
	- vfio_pci in most cases lets QEMU access pcidev hardware
	  directly, whereas vfio_vgpu provides access to virtual
	  resources emulated by another device model;
	- vfio_pci injects an IRQ into the guest only when a physical
	  IRQ is generated, whereas vfio_vgpu may inject an IRQ purely
	  for emulation purposes.  Either way, they can share the same
	  injection interface.
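
As an illustration of the create/destroy point above, the higher
layer might end up looking roughly like the sketch below.  Everything
in it - the sysfs layout, struct vgpu_instance, intel_gvt_create_vgpu()
and vfio_vgpu_register() - is hypothetical, only to show the split
between the host gfx driver and vfio_vgpu:

#include <linux/device.h>
#include <linux/err.h>

struct vgpu_instance;                                          /* hypothetical */
struct vgpu_instance *intel_gvt_create_vgpu(const char *cfg);  /* hypothetical */
void vfio_vgpu_register(struct vgpu_instance *vgpu);           /* hypothetical */

/*
 * Hypothetical class-level "create" attribute: writing a config string
 * to /sys/class/vgpu/intel/create asks the host gfx driver to
 * instantiate a vGPU and hand it to vfio_vgpu, which then exposes it as
 * /sys/class/vgpu/intel/vgpu0 and as a VFIO device for QEMU.
 */
static ssize_t create_store(struct class *class, struct class_attribute *attr,
                            const char *buf, size_t count)
{
        struct vgpu_instance *vgpu;

        vgpu = intel_gvt_create_vgpu(buf);      /* done by the host gfx driver */
        if (IS_ERR(vgpu))
                return PTR_ERR(vgpu);

        vfio_vgpu_register(vgpu);               /* expose it via VFIO */
        return count;
}

static struct class_attribute vgpu_create_attr =
        __ATTR(create, 0200, NULL, create_store);
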
Questions:

[1] Regarding VFIO No-IOMMU mode (!iommu_present), I saw it was
    reverted upstream in ae5515d66362 (Revert: "vfio: Include
    No-IOMMU mode").  In my opinion, vfio_type2 doesn't need to rely
    on it to support the No-IOMMU case; instead it needs a new
    implementation which fits both with and without an IOMMU.  Is
    this correct?
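
To show what I mean by fitting both cases: the presence of a hardware
IOMMU would only be informational for vfio_type2, since its mappings
always go through the DMA API (sketch only):

#include <linux/iommu.h>
#include <linux/pci.h>

/*
 * Sketch only: vfio_type2 never owns a per-device IOMMU domain, so a
 * hardware IOMMU changes how the DMA API is backed, not whether the
 * type2 driver can work at all.
 */
static bool type2_has_hw_iommu(void)
{
        return iommu_present(&pci_bus_type);
}
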
For things not mentioned above, we might discuss them in other
threads, or temporarily keep them in a TODO list (we might get back
to them after the big picture is agreed upon):
	- How to expose the guest framebuffer via VFIO for SPICE;

	- How to avoid double translation with the two stages
	  GTT + IOMMU, whether an identity map is possible, and if
	  so, how to make it more efficient;

	- Application acceleration: you mentioned that with VFIO, a
	  vGPU may be used by applications to get GPU acceleration.
	  It's a potential opportunity to use vGPU for container
	  usage, worth further investigation.
--
Thanks,
Jike