VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alwillia@redhat.com>
Cc: "Ruan, Shuai" <shuai.ruan@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	kvm@vger.kernel.org, qemu-devel <qemu-devel@nongnu.org>,
	"igvt-g@lists.01.org" <igvt-g@lists.01.org>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Zhiyuan Lv <zhiyuan.lv@intel.com>
Subject: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Mon, 18 Jan 2016 10:39:45 +0800	[thread overview]
Message-ID: <569C5071.6080004@intel.com> (raw)

Hi Alex, let's continue with a new thread :)

Basically we agree with you: exposing vGPU via VFIO can make
QEMU share as much code as possible with pcidev(PF or VF) assignment.
And yes, different vGPU vendors can share quite a lot of the
QEMU part, which will do good for upper layers such as libvirt.


To achieve this, there are quite a lot to do, I'll summarize
it below. I dived into VFIO for a while but still may have
things misunderstood, so please correct me :)



First, let me illustrate my understanding of current VFIO
framework used to pass through a pcidev to guest:


                 +----------------------------------+
                 |            vfio qemu             |
                 +-----+------------------------+---+
                       |DMA                  ^  |CFG
QEMU                   |map               IRQ|  |
-----------------------|---------------------|--|-----------
KERNEL    +------------|---------------------|--|----------+
          | VFIO       |                     |  |          |
          |            v                     |  v          |
          |  +-------------------+     +-----+-----------+ |
IOMMU     |  | vfio iommu driver |     | vfio bus driver | |
API  <-------+                   |     |                 | |
Layer     |  | e.g. type1        |     | e.g. vfio_pci   | |
          |  +-------------------+     +-----------------+ |
          +------------------------------------------------+


Here when a particular pcidev is passed-through to a KVM guest,
it is attached to vfio_pci driver in host, and guest memory
is mapped into IOMMU via the type1 iommu driver.


Then, the draft infrastructure of future VFIO-based vgpu:



                 +-------------------------------------+
                 |              vfio qemu              |
                 +----+-------------------------+------+
                      |DMA                   ^  |CFG
QEMU                  |map                IRQ|  |
----------------------|----------------------|--|-----------
KERNEL                |                      |  |
         +------------|----------------------|--|----------+
         |VFIO        |                      |  |          |
         |            v                      |  v          |
         | +--------------------+      +-----+-----------+ |
DMA      | | vfio iommu driver  |      | vfio bus driver | |
API <------+                    |      |                 | |
Layer    | |  e.g. vfio_type2   |      |  e.g. vfio_vgpu | |
         | +--------------------+      +-----------------+ |
         |         |  ^                      |  ^          |
         +---------|--|----------------------|--|----------+
                   |  |                      |  |
                   |  |                      v  |
         +---------|--|----------+   +---------------------+
         | +-------v-----------+ |   |                     |
         | |                   | |   |                     |
         | |      KVMGT        | |   |                     |
         | |                   | |   |   host gfx driver   |
         | +-------------------+ |   |                     |
         |                       |   |                     |
         |    KVM hypervisor     |   |                     |
         +-----------------------+   +---------------------+

        NOTE    vfio_type2 and vfio_vgpu are only *logically* parts
                of VFIO, they may be implemented in KVM hypervisor
                or host gfx driver.



Here we need to implement a new vfio IOMMU driver instead of type1,
let's call it vfio_type2 temporarily. The main difference from pcidev
assignment is, vGPU doesn't have its own DMA requester id, so it has
to share mappings with host and other vGPUs.

        - type1 iommu driver maps gpa to hpa for passing through;
          whereas type2 maps iova to hpa;

        - hardware iommu is always needed by type1, whereas for
          type2, hardware iommu is optional;

        - type1 will invoke low-level IOMMU API (iommu_map et al) to
          setup IOMMU page table directly, whereas type2 dosen't (only
          need to invoke higher level DMA API like dma_map_page);


We also need to implement a new 'bus' driver instead of vfio_pci,
let's call it vfio_vgpu temporarily:

        - vfio_pci is a real pci driver, it has a probe method called
          during dev attaching; whereas the vfio_vgpu is a pseudo
          driver, it won't attach any devivce - the GPU is always owned by
          host gfx driver. It has to do 'probing' elsewhere, but
          still in host gfx driver attached to the device;

        - pcidev(PF or VF) attached to vfio_pci has a natural path
          in sysfs; whereas vgpu is purely a software concept:
          vfio_vgpu needs to create create/destory vgpu instances,
          maintain their paths in sysfs (e.g. "/sys/class/vgpu/intel/vgpu0")
          etc. There should be something added in a higher layer
          to do this (VFIO or DRM).

        - vfio_pci in most case will allow QEMU to access pcidev
          hardware; whereas vfio_vgpu is to access virtual resource
          emulated by another device model;

        - vfio_pci will inject an IRQ to guest only when physical IRQ
          generated; whereas vfio_vgpu may inject an IRQ for emulation
          purpose. Anyway they can share the same injection interface;


Questions:

        [1] For VFIO No-IOMMU mode (!iommu_present), I saw it was reverted
            in upstream ae5515d66362(Revert: "vfio: Include No-IOMMU mode").
            In my opinion, vfio_type2 doesn't rely on it to support No-IOMMU
            case, instead it needs a new implementation which fits both
            w/ and w/o IOMMU. Is this correct?


For things not mentioned above, we might have them discussed in
other threads, or temporarily maintained in a TODO list (we might get
back to them after the big picture get agreed):


        - How to expose guest framebuffer via VFIO for SPICE;

        - How to avoid double translation with two-stage: GTT + IOMMU,
          whether identity map is possible, and if yes, how to make it
          more effectively;

        - Application acceleration
          You mentioned that with VFIO, a vGPU may be used by
          applications to get GPU acceleration. It's a potential
          opportunity to use vGPU for container usage, worthy of
          further investigation.





--
Thanks,
Jike

WARNING: multiple messages have this Message-ID (diff)

From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alwillia@redhat.com>
Cc: "Ruan, Shuai" <shuai.ruan@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	kvm@vger.kernel.org, qemu-devel <qemu-devel@nongnu.org>,
	"igvt-g@lists.01.org" <igvt-g@lists.01.org>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Zhiyuan Lv <zhiyuan.lv@intel.com>
Subject: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Mon, 18 Jan 2016 10:39:45 +0800	[thread overview]
Message-ID: <569C5071.6080004@intel.com> (raw)

Hi Alex, let's continue with a new thread :)

Basically we agree with you: exposing vGPU via VFIO can make
QEMU share as much code as possible with pcidev(PF or VF) assignment.
And yes, different vGPU vendors can share quite a lot of the
QEMU part, which will do good for upper layers such as libvirt.


To achieve this, there are quite a lot to do, I'll summarize
it below. I dived into VFIO for a while but still may have
things misunderstood, so please correct me :)



First, let me illustrate my understanding of current VFIO
framework used to pass through a pcidev to guest:


                 +----------------------------------+
                 |            vfio qemu             |
                 +-----+------------------------+---+
                       |DMA                  ^  |CFG
QEMU                   |map               IRQ|  |
-----------------------|---------------------|--|-----------
KERNEL    +------------|---------------------|--|----------+
          | VFIO       |                     |  |          |
          |            v                     |  v          |
          |  +-------------------+     +-----+-----------+ |
IOMMU     |  | vfio iommu driver |     | vfio bus driver | |
API  <-------+                   |     |                 | |
Layer     |  | e.g. type1        |     | e.g. vfio_pci   | |
          |  +-------------------+     +-----------------+ |
          +------------------------------------------------+


Here when a particular pcidev is passed-through to a KVM guest,
it is attached to vfio_pci driver in host, and guest memory
is mapped into IOMMU via the type1 iommu driver.


Then, the draft infrastructure of future VFIO-based vgpu:



                 +-------------------------------------+
                 |              vfio qemu              |
                 +----+-------------------------+------+
                      |DMA                   ^  |CFG
QEMU                  |map                IRQ|  |
----------------------|----------------------|--|-----------
KERNEL                |                      |  |
         +------------|----------------------|--|----------+
         |VFIO        |                      |  |          |
         |            v                      |  v          |
         | +--------------------+      +-----+-----------+ |
DMA      | | vfio iommu driver  |      | vfio bus driver | |
API <------+                    |      |                 | |
Layer    | |  e.g. vfio_type2   |      |  e.g. vfio_vgpu | |
         | +--------------------+      +-----------------+ |
         |         |  ^                      |  ^          |
         +---------|--|----------------------|--|----------+
                   |  |                      |  |
                   |  |                      v  |
         +---------|--|----------+   +---------------------+
         | +-------v-----------+ |   |                     |
         | |                   | |   |                     |
         | |      KVMGT        | |   |                     |
         | |                   | |   |   host gfx driver   |
         | +-------------------+ |   |                     |
         |                       |   |                     |
         |    KVM hypervisor     |   |                     |
         +-----------------------+   +---------------------+

        NOTE    vfio_type2 and vfio_vgpu are only *logically* parts
                of VFIO, they may be implemented in KVM hypervisor
                or host gfx driver.



Here we need to implement a new vfio IOMMU driver instead of type1,
let's call it vfio_type2 temporarily. The main difference from pcidev
assignment is, vGPU doesn't have its own DMA requester id, so it has
to share mappings with host and other vGPUs.

        - type1 iommu driver maps gpa to hpa for passing through;
          whereas type2 maps iova to hpa;

        - hardware iommu is always needed by type1, whereas for
          type2, hardware iommu is optional;

        - type1 will invoke low-level IOMMU API (iommu_map et al) to
          setup IOMMU page table directly, whereas type2 dosen't (only
          need to invoke higher level DMA API like dma_map_page);


We also need to implement a new 'bus' driver instead of vfio_pci,
let's call it vfio_vgpu temporarily:

        - vfio_pci is a real pci driver, it has a probe method called
          during dev attaching; whereas the vfio_vgpu is a pseudo
          driver, it won't attach any devivce - the GPU is always owned by
          host gfx driver. It has to do 'probing' elsewhere, but
          still in host gfx driver attached to the device;

        - pcidev(PF or VF) attached to vfio_pci has a natural path
          in sysfs; whereas vgpu is purely a software concept:
          vfio_vgpu needs to create create/destory vgpu instances,
          maintain their paths in sysfs (e.g. "/sys/class/vgpu/intel/vgpu0")
          etc. There should be something added in a higher layer
          to do this (VFIO or DRM).

        - vfio_pci in most case will allow QEMU to access pcidev
          hardware; whereas vfio_vgpu is to access virtual resource
          emulated by another device model;

        - vfio_pci will inject an IRQ to guest only when physical IRQ
          generated; whereas vfio_vgpu may inject an IRQ for emulation
          purpose. Anyway they can share the same injection interface;


Questions:

        [1] For VFIO No-IOMMU mode (!iommu_present), I saw it was reverted
            in upstream ae5515d66362(Revert: "vfio: Include No-IOMMU mode").
            In my opinion, vfio_type2 doesn't rely on it to support No-IOMMU
            case, instead it needs a new implementation which fits both
            w/ and w/o IOMMU. Is this correct?


For things not mentioned above, we might have them discussed in
other threads, or temporarily maintained in a TODO list (we might get
back to them after the big picture get agreed):


        - How to expose guest framebuffer via VFIO for SPICE;

        - How to avoid double translation with two-stage: GTT + IOMMU,
          whether identity map is possible, and if yes, how to make it
          more effectively;

        - Application acceleration
          You mentioned that with VFIO, a vGPU may be used by
          applications to get GPU acceleration. It's a potential
          opportunity to use vGPU for container usage, worthy of
          further investigation.





--
Thanks,
Jike

next             reply	other threads:[~2016-01-18  2:39 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-18  2:39 Jike Song [this message]
2016-01-18  2:39 ` [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...) Jike Song
2016-01-18  4:47 ` Alex Williamson
2016-01-18  4:47   ` [Qemu-devel] " Alex Williamson
2016-01-18  8:56   ` Jike Song
2016-01-18  8:56     ` [Qemu-devel] " Jike Song
2016-01-18 19:05     ` Alex Williamson
2016-01-18 19:05       ` [Qemu-devel] " Alex Williamson
2016-01-20  8:59       ` Jike Song
2016-01-20  8:59         ` [Qemu-devel] " Jike Song
2016-01-20  9:05         ` Tian, Kevin
2016-01-20  9:05           ` [Qemu-devel] " Tian, Kevin
2016-01-25 11:34           ` Jike Song
2016-01-25 11:34             ` [Qemu-devel] " Jike Song
2016-01-25 21:30             ` Alex Williamson
2016-01-25 21:30               ` [Qemu-devel] " Alex Williamson
2016-01-25 21:45               ` Tian, Kevin
2016-01-25 21:45                 ` [Qemu-devel] " Tian, Kevin
2016-01-25 21:48                 ` Tian, Kevin
2016-01-25 21:48                   ` [Qemu-devel] " Tian, Kevin
2016-01-26  9:48                 ` Neo Jia
2016-01-26  9:48                   ` [Qemu-devel] " Neo Jia
2016-01-26 10:20                 ` Neo Jia
2016-01-26 10:20                   ` [Qemu-devel] " Neo Jia
2016-01-26 19:24                   ` Tian, Kevin
2016-01-26 19:24                     ` [Qemu-devel] " Tian, Kevin
2016-01-26 19:29                     ` Neo Jia
2016-01-26 19:29                       ` [Qemu-devel] " Neo Jia
2016-01-26 20:06                   ` Alex Williamson
2016-01-26 20:06                     ` [Qemu-devel] " Alex Williamson
2016-01-26 21:38                     ` Tian, Kevin
2016-01-26 21:38                       ` [Qemu-devel] " Tian, Kevin
2016-01-26 22:28                     ` Neo Jia
2016-01-26 22:28                       ` [Qemu-devel] " Neo Jia
2016-01-26 23:30                       ` Alex Williamson
2016-01-26 23:30                         ` [Qemu-devel] " Alex Williamson
2016-01-27  9:14                         ` Neo Jia
2016-01-27  9:14                           ` [Qemu-devel] " Neo Jia
2016-01-27 16:10                           ` Alex Williamson
2016-01-27 16:10                             ` [Qemu-devel] " Alex Williamson
2016-01-27 21:48                             ` Neo Jia
2016-01-27 21:48                               ` [Qemu-devel] " Neo Jia
2016-01-27  8:06                     ` Kirti Wankhede
2016-01-27  8:06                       ` [Qemu-devel] " Kirti Wankhede
2016-01-27 16:00                       ` Alex Williamson
2016-01-27 16:00                         ` [Qemu-devel] " Alex Williamson
2016-01-27 20:55                         ` Kirti Wankhede
2016-01-27 20:55                           ` [Qemu-devel] " Kirti Wankhede
2016-01-27 21:58                           ` Alex Williamson
2016-01-27 21:58                             ` [Qemu-devel] " Alex Williamson
2016-01-28  3:01                             ` Kirti Wankhede
2016-01-28  3:01                               ` [Qemu-devel] " Kirti Wankhede
2016-01-26  7:41               ` Jike Song
2016-01-26  7:41                 ` [Qemu-devel] " Jike Song
2016-01-26 14:05                 ` Yang Zhang
2016-01-26 14:05                   ` [Qemu-devel] " Yang Zhang
2016-01-26 16:37                   ` Alex Williamson
2016-01-26 16:37                     ` [Qemu-devel] " Alex Williamson
2016-01-26 21:21                     ` Tian, Kevin
2016-01-26 21:21                       ` [Qemu-devel] " Tian, Kevin
2016-01-26 21:30                       ` Neo Jia
2016-01-26 21:30                         ` [Qemu-devel] " Neo Jia
2016-01-26 21:43                         ` Tian, Kevin
2016-01-26 21:43                           ` [Qemu-devel] " Tian, Kevin
2016-01-26 21:43                       ` Alex Williamson
2016-01-26 21:43                         ` [Qemu-devel] " Alex Williamson
2016-01-26 21:50                         ` Tian, Kevin
2016-01-26 21:50                           ` [Qemu-devel] " Tian, Kevin
2016-01-26 22:07                           ` Alex Williamson
2016-01-26 22:07                             ` [Qemu-devel] " Alex Williamson
2016-01-26 22:15                             ` Tian, Kevin
2016-01-26 22:15                               ` [Qemu-devel] " Tian, Kevin
2016-01-26 22:27                               ` Alex Williamson
2016-01-26 22:27                                 ` [Qemu-devel] " Alex Williamson
2016-01-26 22:39                                 ` Tian, Kevin
2016-01-26 22:39                                   ` [Qemu-devel] " Tian, Kevin
2016-01-26 22:56                                   ` Alex Williamson
2016-01-26 22:56                                     ` [Qemu-devel] " Alex Williamson
2016-01-27  1:47                                     ` Jike Song
2016-01-27  1:47                                       ` [Qemu-devel] " Jike Song
2016-01-27  3:07                                       ` Alex Williamson
2016-01-27  3:07                                         ` [Qemu-devel] " Alex Williamson
2016-01-27  5:43                                         ` Jike Song
2016-01-27  5:43                                           ` [Qemu-devel] " Jike Song
2016-01-27 16:19                                           ` Alex Williamson
2016-01-27 16:19                                             ` [Qemu-devel] " Alex Williamson
2016-01-28  6:00                                             ` Jike Song
2016-01-28  6:00                                               ` [Qemu-devel] " Jike Song
2016-01-28 15:23                                               ` Alex Williamson
2016-01-28 15:23                                                 ` [Qemu-devel] " Alex Williamson
2016-01-29  7:20                                                 ` Jike Song
2016-01-29  7:20                                                   ` [Qemu-devel] " Jike Song
2016-01-29  8:49                                                   ` [iGVT-g] " Jike Song
2016-01-29  8:49                                                     ` [Qemu-devel] " Jike Song
2016-01-29 18:50                                                     ` Alex Williamson
2016-01-29 18:50                                                       ` [Qemu-devel] " Alex Williamson
2016-02-01 13:10                                                       ` Gerd Hoffmann
2016-02-01 13:10                                                         ` [Qemu-devel] " Gerd Hoffmann
2016-02-01 21:44                                                         ` Alex Williamson
2016-02-01 21:44                                                           ` [Qemu-devel] " Alex Williamson
2016-02-02  7:28                                                           ` Gerd Hoffmann
2016-02-02  7:28                                                             ` [Qemu-devel] " Gerd Hoffmann
2016-02-02  7:35                                                           ` Zhiyuan Lv
2016-02-02  7:35                                                             ` [Qemu-devel] " Zhiyuan Lv
2016-01-27  1:52                                     ` Yang Zhang
2016-01-27  1:52                                       ` [Qemu-devel] " Yang Zhang
2016-01-27  3:37                                       ` Alex Williamson
2016-01-27  3:37                                         ` [Qemu-devel] " Alex Williamson
2016-01-27  0:06                   ` Jike Song
2016-01-27  0:06                     ` [Qemu-devel] " Jike Song
2016-01-27  1:34                     ` Yang Zhang
2016-01-27  1:34                       ` [Qemu-devel] " Yang Zhang
2016-01-27  1:51                       ` Jike Song
2016-01-27  1:51                         ` [Qemu-devel] " Jike Song
2016-01-26 16:12                 ` Alex Williamson
2016-01-26 16:12                   ` [Qemu-devel] " Alex Williamson
2016-01-26 21:57                   ` Tian, Kevin
2016-01-26 21:57                     ` [Qemu-devel] " Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=569C5071.6080004@intel.com \
    --to=jike.song@intel.com \
    --cc=alwillia@redhat.com \
    --cc=igvt-g@lists.01.org \
    --cc=kevin.tian@intel.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuai.ruan@intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.