From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>,
"Ruan, Shuai" <shuai.ruan@intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>, Neo Jia <cjia@nvidia.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"igvt-g@lists.01.org" <igvt-g@ml01.01.org>,
qemu-devel <qemu-devel@nongnu.org>,
Gerd Hoffmann <kraxel@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
"Lv, Zhiyuan" <zhiyuan.lv@intel.com>
Subject: Re: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Wed, 27 Jan 2016 09:47:25 +0800
Message-ID: <56A821AD.5090606@intel.com>
In-Reply-To: <1453848975.18049.7.camel@redhat.com>
On 01/27/2016 06:56 AM, Alex Williamson wrote:
> On Tue, 2016-01-26 at 22:39 +0000, Tian, Kevin wrote:
>>> From: Alex Williamson [mailto:alex.williamson@redhat.com]
>>> Sent: Wednesday, January 27, 2016 6:27 AM
>>>
>>> On Tue, 2016-01-26 at 22:15 +0000, Tian, Kevin wrote:
>>>>> From: Alex Williamson [mailto:alex.williamson@redhat.com]
>>>>> Sent: Wednesday, January 27, 2016 6:08 AM
>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
>>>>>>>> KVM, so VM MMIO access will be forwarded to KVMGT directly for
>>>>>>>> emulation in kernel. If we reuse above R/W flags, the whole emulation
>>>>>>>> path would be unnecessarily long with obvious performance impact. We
>>>>>>>> either need a new flag here to indicate in-kernel emulation (bias from
>>>>>>>> passthrough support), or just hide the region alternatively (let KVMGT
>>>>>>>> to handle I/O emulation itself like today).
>>>>>>>
>>>>>>> That sounds like a future optimization TBH. There's very strict
>>>>>>> layering between vfio and kvm. Physical device assignment could make
>>>>>>> use of it as well, avoiding a round trip through userspace when an
>>>>>>> ioread/write would do. Userspace also needs to orchestrate those kinds
>>>>>>> of accelerators, there might be cases where userspace wants to see those
>>>>>>> transactions for debugging or manipulating the device. We can't simply
>>>>>>> take shortcuts to provide such direct access. Thanks,
>>>>>>>
>>>>>>
>>>>>> But we have to balance such debugging flexibility and acceptable performance.
>>>>>> To me the latter one is more important otherwise there'd be no real usage
>>>>>> around this technique, while for debugging there are other alternative (e.g.
>>>>>> ftrace) Consider some extreme case with 100k traps/second and then see
>>>>>> how much impact a 2-3x longer emulation path can bring...
>>>>>
>>>>> Are you jumping to the conclusion that it cannot be done with proper
>>>>> layering in place? Performance is important, but it's not an excuse to
>>>>> abandon designing interfaces between independent components. Thanks,
>>>>>
>>>>
>>>> Two are not controversial. My point is to remove unnecessary long trip
>>>> as possible. After another thought, yes we can reuse existing read/write
>>>> flags:
>>>> - KVMGT will expose a private control variable whether in-kernel
>>>> delivery is required;
>>>
>>> But in-kernel delivery is never *required*. Wouldn't userspace want to
>>> deliver in-kernel any time it possibly could?
>>>
>>>> - when the variable is true, KVMGT will register in-kernel MMIO
>>>> emulation callbacks then VM MMIO request will be delivered to KVMGT
>>>> directly;
>>>> - when the variable is false, KVMGT will not register anything.
>>>> VM MMIO request will then be delivered to Qemu and then ioread/write
>>>> will be used to finally reach KVMGT emulation logic;
>>>
>>> No, that means the interface is entirely dependent on a backdoor through
>>> KVM. Why can't userspace (QEMU) do something like register an MMIO
>>> region with KVM handled via a provided file descriptor and offset,
>>> couldn't KVM then call the file ops without a kernel exit? Thanks,
>>>
>>
>> Could you elaborate this thought? If it can achieve the purpose w/o
>> a kernel exit definitely we can adapt to it. :-)
>
> I only thought of it when replying to the last email and have been doing
> some research, but we already do quite a bit of synchronization through
> file descriptors. The kvm-vfio pseudo device uses a group file
> descriptor to ensure a user has access to a group, allowing some degree
> of interaction between modules. Eventfds and irqfds already make use of
> f_ops on file descriptors to poke data. So, if KVM had information that
> an MMIO region was backed by a file descriptor for which it already has
> a reference via fdget() (and verified access rights and whatnot), then
> it ought to be a simple matter to get to f_ops->read/write knowing the
> base offset of that MMIO region. Perhaps it could even simply use
> __vfs_read/write(). Then we've got a proper reference to the file
> descriptor for ownership purposes and we've transparently jumped across
> modules without any implicit knowledge of the other end. Could it work?
This is OK for KVMGT; getting from the fops to the vgpu device-model would
always be simple.
The only question is: how is the KVM hypervisor supposed to get the fd on a
VM exit? Copying and pasting the current implementation of vcpu_mmio_write(),
it seems nothing but the GPA and len are provided:
static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
			   const void *v)
{
	int handled = 0;
	int n;

	do {
		n = min(len, 8);
		if (!(vcpu->arch.apic &&
		      !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, addr, n, v))
		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
			break;
		handled += n;
		addr += n;
		len -= n;
		v += n;
	} while (len);

	return handled;
}
If we back a GPA range with an fd, won't this also be a 'backdoor'?
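To make the question concrete, here is a minimal user-space sketch of what "backing a GPA range with an fd" might look like, assuming the association is registered once at setup time and found again by a range lookup at trap time. Everything in it is hypothetical and for illustration only: struct fake_file, struct fd_mmio_region, register_fd_region(), lookup_region() and model_mmio_write() are not existing KVM or VFIO interfaces, and the memcpy() merely stands in for calling the file's write op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t gpa_t;

/* Hypothetical stand-in for the file behind the fd (the device model). */
struct fake_file {
	uint8_t backing[64];
};

/* Hypothetical record: a GPA range backed by a file plus a base offset. */
struct fd_mmio_region {
	gpa_t base;
	size_t size;
	struct fake_file *file;
	size_t file_offset;
};

#define MAX_REGIONS 4
static struct fd_mmio_region regions[MAX_REGIONS];
static int nr_regions;

/* Registration happens once, at setup time, driven by userspace. */
static int register_fd_region(gpa_t base, size_t size,
			      struct fake_file *file, size_t file_offset)
{
	if (nr_regions == MAX_REGIONS ||
	    file_offset + size > sizeof(file->backing))
		return -1;
	regions[nr_regions++] =
		(struct fd_mmio_region){ base, size, file, file_offset };
	return 0;
}

/* At trap time only the GPA and length are known; the lookup finds the file. */
static struct fd_mmio_region *lookup_region(gpa_t addr, size_t len)
{
	for (int i = 0; i < nr_regions; i++) {
		struct fd_mmio_region *r = &regions[i];

		if (addr >= r->base && len <= r->size &&
		    addr - r->base <= r->size - len)
			return r;
	}
	return NULL;
}

/* Chunks the access into at most 8 bytes per step; returns bytes handled. */
static int model_mmio_write(gpa_t addr, int len, const void *v)
{
	const uint8_t *src = v;
	int handled = 0;

	assert(v != NULL);
	while (len > 0) {
		int n = len < 8 ? len : 8;
		struct fd_mmio_region *r = lookup_region(addr, (size_t)n);

		if (!r)
			break;
		/* Stand-in for the file's write op at file_offset + delta. */
		memcpy(r->file->backing + r->file_offset + (addr - r->base),
		       src, (size_t)n);
		handled += n;
		addr += n;
		len -= n;
		src += n;
	}
	return handled;
}
```

In this model the trap path still receives only the address, length and data; the fd never has to be passed in on the exit because it was attached to the range beforehand. Whether such a registration counts as a 'backdoor' is exactly the open question.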
> Thanks,
>
> Alex
>
--
Thanks,
Jike