From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: Neo Jia <cjia@nvidia.com>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>
Subject: Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed
Date: Wed, 6 Jul 2016 10:22:59 +0800
Message-ID: <577C6B83.3020303@linux.intel.com>
In-Reply-To: <20160705150713.GC31814@nvidia.com>
On 07/05/2016 11:07 PM, Neo Jia wrote:
> On Tue, Jul 05, 2016 at 05:02:46PM +0800, Xiao Guangrong wrote:
>>
>>>
>>> It is physically contiguous, but it is done at runtime; physically contiguous doesn't mean
>>> a static partition at boot time. Only at runtime is the proper HW resource requested, and therefore
>>> the right portion of the MMIO region is granted by the mediated device driver on the host.
>>
>> Okay. This is your implementation design rather than a hardware limitation, right?
>
> I don't think it matters here. We are talking about a framework, so it should
> provide the flexibility for different driver vendors.
It really matters. It is the reason we designed the framework like this, and
we need to figure out whether there is a better design that meets the requirements.
>
>>
>> For example, if the instance requires 512M of memory (the size can be specified on the QEMU
>> command line), it can tell its requirement to the mediated device driver via the create()
>> interface, then the driver can allocate the memory for this instance before it is running.
>
> BAR != your device memory
>
> We don't set the BAR size via the QEMU command line; the BAR size is extracted by QEMU
> from the config space provided by the vendor driver.
>
Anyway, there is a way to configure the BAR size, e.g., specify the size as a parameter when
you create a mdev via sysfs.
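To make it concrete, something like the sketch below is what I have in mind. All the
names here are made up for illustration (this is not the existing mdev interface); the
point is only that the size is known at creation time, so the vendor driver can reserve
the HW resource before the guest ever runs:

#include <linux/types.h>
#include <linux/errno.h>

/* Vendor-specific allocator, assumed to exist for this sketch. */
extern phys_addr_t my_hw_reserve_region(resource_size_t size);

struct my_mdev_state {
	phys_addr_t	base;	/* start of the reserved MMIO slice */
	resource_size_t	size;	/* size requested at creation */
};

/* Called at mdev creation time, before the guest runs. */
static int my_mdev_create(struct my_mdev_state *m, resource_size_t size)
{
	m->base = my_hw_reserve_region(size);
	if (!m->base)
		return -ENOMEM;

	m->size = size;
	return 0;
}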
>>
>> Theoretically, the hardware is able to do memory management in this style, but for some
>> reason you chose to allocate memory at runtime, right? If my understanding is right,
>> could you please tell us what benefit you want to get from this runtime-allocation style?
>
> Your understanding is incorrect.
Then WHY?
>
>>
>>>
>>> Then the req_size and pgoff will both come from the mediated device driver, based on its internal
>>> bookkeeping of the hw resource allocation, which is only available at runtime. And such bookkeeping
>>> can be built as part of a para-virtualization scheme between the guest and host device drivers.
>>>
>>
>> I am talking about the parameters you passed to validate_map_request(). req_size is calculated like this:
>>
>> + offset = virtaddr - vma->vm_start;
>> + phyaddr = (vma->vm_pgoff << PAGE_SHIFT) + offset;
>> + pgoff = phyaddr >> PAGE_SHIFT;
>>
>> All this info comes from the vma, which is available in mmap().
>>
>> pg_prot is taken from:
>> + pg_prot = vma->vm_page_prot;
>> that is also available in mmap().
>
> This is kept there in case validate_map_request() is not provided by the vendor
> driver; in that case a 1:1 mapping is assumed by default, so if validate_map_request()
> is not provided, the fault handler should not fail.
THESE are the parameters you pass to validate_map_request(), and this info is
available in mmap(), so it really does not matter if you move validate_map_request()
into mmap(). That's what I want to say.
>
>>
>>> None of this information is available at VFIO mmap() time. For example, several VMs
>>> may be sharing the same physical device to provide mediated access. All VMs will
>>> call the VFIO mmap() on their virtual BAR as part of the QEMU vfio/pci initialization
>>> process; at that moment, we definitely can't blindly mmap the entire physical MMIO
>>> into all the VMs, for obvious reasons.
>>>
>>
>> mmap() carries the @length information, so you only need to allocate memory of the
>> specified size (corresponding to @length) for them.
>
> Again, you still look at this as a static partition at QEMU configuration time
> where the guest mmio will be mapped as a whole at some offset of the physical
> mmio region. (You can still do that, as I said above, by not providing
> validate_map_request() in your vendor driver.)
>
Then you can move validate_map_request() here to achieve a custom allocation policy.
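Roughly like the sketch below. This is only an illustration, and the
validate_map_request() signature here is assumed rather than taken from your patches;
the point is that vm_pgoff, the length and vm_page_prot are all already known at
mmap() time, so the vendor callback can be invoked there and the whole range mapped
up front with remap_pfn_range():

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical vendor hook; the real signature is up to the framework. */
extern int my_validate_map_request(struct file *file, unsigned long *pgoff,
				   unsigned long *req_size, pgprot_t *prot);

static int my_mdev_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long req_size = vma->vm_end - vma->vm_start;
	unsigned long pgoff = vma->vm_pgoff;
	pgprot_t prot = vma->vm_page_prot;
	int ret;

	/* Let the vendor driver choose the physical pfn and protection. */
	ret = my_validate_map_request(file, &pgoff, &req_size, &prot);
	if (ret)
		return ret;

	/* Establish the mapping now instead of waiting for a fault. */
	return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, prot);
}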
> But this is not the framework we are defining here.
>
> The framework we have here is to provide the driver vendor the flexibility to decide
> the guest mmio to physical mmio mapping on a per-page basis, and such information is
> available at runtime.
>
> How such information gets communicated between the guest and host drivers is up to
> the driver vendor.
The problem is the ordering between "provide the driver vendor the flexibility
to decide the guest mmio and physical mmio mapping on a page basis" and mmap().
We should provide such allocation info first, then do mmap(). Your current design,
do mmap() -> communicate such info -> use such info when a fault happens,
is really BAD, because you cannot control when the memory fault will happen.
The guest may access this memory before the communication you mentioned above,
and another reason is that the KVM MMU can prefetch memory at any time.
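Just to make that ordering concrete from the user-space side, here is a rough sketch.
The VENDOR_SET_ALLOC_INFO ioctl is purely hypothetical (the real channel is up to the
vendor); what matters is that the allocation info reaches the host driver before
mmap(), so the mapping can be fully established before the guest or the KVM MMU
prefetcher ever touches the BAR:

#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <linux/ioctl.h>

/* Hypothetical allocation-info structure and ioctl, for illustration only. */
struct vendor_alloc_info {
	unsigned long long size;	/* MMIO slice this instance needs */
};
#define VENDOR_SET_ALLOC_INFO	_IOW('V', 0xF0, struct vendor_alloc_info)

static void *map_bar(int device_fd, unsigned long long size, off_t bar_offset)
{
	struct vendor_alloc_info info = { .size = size };

	/* 1. Tell the host driver what this instance needs... */
	if (ioctl(device_fd, VENDOR_SET_ALLOC_INFO, &info) < 0)
		return MAP_FAILED;

	/* 2. ...then mmap(); the fault path no longer depends on a later
	 *    guest/host communication.
	 */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    device_fd, bar_offset);
}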