Re: [Qemu-devel] vhost, iova, and dirty page tracking

From: Jason Wang <jasowang@redhat.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	'Alex Williamson' <alex.williamson@redhat.com>,
	Peter Xu <peterx@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] vhost, iova, and dirty page tracking
Date: Thu, 19 Sep 2019 18:08:35 +0800
Message-ID: <a0c642ea-e388-0f74-bde6-1bce9832dc40@redhat.com>
In-Reply-To: <20190919093606.GE18391@joy-OptiPlex-7040>


On 2019/9/19 5:36 PM, Yan Zhao wrote:
> On Thu, Sep 19, 2019 at 05:35:05PM +0800, Jason Wang wrote:
>> On 2019/9/19 2:32 PM, Yan Zhao wrote:
>>> On Thu, Sep 19, 2019 at 02:29:54PM +0800, Yan Zhao wrote:
>>>> On Thu, Sep 19, 2019 at 02:32:03PM +0800, Jason Wang wrote:
>>>>> On 2019/9/19 2:17 PM, Yan Zhao wrote:
>>>>>> On Thu, Sep 19, 2019 at 02:09:53PM +0800, Jason Wang wrote:
>>>>>>> On 2019/9/19 1:28 PM, Yan Zhao wrote:
>>>>>>>> On Thu, Sep 19, 2019 at 09:05:12AM +0800, Jason Wang wrote:
>>>>>>>>> On 2019/9/18 4:37 PM, Tian, Kevin wrote:
>>>>>>>>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>>>>>>>>> Sent: Wednesday, September 18, 2019 2:10 PM
>>>>>>>>>>>
>>>>>>>>>>>>> Note that the HVA to GPA mapping is not a 1:1 mapping. One HVA range
>>>>>>>>>>>>> could be mapped to several GPA ranges.
>>>>>>>>>>>> This is fine. Currently vfio_dma maintains IOVA->HVA mapping.
>>>>>>>>>>>>
>>>>>>>>>>>> btw under what condition is HVA->GPA not a 1:1 mapping? I wasn't aware of it.
>>>>>>>>>>> I don't remember the details, e.g. memory region aliases? And neither KVM
>>>>>>>>>>> nor the KVM API forbids this, if my memory is correct.
>>>>>>>>>>>
>>>>>>>>>> I checked https://qemu.weilnetz.de/doc/devel/memory.html, which
>>>>>>>>>> provides an example of aliased layout. However, its aliasing is all
>>>>>>>>>> 1:1, instead of N:1. From the guest's p.o.v. every writable GPA implies a
>>>>>>>>>> unique location. Why would we hit the situation where multiple
>>>>>>>>>> writable GPAs are mapped to the same HVA (i.e. the same physical
>>>>>>>>>> memory location)?
>>>>>>>>> I don't know, I just want to say the current API does not forbid this. So we
>>>>>>>>> probably need to take care of it.
>>>>>>>>>
>>>>>>>> Yes, at the KVM API level, nothing forbids two slots from having the same HVA (slot->userspace_addr).
>>>>>>>> But
>>>>>>>> (1) there's only one kvm instance for each vm for each qemu process.
>>>>>>>> (2) all ramblock->host values (which correspond to HVA and slot->userspace_addr) in one qemu
>>>>>>>> process are non-overlapping, as they are obtained from mmap().
>>>>>>>> (3) qemu ensures two kvm slots will not point to the same section of one ramblock.
>>>>>>>>
>>>>>>>> So, as long as a kvm instance is not shared between two processes, and
>>>>>>>> there's no bug in qemu, we can be sure that HVA to GPA is 1:1.
>>>>>>> Well, you leave this API for userspace, so you can't assume qemu is the
>>>>>>> only user, or assume anything about its behavior. If you did, you should restrict it at the API
>>>>>>> level instead of leaving the window open for them.
>>>>>>>
>>>>>>>
>>>>>>>> But even if there are two processes operating on the same kvm instance
>>>>>>>> and manipulating memory slots, adding an extra GPA alongside the current
>>>>>>>> IOVA & HVA to ioctl VFIO_IOMMU_MAP_DMA can still let the driver know the
>>>>>>>> right IOVA->GPA mapping, right?
>>>>>>> It looks fragile. Consider an HVA that is mapped to both GPA1 and GPA2. The guest
>>>>>>> maps IOVA to GPA2, so we have IOVA, GPA2 and HVA in the new ioctl and then
>>>>>>> log through GPA2. If userspace is trying to sync through GPA1, it will
>>>>>>> miss the dirty page. So for safety we need to log both GPA1 and GPA2. (See
>>>>>>> what has been done in log_write_hva() in vhost.c). The only way to do
>>>>>>> that is to maintain an independent HVA to GPA mapping like what KVM or
>>>>>>> vhost did.
>>>>>>>
>>>>>> Why should GPA1 and GPA2 both be dirty?
>>>>>> Even if they have the same HVA due to overlapping virtual address spaces in
>>>>>> two processes, they still correspond to two physical pages.
>>>>>> I don't get what you mean :)
>>>>> The point is not to leave any corner case that is hard to debug or fix in
>>>>> the future.
>>>>>
>>>>> Let's just start with a single process: the API allows userspace to map
>>>>> HVA to both GPA1 and GPA2. Since it knows GPA1 and GPA2 are equivalent,
>>>>> it's ok to sync just through GPA1. That means if you only log GPA2, it
>>>>> won't work.
>>>>>
>>>> In that case, cannot log dirty according to HPA.
>>> sorry, it should be "cannot log dirty according to HVA".
>>
>> I think we are discussing the choice between GPA and IOVA, not HVA?
>>
> Right. so why do we need to care about HVA to GPA mapping?
> as long as IOVA to GPA is 1:1, then it's fine.


The problem is whether userspace can try to sync from GPA2, whose HVA
is the same as GPA1's.
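
To make the concern concrete, here is a toy model in Python. It is only a
sketch under my own assumptions, not vhost or KVM code: the DirtyLog class,
its methods and the addresses are made up for illustration, and only the
name log_write_hva is borrowed from the vhost.c helper mentioned above. It
assumes page-aligned regions and shows that logging through a single GPA can
be missed by a sync through the aliased GPA, while logging through the HVA
marks every GPA that maps to it.

```python
from collections import defaultdict

PAGE_SIZE = 4096


class DirtyLog:
    """Toy dirty log: a set of dirty GPA pages plus an HVA -> GPAs reverse map."""

    def __init__(self):
        self.hva_to_gpas = defaultdict(set)  # HVA page -> GPA pages backed by it
        self.dirty = set()                   # dirty GPA pages

    def map_region(self, gpa, hva, size):
        # Record that [gpa, gpa + size) is backed by [hva, hva + size).
        for off in range(0, size, PAGE_SIZE):
            self.hva_to_gpas[hva + off].add(gpa + off)

    def log_write_gpa(self, gpa):
        # Log through one GPA only (what a plain IOVA -> GPA scheme would do).
        self.dirty.add(gpa & ~(PAGE_SIZE - 1))

    def log_write_hva(self, hva):
        # Log through the HVA: every GPA page aliasing it becomes dirty.
        page = hva & ~(PAGE_SIZE - 1)
        self.dirty |= self.hva_to_gpas.get(page, set())

    def sync(self, gpa, size):
        # Return and clear the dirty pages inside [gpa, gpa + size).
        pages = {p for p in self.dirty if gpa <= p < gpa + size}
        self.dirty -= pages
        return pages


HVA, GPA1, GPA2 = 0x7F0000000000, 0x100000, 0x200000

log = DirtyLog()
log.map_region(GPA1, HVA, PAGE_SIZE)
log.map_region(GPA2, HVA, PAGE_SIZE)

log.log_write_gpa(GPA2)
missed = log.sync(GPA1, PAGE_SIZE)   # userspace syncs through GPA1 only

log.log_write_hva(HVA + 8)
seen = log.sync(GPA1, PAGE_SIZE)     # now the write is visible through GPA1
```

In this model "missed" comes back empty even though the page was written,
which is the corner case I am worried about; "seen" contains GPA1 because the
write was fanned out through the HVA.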

Kevin has copied the maintainers; I hope that helps to clarify things.

Thanks


> Thanks
> Yan
>
>> Thanks
>>
>>
>>>> because kvm cannot tell whether it's a valid case (the two GPAs are equivalent)
>>>> or an invalid case (the two GPAs are not equivalent, but with the same
>>>> HVA value).
>>>>
>>>> Right?
>>>>
>>>> Thanks
>>>> Yan
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> Thanks
>>>>>> Yan
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Yan
>>>>>>>>
>>>>>>>>>> Is Qemu doing its own same-content memory
>>>>>>>>>> merging in GPA level, similar to KSM?
>>>>>>>>> AFAIK, it doesn't.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Kevin
