Discussion of the implementations of VIRTIO specification
From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	Rob Miller <rob.miller@broadcom.com>
Cc: Virtio-Dev <virtio-dev@lists.oasis-open.org>
Subject: Re: [virtio-dev] Dirty Page Tracking (DPT)
Date: Mon, 9 Mar 2020 16:50:43 +0800
Message-ID: <ff63b1e2-e4aa-1b13-0b4a-72fd23badc06@redhat.com>
In-Reply-To: <20200309030251-mutt-send-email-mst@kernel.org>


On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
> On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
>> I understand that DPT isn't really on the forefront of the vDPA framework, but
>> wanted to understand if there are any initial thoughts on how this would work...
> And judging by the next few chapters, you are actually
> talking about vhost pci, right?
>
>> In the migration framework, in its simplest form, (I gather) it's QEMU via KVM
>> that is reading the dirty page table, converting bits to page numbers, then
>> flushing remote VM/copying local page(s)->remote VM, etc.
>>
>> While this is fine for a VM (say VM1) dirtying its own memory and the accesses
>> are trapped in the kernel as well as the log is being updated, I'm not sure
>> what happens in the situation of vhost, where a remote VM (say VM2) is dirtying
>> up VM1's memory since it can directly access it, during packet reception for
>> example.
>> Whatever technique is employed to catch this, how would this differ from a HW
>> based Virtio device doing DMA directly into a VM's DDR, wrt DPT? Is QEMU
>> going to have a 2nd place to query the dirty logs - ie: the vDPA layer?
> I don't think anyone has a good handle on the vhost pci migration yet.
> But I think a reasonable way to handle that would be to
> activate dirty tracking in VM2's QEMU.
>
> And then VM2's QEMU would periodically copy the bits to the log - does
> this sound right?
>
>> Further I heard about a SW based DPT within the vDPA framework for those
>> devices that do not (yet) support DPT inherently in HW. How is this envisioned
>> to work?
> What I am aware of is simply switching to a software virtio
> for the duration of migration. The software can be pretty simple
> since the formats match: just copy available entries to device ring,
> and for used entries, see a used ring entry, mark page
> dirty and then copy used entry to guest ring.


That looks more heavyweight than e.g. just relaying the used ring (as
DPDK did), I believe?
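
To make the used-ring relay idea concrete, here is a minimal sketch in Python rather than real vhost/DPDK code. Everything in it is an assumption made for illustration: the names `UsedElem`, `Relay` and `desc_addr` are hypothetical, each used entry is taken to cover a single contiguous device-writable buffer, pages are 4 KiB, and the ring is modelled as a growing list instead of a circular buffer with a 16-bit index.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class UsedElem:
    id: int   # head descriptor index reported by the device
    len: int  # number of bytes the device wrote into the buffer


@dataclass
class Relay:
    # Hypothetical lookup: descriptor head id -> guest physical address
    # of the (single, contiguous) device-writable buffer.
    desc_addr: dict
    dirty: set = field(default_factory=set)         # dirty page frame numbers
    guest_used: list = field(default_factory=list)  # used ring seen by the guest
    last_seen: int = 0                              # entries already relayed

    def mark_dirty(self, gpa: int, length: int) -> None:
        """Log every page touched by a write of `length` bytes at `gpa`."""
        if length == 0:
            return
        first = gpa >> PAGE_SHIFT
        last = (gpa + length - 1) >> PAGE_SHIFT
        self.dirty.update(range(first, last + 1))

    def relay_used(self, device_used: list) -> None:
        """Copy new device used entries to the guest ring, logging pages first."""
        for elem in device_used[self.last_seen:]:
            # Mark the page(s) dirty before the guest can see the entry.
            self.mark_dirty(self.desc_addr[elem.id], elem.len)
            self.guest_used.append(elem)
        self.last_seen = len(device_used)
```

Only the used side is intercepted here; the available ring would still be shared directly with the device, which is why this is lighter than relaying both directions.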


>
>
> Another approach that I proposed and was prototyped at some point by
> Alex Duyck is guest driver touching the page in question before
> processing it within guest e.g. by an atomic xor with 0.
> Sounds attractive but didn't perform all that well.


Intel posted an i40e software solution that traps queue tail/head writes.
But I'm not sure it's good enough.

https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/


>
>
>> Finally, for those HW vendors that do support DPT in HW, a mapping of a bit ->
>> page isn't really an option, since no one wants to do a byte wide
>> read-modify-write across the PCI bus; rather, mapping a whole byte to a page is
>> likely more desirable - the HW can just do non-posted writes to the dirty page
>> table. If byte wise, then the QEMU/vDPA layer has to either fix-up the mapping
>> (from byte->bit) or have the capability to handle the granularity diffs.
>>
>> Thoughts?
>>
>> Rob Miller
>> rob.miller@broadcom.com
>> (919)721-3339
> If using an IOMMU, DPT can also be done using either PRI or dirty bit in
> a PTE. PRI is an interrupt so it can kick off a thread to set bits in
> the log I guess, but if it's the dirty bit then I don't think there's an
> interrupt. And a polling thread does not sound attractive.  I guess
> we'll need a new interface to notify VDPA that QEMU is looking for dirty
> logs, and then VDPA can send them to QEMU in some way.  Will probably be
> good enough to support vendor specific logging interfaces, too.  I don't
> actually have hardware which supports either so actually coding it up is
> not yet practical.


Yes, both PRI and the PTE dirty bit require special hardware support. We
can extend the vDPA API to support both. For page faults, probably just an
IOMMU page fault handler.
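
On the byte-per-page versus bit-per-page question raised earlier in the thread, the fix-up in the QEMU/vDPA layer could look roughly like the sketch below. It is an illustration under assumptions, not an existing interface: the function names are hypothetical, any non-zero byte is treated as "page dirty", and bit 0 of each bitmap byte is assumed to correspond to the lowest page frame number.

```python
def fold_byte_map(byte_map: bytes) -> bytearray:
    """Fold a one-byte-per-page device log into a one-bit-per-page bitmap."""
    bitmap = bytearray((len(byte_map) + 7) // 8)
    for pfn, flag in enumerate(byte_map):
        if flag:  # assumption: any non-zero byte means the page is dirty
            bitmap[pfn // 8] |= 1 << (pfn % 8)
    return bitmap


def dirty_pfns(bitmap):
    """Yield the page frame number of every set bit, in ascending order."""
    for byte_idx, byte in enumerate(bitmap):
        while byte:
            low = byte & -byte  # isolate the lowest set bit
            yield byte_idx * 8 + low.bit_length() - 1
            byte ^= low         # clear it and continue


# Example: the device marked pages 1, 8 and 10.
log = bytes([0, 1, 0, 0, 0, 0, 0, 0, 0xFF, 0, 1])
pages = list(dirty_pfns(fold_byte_map(log)))
```

The device only ever does whole-byte writes, and the byte-to-bit conversion happens on the host side when the log is harvested.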


>
> Further, at my KVM forum presentation I proposed a virtio-specific
> pagefault handling interface.  If there's a wish to standardize and
> implement that, let me know and I will try to write this up in a more
> formal way.


Besides page faults, if we want virtio to be more like vhost, we also need
to formalize device state fetching, e.g. the per-vq index etc.

Thanks




