[virtio-dev] Dirty Page Tracking (DPT)

* [virtio-dev] Dirty Page Tracking (DPT)
@ 2020-03-06 15:40 Rob Miller
  2020-03-09  7:38 ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Rob Miller @ 2020-03-06 15:40 UTC (permalink / raw)
  To: Virtio-Dev

I understand that DPT isn't really at the forefront of the vDPA framework,
but I wanted to know whether there are any initial thoughts on how it would
work...

In the migration framework, in its simplest form, (I gather) it's QEMU via
KVM that is reading the dirty page table, converting bits to page numbers,
then flushing/copying local page(s) to the remote VM, etc.
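
A minimal sketch of the "converting bits to page numbers" step, assuming a KVM-style log with one bit per 4 KiB guest page packed into 64-bit words, where bit i covers page i of a memory slot. The function and callback names are hypothetical, not QEMU's actual code, and `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical per-page callback, e.g. "queue this page for sending". */
typedef void (*dirty_page_fn)(uint64_t gpa, void *opaque);

/*
 * Walk a dirty bitmap covering 'npages' pages of a memory slot that starts
 * at guest physical address 'slot_base'; bit i set means page i is dirty.
 * Calls 'fn' (if non-NULL) for each dirty page and returns how many it found.
 */
size_t walk_dirty_bitmap(const uint64_t *bitmap, size_t npages,
                         uint64_t slot_base, dirty_page_fn fn, void *opaque)
{
    size_t count = 0;

    assert(bitmap != NULL || npages == 0);
    for (size_t w = 0; w < (npages + 63) / 64; w++) {
        uint64_t word = bitmap[w];

        while (word) {
            size_t page = w * 64 + (size_t)__builtin_ctzll(word); /* lowest set bit */

            word &= word - 1; /* clear that bit */
            if (page >= npages)
                break;        /* padding bits past the end of the slot */
            if (fn)
                fn(slot_base + (uint64_t)page * GUEST_PAGE_SIZE, opaque);
            count++;
        }
    }
    return count;
}
```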

This is fine for a VM (say VM1) dirtying its own memory, where the accesses
are trapped in the kernel and the log is updated. But I'm not sure what
happens in the vhost case, where a remote VM (say VM2) can access VM1's
memory directly and dirties it, during packet reception for example.

Whatever technique is employed to catch this, how would it differ, wrt DPT,
from a HW-based virtio device doing DMA directly into a VM's DDR? Is QEMU
going to have a second place to query the dirty logs, i.e. the vDPA layer?

Further I heard about a SW based DPT within the vDPA framework for those
devices that do not (yet) support DPT inherently in HW. How is this
envisioned to work?

Finally, for those HW vendors that do support DPT in HW, mapping one bit
per page isn't really an option, since no one wants to do a byte-wide
read-modify-write across the PCI bus. Mapping a whole byte per page is
likely more desirable: the HW can then just do posted writes to the dirty
page table. If the table is byte-wise, the QEMU/vDPA layer has to either
fix up the mapping (from byte to bit) or be able to handle the difference
in granularity.
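
The byte-to-bit fix-up mentioned above could look roughly like this, assuming the device writes a nonzero byte at index i of a host-visible table whenever it dirties page i, and the consumer wants a bit-per-page log in 64-bit words. The function name is hypothetical, and a real implementation would also have to order this against the device's DMA (barriers, or an atomic exchange per byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold a byte-per-page dirty table (filled by the device with plain byte
 * writes, no read-modify-write) into a bit-per-page log packed into 64-bit
 * words. Each consumed byte is reset to 0 before the page is reported, so a
 * device write that lands after the reset is caught on the next pass.
 * Returns the number of pages found dirty in the byte table.
 */
size_t fold_byte_map_into_bitmap(volatile uint8_t *byte_map, size_t npages,
                                 uint64_t *bitmap)
{
    size_t found = 0;

    assert(byte_map != NULL && bitmap != NULL);
    for (size_t i = 0; i < npages; i++) {
        if (byte_map[i] == 0)
            continue;
        byte_map[i] = 0;                    /* re-arm before reporting */
        bitmap[i / 64] |= 1ULL << (i % 64); /* OR keeps bits from earlier passes */
        found++;
    }
    return found;
}
```

Resetting each byte before setting the bit means a page rewritten during the pass may be reported twice, but is never lost.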

Thoughts?

Rob Miller
rob.miller@broadcom.com
(919)721-3339

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-06 15:40 [virtio-dev] Dirty Page Tracking (DPT) Rob Miller
@ 2020-03-09  7:38 ` Michael S. Tsirkin
  2020-03-09  8:50   ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-03-09  7:38 UTC (permalink / raw)
  To: Rob Miller; +Cc: Virtio-Dev, Jason Wang

On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
> I understand that DPT isn't really at the forefront of the vDPA framework, but
> I wanted to know whether there are any initial thoughts on how it would work...

And judging by the next few paragraphs, you are actually
talking about vhost-pci, right?

> In the migration framework, in its simplest form, (I gather) it's QEMU via KVM
> that is reading the dirty page table, converting bits to page numbers, then
> flushing/copying local page(s) to the remote VM, etc.
>
> This is fine for a VM (say VM1) dirtying its own memory, where the accesses
> are trapped in the kernel and the log is updated. But I'm not sure what
> happens in the vhost case, where a remote VM (say VM2) can access VM1's
> memory directly and dirties it, during packet reception for example.
> Whatever technique is employed to catch this, how would it differ, wrt DPT,
> from a HW-based virtio device doing DMA directly into a VM's DDR? Is QEMU
> going to have a second place to query the dirty logs, i.e. the vDPA layer?

I don't think anyone has a good handle on vhost-pci migration yet.
But I think a reasonable way to handle it would be to
activate dirty tracking in VM2's QEMU.

VM2's QEMU would then periodically copy the bits to the log. Does
this sound right?
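
One way to picture that periodic copy, assuming VM2's QEMU keeps its own bitmap of the VM1 pages its data path has written and folds it into the log that VM1's migration reads. Both bitmaps and the function names are hypothetical; C11 atomics let the harvest run concurrently with the setters without losing bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Data path in VM2's QEMU: record that page 'pfn' of VM1's memory was written. */
void track_write(_Atomic uint64_t *tracking, size_t pfn)
{
    atomic_fetch_or_explicit(&tracking[pfn / 64], 1ULL << (pfn % 64),
                             memory_order_release);
}

/*
 * Periodic sync: atomically take each tracking word (leaving 0 behind so new
 * writes are caught next round) and OR it into the migration log.
 * Returns the number of words that carried at least one dirty bit.
 */
size_t sync_dirty_log(_Atomic uint64_t *tracking, uint64_t *log, size_t nwords)
{
    size_t nonzero = 0;

    assert(tracking != NULL && log != NULL);
    for (size_t w = 0; w < nwords; w++) {
        uint64_t bits = atomic_exchange_explicit(&tracking[w], 0,
                                                 memory_order_acquire);
        if (bits) {
            log[w] |= bits;
            nonzero++;
        }
    }
    return nonzero;
}
```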

> Further I heard about a SW based DPT within the vDPA framework for those
> devices that do not (yet) support DPT inherently in HW. How is this envisioned
> to work?

What I am aware of is simply switching to a software virtio
for the duration of migration. The software can be pretty simple
since the formats match: just copy available entries to the device ring,
and for each used ring entry, mark the page dirty and then copy the
used entry to the guest ring.
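
A rough sketch of the used-entry half of such a software relay for a split ring. It assumes the relay saved, for each descriptor id, the guest buffer it forwarded (a single buffer per id, no chains), and that the device writes into a shadow used ring owned by the relay; `struct buf_info`, the helper names and the 4 KiB page size are assumptions. Endianness conversion and memory barriers are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Same field layout as the split ring's virtq_used_elem / virtq_used. */
struct used_elem { uint32_t id; uint32_t len; };
struct used_ring {
    uint16_t flags;
    uint16_t idx;
    struct used_elem ring[]; /* qsize entries */
};

/* Hypothetical: guest buffer recorded when the available entry was relayed. */
struct buf_info { uint64_t gpa; uint32_t len; };

struct used_ring *used_ring_alloc(uint16_t qsize)
{
    return calloc(1, sizeof(struct used_ring) + qsize * sizeof(struct used_elem));
}

static void log_dirty_range(uint64_t *dirty_log, uint64_t gpa, uint64_t len)
{
    if (len == 0)
        return;
    for (uint64_t pfn = gpa >> PAGE_SHIFT; pfn <= (gpa + len - 1) >> PAGE_SHIFT; pfn++)
        dirty_log[pfn / 64] |= 1ULL << (pfn % 64);
}

/*
 * Copy new entries from the device's shadow used ring to the guest's used
 * ring, marking the written bytes dirty before the guest can see the entry.
 * '*last' is the relay's read cursor into the shadow ring.
 * Returns the number of entries relayed.
 */
uint16_t relay_used(const struct used_ring *shadow, struct used_ring *guest,
                    uint16_t qsize, uint16_t *last,
                    const struct buf_info *bufs, uint64_t *dirty_log)
{
    uint16_t n = 0;

    assert(qsize && (qsize & (qsize - 1)) == 0); /* split ring sizes are powers of 2 */
    while (*last != shadow->idx) {
        struct used_elem e = shadow->ring[*last % qsize];
        assert(e.id < qsize);
        uint32_t written = e.len < bufs[e.id].len ? e.len : bufs[e.id].len;

        log_dirty_range(dirty_log, bufs[e.id].gpa, written);
        guest->ring[guest->idx % qsize] = e;
        guest->idx++; /* a real relay needs a write barrier before this */
        (*last)++;
        n++;
    }
    return n;
}
```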

Another approach, which I proposed and which Alex Duyck prototyped at
some point, is the guest driver touching the page in question before
processing it within the guest, e.g. by an atomic xor with 0.
It sounds attractive but didn't perform all that well.
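
The guest-side trick might look like this: before processing a received buffer, the driver does an atomic xor with 0 on one byte of every page the buffer spans. The value is unchanged, but the CPU performs a write, so write-protection-based dirty logging in the hypervisor records the page. The function name and page size are assumptions; `__atomic_fetch_xor` is a GCC/Clang builtin, and the sketch works on a virtual buffer, as a driver would on its mapping of the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096u /* assumption: 4 KiB guest pages */

/*
 * Touch every page spanned by [buf, buf + len) with an atomic xor of 0.
 * The byte keeps its value, but the access is a write, so write-protection
 * based dirty logging in the hypervisor records the page.
 * Returns the number of pages touched.
 */
size_t touch_pages_for_dirty_log(unsigned char *buf, size_t len)
{
    size_t touched = 0;

    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;

    uintptr_t mask = ~(uintptr_t)(GUEST_PAGE_SIZE - 1);
    uintptr_t first = (uintptr_t)buf & mask;
    uintptr_t last = ((uintptr_t)buf + len - 1) & mask;

    for (uintptr_t page = first; page <= last; page += GUEST_PAGE_SIZE) {
        /* Stay inside the buffer: on the first page, touch buf itself. */
        unsigned char *p = page < (uintptr_t)buf ? buf : (unsigned char *)page;
        __atomic_fetch_xor(p, 0, __ATOMIC_RELAXED);
        touched++;
    }
    return touched;
}
```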

> Finally, for those HW vendors that do support DPT in HW, mapping one bit
> per page isn't really an option, since no one wants to do a byte-wide
> read-modify-write across the PCI bus. Mapping a whole byte per page is
> likely more desirable: the HW can then just do posted writes to the dirty
> page table. If the table is byte-wise, the QEMU/vDPA layer has to either
> fix up the mapping (from byte to bit) or be able to handle the difference
> in granularity.

If using an IOMMU, DPT can also be done using either PRI or the dirty bit
in a PTE. PRI raises an interrupt, so I guess it can kick off a thread to
set bits in the log, but with the dirty bit I don't think there's an
interrupt, and a polling thread does not sound attractive. I guess
we'll need a new interface to notify vDPA that QEMU is looking for dirty
logs, and then vDPA can send them to QEMU in some way. That will probably
be good enough to support vendor-specific logging interfaces, too. I don't
actually have hardware that supports either, so actually coding it up is
not yet practical.
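
To make the PTE dirty-bit option concrete: with no interrupt, some thread has to scan the IOMMU page table, clear each dirty bit and set the matching bit in the log. The flat array of leaf PTEs and the bit positions (present in bit 0, dirty in bit 6, as in x86 CPU page tables) are simplifying assumptions, not a real IOMMU format, and a real driver would also invalidate the IOTLB after clearing so that later writes set the dirty bit again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0) /* assumption */
#define PTE_DIRTY   (1ULL << 6) /* assumption: x86-style position */

/*
 * One polling pass over a flat table of leaf PTEs, where pte[i] maps page i.
 * Dirty bits are cleared atomically because hardware may set them at any
 * time. Returns the number of dirty pages recorded in 'dirty_log'.
 */
size_t harvest_pte_dirty_bits(_Atomic uint64_t *pte, size_t npages,
                              uint64_t *dirty_log)
{
    size_t found = 0;

    assert(pte != NULL && dirty_log != NULL);
    for (size_t i = 0; i < npages; i++) {
        uint64_t v = atomic_load(&pte[i]);

        if (!(v & PTE_PRESENT) || !(v & PTE_DIRTY))
            continue;
        atomic_fetch_and(&pte[i], ~PTE_DIRTY);
        dirty_log[i / 64] |= 1ULL << (i % 64);
        found++;
    }
    /* A real driver would invalidate the IOTLB here before the next pass. */
    return found;
}
```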

Further, at my KVM Forum presentation I proposed a virtio-specific
page fault handling interface. If there's a wish to standardize and
implement that, let me know and I will try to write it up more
formally.

-- 
MST

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-09  7:38 ` Michael S. Tsirkin
@ 2020-03-09  8:50   ` Jason Wang
  2020-03-09 10:13     ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2020-03-09  8:50 UTC (permalink / raw)
  To: Michael S. Tsirkin, Rob Miller; +Cc: Virtio-Dev


On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
> On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
>> I understand that DPT isn't really at the forefront of the vDPA framework, but
>> I wanted to know whether there are any initial thoughts on how it would work...
> And judging by the next few paragraphs, you are actually
> talking about vhost-pci, right?
>
>> In the migration framework, in its simplest form, (I gather) it's QEMU via KVM
>> that is reading the dirty page table, converting bits to page numbers, then
>> flushing/copying local page(s) to the remote VM, etc.
>>
>> This is fine for a VM (say VM1) dirtying its own memory, where the accesses
>> are trapped in the kernel and the log is updated. But I'm not sure what
>> happens in the vhost case, where a remote VM (say VM2) can access VM1's
>> memory directly and dirties it, during packet reception for example.
>> Whatever technique is employed to catch this, how would it differ, wrt DPT,
>> from a HW-based virtio device doing DMA directly into a VM's DDR? Is QEMU
>> going to have a second place to query the dirty logs, i.e. the vDPA layer?
> I don't think anyone has a good handle on vhost-pci migration yet.
> But I think a reasonable way to handle it would be to
> activate dirty tracking in VM2's QEMU.
>
> VM2's QEMU would then periodically copy the bits to the log. Does
> this sound right?
>
>> Further I heard about a SW based DPT within the vDPA framework for those
>> devices that do not (yet) support DPT inherently in HW. How is this envisioned
>> to work?
> What I am aware of is simply switching to a software virtio
> for the duration of migration. The software can be pretty simple
> since the formats match: just copy available entries to the device ring,
> and for each used ring entry, mark the page dirty and then copy the
> used entry to the guest ring.


Isn't that more heavyweight than, e.g., just relaying the used ring (as
DPDK did)?


>
>
> Another approach, which I proposed and which Alex Duyck prototyped at
> some point, is the guest driver touching the page in question before
> processing it within the guest, e.g. by an atomic xor with 0.
> It sounds attractive but didn't perform all that well.


Intel posted an i40e software solution that traps queue tail/head writes,
but I'm not sure it's good enough.

https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/


>
>
>> Finally, for those HW vendors that do support DPT in HW, mapping one bit
>> per page isn't really an option, since no one wants to do a byte-wide
>> read-modify-write across the PCI bus. Mapping a whole byte per page is
>> likely more desirable: the HW can then just do posted writes to the dirty
>> page table. If the table is byte-wise, the QEMU/vDPA layer has to either
>> fix up the mapping (from byte to bit) or be able to handle the difference
>> in granularity.
> If using an IOMMU, DPT can also be done using either PRI or the dirty bit
> in a PTE. PRI raises an interrupt, so I guess it can kick off a thread to
> set bits in the log, but with the dirty bit I don't think there's an
> interrupt, and a polling thread does not sound attractive. I guess
> we'll need a new interface to notify vDPA that QEMU is looking for dirty
> logs, and then vDPA can send them to QEMU in some way. That will probably
> be good enough to support vendor-specific logging interfaces, too. I don't
> actually have hardware that supports either, so actually coding it up is
> not yet practical.


Yes, both PRI and the PTE dirty bit require special hardware support. We
can extend the vDPA API to support both. For page faults, an IOMMU page
fault handler is probably all that's needed.


>
> Further, at my KVM Forum presentation I proposed a virtio-specific
> page fault handling interface. If there's a wish to standardize and
> implement that, let me know and I will try to write it up more
> formally.


Besides page faults, if we want virtio to be more like vhost, we also need
to formalize fetching the device state, e.g. the per-vq index, etc.

Thanks


* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-09  8:50   ` Jason Wang
@ 2020-03-09 10:13     ` Michael S. Tsirkin
  2020-03-10  3:22       ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-03-09 10:13 UTC (permalink / raw)
  To: Jason Wang; +Cc: Rob Miller, Virtio-Dev

On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote:
> 
> On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
> > On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
> > > Further I heard about a SW based DPT within the vDPA framework for those
> > > devices that do not (yet) support DPT inherently in HW. How is this envisioned
> > > to work?
> > What I am aware of is simply switching to a software virtio
> > for the duration of migration. The software can be pretty simple
> > since the formats match: just copy available entries to the device ring,
> > and for each used ring entry, mark the page dirty and then copy the
> > used entry to the guest ring.
> 
> 
> Isn't that more heavyweight than, e.g., just relaying the used ring (as
> DPDK did)?

That works for the split ring but not for the packed ring.

> 
> > 
> > 
> > Another approach, which I proposed and which Alex Duyck prototyped at
> > some point, is the guest driver touching the page in question before
> > processing it within the guest, e.g. by an atomic xor with 0.
> > It sounds attractive but didn't perform all that well.
> 
> 
> Intel posted an i40e software solution that traps queue tail/head writes,
> but I'm not sure it's good enough.
> 
> https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/


Doing the tracking at DMA unmap time seems more generic to me. But again
I suspect the main issue is the same: it's handled on the data path,
blocking packet RX until dirty tracking is done.

Hardware solutions, by comparison, queue the writes and make
progress; dirty pages are handled by the migration CPU.
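
A sketch of tracking at DMA unmap time: when a mapping the device could write is torn down, every page it covered is marked dirty. `struct dma_mapping` and the hook are hypothetical (this is not the Linux DMA API). The marking runs synchronously on whatever path does the unmap, which is the data-path cost described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical direction flags: who writes the buffer during the DMA. */
enum map_dir { MAP_DEV_READS, MAP_DEV_WRITES, MAP_DEV_BOTH };

struct dma_mapping {   /* hypothetical record kept for each live mapping */
    uint64_t gpa;      /* guest physical address of the buffer */
    uint64_t len;      /* bytes */
    enum map_dir dir;
};

/*
 * Unmap hook: if dirty logging is on and the device may have written the
 * buffer, mark every page it covers dirty. Returns the number of pages marked.
 */
size_t dma_unmap_mark_dirty(const struct dma_mapping *m, uint64_t *dirty_log,
                            bool logging)
{
    assert(m != NULL && dirty_log != NULL);
    if (!logging || m->len == 0 || m->dir == MAP_DEV_READS)
        return 0; /* the device only read the buffer: nothing was dirtied */

    uint64_t first = m->gpa >> PAGE_SHIFT;
    uint64_t last = (m->gpa + m->len - 1) >> PAGE_SHIFT;

    for (uint64_t pfn = first; pfn <= last; pfn++)
        dirty_log[pfn / 64] |= 1ULL << (pfn % 64);
    return (size_t)(last - first + 1);
}
```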


> > 
> > Further, at my KVM Forum presentation I proposed a virtio-specific
> > page fault handling interface. If there's a wish to standardize and
> > implement that, let me know and I will try to write it up more
> > formally.
> 
> 
> Besides page faults, if we want virtio to be more like vhost, we also need
> to formalize fetching the device state, e.g. the per-vq index, etc.
> 
> Thanks

Yes, that would clearly be in scope for the spec. I would not even start
with a guest/host interface. I would start by just listing, for each
device, the state that needs to be migrated. It would also be useful to
list, for each device, how to make two devices migration-compatible.
We can do that in a non-normative section.
Again, the big blocker here is lack of manpower.
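
As a starting point for such a list, the per-virtqueue state might look like the struct below. The field set is an assumption drawn from the virtqueue layouts (ring addresses, queue size, the device's next available index and, for packed rings, the wrap counters), not a proposed spec format; the save/load helpers only show that it round-trips through a flat little-endian buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-virtqueue migration state. */
struct vq_mig_state {
    uint64_t desc_addr, driver_addr, device_addr; /* ring locations in guest memory */
    uint16_t num;                                 /* queue size */
    uint16_t last_avail_idx;                      /* next entry the device will read */
    uint8_t avail_wrap_counter, used_wrap_counter; /* packed ring only */
};

#define VQ_MIG_STATE_SIZE (3 * 8 + 2 * 2 + 2) /* 30 bytes on the wire */

static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

void vq_state_save(const struct vq_mig_state *s, uint8_t out[VQ_MIG_STATE_SIZE])
{
    assert(s != NULL && out != NULL);
    put_le(out, s->desc_addr, 8);
    put_le(out + 8, s->driver_addr, 8);
    put_le(out + 16, s->device_addr, 8);
    put_le(out + 24, s->num, 2);
    put_le(out + 26, s->last_avail_idx, 2);
    out[28] = s->avail_wrap_counter;
    out[29] = s->used_wrap_counter;
}

void vq_state_load(struct vq_mig_state *s, const uint8_t in[VQ_MIG_STATE_SIZE])
{
    assert(s != NULL && in != NULL);
    memset(s, 0, sizeof(*s));
    s->desc_addr = get_le(in, 8);
    s->driver_addr = get_le(in + 8, 8);
    s->device_addr = get_le(in + 16, 8);
    s->num = (uint16_t)get_le(in + 24, 2);
    s->last_avail_idx = (uint16_t)get_le(in + 26, 2);
    s->avail_wrap_counter = in[28];
    s->used_wrap_counter = in[29];
}
```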

-- 
MST


* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-09 10:13     ` Michael S. Tsirkin
@ 2020-03-10  3:22       ` Jason Wang
  2020-03-10  6:24         ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2020-03-10  3:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rob Miller, Virtio-Dev


On 2020/3/9 6:13 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote:
>> On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
>>> What I am aware of is simply switching to a software virtio
>>> for the duration of migration. The software can be pretty simple
>>> since the formats match: just copy available entries to the device ring,
>>> and for each used ring entry, mark the page dirty and then copy the
>>> used entry to the guest ring.
>>
>> Isn't that more heavyweight than, e.g., just relaying the used ring (as
>> DPDK did)?
> That works for the split ring but not for the packed ring.


For the packed ring, can we relay the descriptor ring?


>
>> Besides page faults, if we want virtio to be more like vhost, we also need
>> to formalize fetching the device state, e.g. the per-vq index, etc.
>>
>> Thanks
> Yes, that would clearly be in scope for the spec. I would not even start
> with a guest/host interface. I would start by just listing, for each
> device, the state that needs to be migrated. It would also be useful to
> list, for each device, how to make two devices migration-compatible.
> We can do that in a non-normative section.
> Again, the big blocker here is lack of manpower.


Yes.

Thanks

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-10  3:22       ` Jason Wang
@ 2020-03-10  6:24         ` Michael S. Tsirkin
  2020-03-10  6:39           ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-03-10  6:24 UTC (permalink / raw)
  To: Jason Wang; +Cc: Rob Miller, Virtio-Dev

On Tue, Mar 10, 2020 at 11:22:00AM +0800, Jason Wang wrote:
> 
> On 2020/3/9 6:13 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote:
> > > On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
> > > > What I am aware of is simply switching to a software virtio
> > > > for the duration of migration. The software can be pretty simple
> > > > since the formats match: just copy available entries to the device ring,
> > > > and for each used ring entry, mark the page dirty and then copy the
> > > > used entry to the guest ring.
> > > 
> > > Isn't that more heavyweight than, e.g., just relaying the used ring (as
> > > DPDK did)?
> > That works for the split ring but not for the packed ring.
> 
> 
> For the packed ring, can we relay the descriptor ring?

Yes, and thus one must relay both available and used descriptors.
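
The reason both directions have to be relayed for the packed ring is visible in how a descriptor's state is encoded: available and used share the same slot and differ only in the AVAIL (bit 7) and USED (bit 15) flags relative to a wrap counter, per the virtio 1.1 packed virtqueue layout. The helpers below sketch those checks; their names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTQ_DESC_F_AVAIL 7  /* bit numbers, as in the virtio spec */
#define VIRTQ_DESC_F_USED 15

struct pvirtq_desc {          /* packed descriptor (little-endian in memory) */
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};
static_assert(sizeof(struct pvirtq_desc) == 16, "packed descriptor is 16 bytes");

static bool flag(uint16_t flags, int bit) { return (flags >> bit) & 1; }

/* Driver made this slot available in the lap given by 'wrap_counter'? */
bool desc_is_avail(const struct pvirtq_desc *d, bool wrap_counter)
{
    return flag(d->flags, VIRTQ_DESC_F_AVAIL) == wrap_counter &&
           flag(d->flags, VIRTQ_DESC_F_USED) != wrap_counter;
}

/* Device marked this slot used in the lap given by 'wrap_counter'? */
bool desc_is_used(const struct pvirtq_desc *d, bool wrap_counter)
{
    return flag(d->flags, VIRTQ_DESC_F_AVAIL) == wrap_counter &&
           flag(d->flags, VIRTQ_DESC_F_USED) == wrap_counter;
}

/* Flags a device writes when marking a slot used: both bits = its wrap counter. */
uint16_t used_flags(uint16_t flags, bool wrap_counter)
{
    flags &= (uint16_t)~((1u << VIRTQ_DESC_F_AVAIL) | (1u << VIRTQ_DESC_F_USED));
    if (wrap_counter)
        flags |= (uint16_t)((1u << VIRTQ_DESC_F_AVAIL) | (1u << VIRTQ_DESC_F_USED));
    return flags;
}
```

A relay therefore has to watch the guest's copy of the ring for slots that become available and the device's copy for slots that become used, and mirror each transition into the other copy.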


It's an interesting tradeoff. The packed ring, at least, was not designed
with multiple actors in mind.

If this becomes a thing (and that's a big if), it might make sense to
support temporarily reporting used entries in a separate buffer while
migration is in progress. If we do this, it looks like we could then
support used ring resize too, and it might also make sense to use
this to support sharing a used ring between multiple available rings;
this way a single CPU can handle multiple used rings efficiently.
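
A toy model of that last idea: used entries from several queues go into one shared ring, each tagged with its queue index, so a single consumer can drain completions for all queues in one loop. The tagged element and everything else here are hypothetical; nothing like it exists in the virtio 1.1 spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 8 /* assumption */

struct shared_used_elem {   /* hypothetical: used element tagged with its queue */
    uint16_t vq_index;
    uint16_t id;
    uint32_t len;
};

struct shared_used_ring {
    uint32_t size;          /* number of slots, a power of 2 */
    uint32_t idx;           /* producer (device/relay) position */
    struct shared_used_elem *ring;
};

/* Producer: append one completion for queue 'vq'. Returns 0, or -1 if full. */
int shared_used_push(struct shared_used_ring *r, uint32_t consumer,
                     uint16_t vq, uint16_t id, uint32_t len)
{
    assert(r->size && (r->size & (r->size - 1)) == 0);
    if (r->idx - consumer == r->size)
        return -1;
    r->ring[r->idx % r->size] = (struct shared_used_elem){ vq, id, len };
    r->idx++;
    return 0;
}

/* Consumer: one CPU drains completions for all queues, counting per queue. */
size_t shared_used_drain(const struct shared_used_ring *r, uint32_t *consumer,
                         size_t per_queue[MAX_QUEUES])
{
    size_t n = 0;

    while (*consumer != r->idx) {
        const struct shared_used_elem *e = &r->ring[*consumer % r->size];

        assert(e->vq_index < MAX_QUEUES);
        per_queue[e->vq_index]++; /* stand-in for "complete e->id on that queue" */
        (*consumer)++;
        n++;
    }
    return n;
}
```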



* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-10  6:24         ` Michael S. Tsirkin
@ 2020-03-10  6:39           ` Jason Wang
  2020-03-18 15:13             ` Rob Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2020-03-10  6:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rob Miller, Virtio-Dev


On 2020/3/10 2:24 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 10, 2020 at 11:22:00AM +0800, Jason Wang wrote:
>> On 2020/3/9 6:13 PM, Michael S. Tsirkin wrote:
>>> On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote:
>>>> On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
>>>>> What I am aware of is simply switching to a software virtio
>>>>> for the duration of migration. The software can be pretty simple
>>>>> since the formats match: just copy available entries to the device ring,
>>>>> and for each used ring entry, mark the page dirty and then copy the
>>>>> used entry to the guest ring.
>>>> Isn't that more heavyweight than, e.g., just relaying the used ring (as
>>>> DPDK did)?
>>> That works for the split ring but not for the packed ring.
>> For the packed ring, can we relay the descriptor ring?
> Yes, and thus one must relay both available and used descriptors.
>

Yes.


> It's an interesting tradeoff. Packed ring at least was not designed
> with multiple actors in mind.


Yes.


> If this becomes a thing (and that's a big if) it might make sense to
> support temporarily reporting used entries in a separate buffer, while
> migration is in progress.  Also if doing this, it looks like we can then
> support used ring resize too, and thus it might also make sense to use
> this to support sharing a used ring between multiple available rings -
> this way a single CPU can handle multiple used rings efficiently.


Right, that's something similar to the two ring model I proposed in the 
past.

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-10  6:39           ` Jason Wang
@ 2020-03-18 15:13             ` Rob Miller
  2020-03-19  3:35               ` Jason Wang
  2020-03-19 11:17               ` Paolo Bonzini
  0 siblings, 2 replies; 24+ messages in thread
From: Rob Miller @ 2020-03-18 15:13 UTC (permalink / raw)
  To: Virtio-Dev

[-- Attachment #1: Type: text/plain, Size: 4680 bytes --]

In trying to more fully understand DPT, I ran across an article regarding
how physical RAM works within QEMU and noticed the statement below. My
current understanding, based upon the statement, is that DPT is automatic
inside QEMU. I can understand that this scheme is not employed in all
hypervisors, but I'm wondering whether others, because of VM migration,
have a similar scheme.


Dirty memory tracking

When the guest CPU or device DMA stores to guest RAM this needs to be
noticed by several users:


   1. The live migration feature relies on tracking dirty memory pages so
   they can be resent if they change during live migration.
   2. TCG relies on tracking self-modifying code so it can recompile
   changed instructions.
   3. Graphics card emulation relies on tracking dirty video memory to
   redraw only scanlines that have changed.

There are dirty memory bitmaps for each of these users in ram_list because
dirty memory tracking can be enabled or disabled independently for each of
these users.

http://blog.vmsplice.net/2016/01/qemu-internals-how-guest-physical-ram.html
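To make the per-user bitmaps described in the quoted article concrete, here is a minimal C sketch. Every store to guest RAM marks the touched pages in the bitmap of each user that currently has tracking enabled, and a user such as migration test-and-clears a page as it resends it. The client names, the 4 KiB page size and the tiny RAM size are assumptions for illustration; this is not QEMU's actual ram_list API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT    12   /* assumption: 4 KiB pages */
#define NUM_PAGES     64   /* assumption: toy guest RAM of 64 pages */
#define BITS_PER_WORD 64

/* One bitmap per user of the dirty log, as the article describes. */
enum dirty_client { CLIENT_VGA, CLIENT_CODE, CLIENT_MIGRATION, NUM_CLIENTS };

struct dirty_tracker {
    bool enabled[NUM_CLIENTS];
    uint64_t bitmap[NUM_CLIENTS][NUM_PAGES / BITS_PER_WORD];
};

/* Called for every store to guest RAM (CPU or emulated device DMA). */
static void mark_dirty(struct dirty_tracker *t, uint64_t gpa, uint64_t len)
{
    if (len == 0)
        return;
    uint64_t first = gpa >> PAGE_SHIFT;
    uint64_t last = (gpa + len - 1) >> PAGE_SHIFT;
    for (int c = 0; c < NUM_CLIENTS; c++) {
        if (!t->enabled[c])
            continue;   /* this user is not tracking right now */
        for (uint64_t pfn = first; pfn <= last && pfn < NUM_PAGES; pfn++)
            t->bitmap[c][pfn / BITS_PER_WORD] |= 1ULL << (pfn % BITS_PER_WORD);
    }
}

/* Let one user consume a page, e.g. migration before resending it. */
static bool test_and_clear_dirty(struct dirty_tracker *t,
                                 enum dirty_client c, uint64_t pfn)
{
    uint64_t *word = &t->bitmap[c][pfn / BITS_PER_WORD];
    uint64_t mask = 1ULL << (pfn % BITS_PER_WORD);
    bool was_dirty = (*word & mask) != 0;
    *word &= ~mask;
    return was_dirty;
}
```

Because each user has its own bitmap, migration clearing a page does not hide that write from, say, display emulation.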

Rob Miller
rob.miller@broadcom.com
(919)721-3339




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-18 15:13             ` Rob Miller
@ 2020-03-19  3:35               ` Jason Wang
  2020-03-19 11:17               ` Paolo Bonzini
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2020-03-19  3:35 UTC (permalink / raw)
  To: Rob Miller, Virtio-Dev


On 2020/3/18 下午11:13, Rob Miller wrote:
> In trying to more fully understand DPT, I ran across an article 
> regarding how Physical RAM works within QEMU and noticed the statement 
> below. My current understanding, based upon the statement, is that DPT 
> is automatic inside QEMU. I can understand that this scheme is not 
> employed in all hypervisors, but i'm wondering if others, b/c of VM 
> migration, do have a similar scheme.
>
>
>     Dirty memory tracking
>
> When the guest CPU or device DMA stores to guest RAM this needs to be 
> noticed by several users:
>
>  1. The live migration feature relies on tracking dirty memory pages
>     so they can be resent if they change during live migration.
>  2. TCG relies on tracking self-modifying code so it can recompile
>     changed instructions.
>  3. Graphics card emulation relies on tracking dirty video memory to
>     redraw only scanlines that have changed.
>
> There are dirty memory bitmaps for each of these users in ram_list 
> because dirty memory tracking can be enabled or disabled independently 
> for each of these users.
>
> http://blog.vmsplice.net/2016/01/qemu-internals-how-guest-physical-ram.html 
>
>
> Rob Miller
> rob.miller@broadcom.com <mailto:rob.miller@broadcom.com>
> (919)721-3339


Hi Rob:

My understanding is DPT is a must for all hypervisors that want to 
support live migration.

For qemu, besides tracking dirty pages by itself, it can also sync
dirty pages from external users like:

- KVM: which can write-protect pages and track dirty pages through #PF
- vhost: a software virtio backend, which can track the used ring and
thereby know which pages were modified
- VFIO: work on syncing dirty pages from hardware is ongoing.

For vDPA, we have two ways to do that:

- pure software solution: the qemu vhost-vdpa backend will take over the
ring (e.g. the used ring for a split virtqueue), so it can know which
parts of guest memory were modified by vDPA and report the dirty pages
through qemu internal helpers.
- hardware solution: when the hardware supports dirty page tracking, the
vDPA bus needs to be extended to allow the hardware to report dirty pages
(as a bitmap or otherwise), and qemu can sync them from vhost.
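The hardware path above boils down to periodically folding a device-reported dirty log into the hypervisor's own migration bitmap. A minimal sketch, assuming the external source exposes a plain bitmap with "get and clear" semantics; the function name is hypothetical, and the synchronization a real implementation needs against a device that keeps writing during the sync is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits without relying on compiler builtins. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Fold a bitmap reported by an external writer (vhost, VFIO, or a
 * hypothetical vDPA dirty-log query) into the migration bitmap and clear
 * the external copy, so the next sync only sees new writes.
 * Returns how many pages became newly dirty.
 */
static size_t sync_external_dirty(uint64_t *migration_bm, uint64_t *device_bm,
                                  size_t nwords)
{
    size_t newly_dirty = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint64_t fresh = device_bm[i] & ~migration_bm[i];
        newly_dirty += popcount64(fresh);
        migration_bm[i] |= device_bm[i];
        device_bm[i] = 0;   /* "get and clear" semantics */
    }
    return newly_dirty;
}
```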

Thanks







^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-18 15:13             ` Rob Miller
  2020-03-19  3:35               ` Jason Wang
@ 2020-03-19 11:17               ` Paolo Bonzini
  2020-04-07  9:52                 ` Eugenio Perez Martin
  1 sibling, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2020-03-19 11:17 UTC (permalink / raw)
  To: Rob Miller, Virtio-Dev

The sentence below refers to emulated device DMA.

When emulated devices inside QEMU perform DMA, the accesses go through
functions that keep the dirty page bitmap up to date.  Likewise for CPU
emulation performed by QEMU, which is not an issue if you are using KVM
or other hypervisors supported by QEMU.

Whenever external code touches memory (which includes all the cases
mentioned by Jason), it has to provide an interface for QEMU to read the
dirty page bitmaps and synchronize them at appropriate points.
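As a rough illustration of the emulated-DMA case, the sketch below routes every device store through one helper that copies the data and then marks each touched page dirty, so the log cannot miss a write. The structure, the names and the 4 KiB page size are assumptions; this is not QEMU's real memory API, and a byte-per-page flag array is used only for readability.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define GUEST_PAGES   16u     /* assumption: toy guest RAM size */

struct guest {
    uint8_t ram[GUEST_PAGES * TOY_PAGE_SIZE];
    uint8_t dirty[GUEST_PAGES];   /* one flag per page, for readability */
};

/* Every emulated-device DMA store goes through here, so none can be missed.
 * Returns 0 on success, -1 if the access falls outside guest RAM. */
static int emulated_dma_write(struct guest *g, uint64_t gpa,
                              const void *buf, size_t len)
{
    if (gpa > sizeof g->ram || len > sizeof g->ram - gpa)
        return -1;
    if (len == 0)
        return 0;
    memcpy(&g->ram[gpa], buf, len);
    for (uint64_t p = gpa / TOY_PAGE_SIZE; p <= (gpa + len - 1) / TOY_PAGE_SIZE; p++)
        g->dirty[p] = 1;   /* log after the data is in place */
    return 0;
}
```

External writers (KVM, vhost, VFIO, vDPA) bypass a helper like this, which is why they need their own interface for QEMU to read and synchronize their bitmaps.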

Paolo

On 18/03/20 16:13, Rob Miller wrote:
> In trying to more fully understand DPT, I ran across an article
> regarding how Physical RAM works within QEMU and noticed the statement
> below. My current understanding, based upon the statement, is that DPT
> is automatic inside QEMU. I can understand that this scheme is not
> employed in all hypervisors, but i'm wondering if others, b/c of VM
> migration, do have a similar scheme.
> 
> 
>     Dirty memory tracking
> 
> When the guest CPU or device DMA stores to guest RAM this needs to be
> noticed by several users:
> 




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-03-19 11:17               ` Paolo Bonzini
@ 2020-04-07  9:52                 ` Eugenio Perez Martin
  2020-04-07 10:27                   ` Rob Miller
                                     ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Eugenio Perez Martin @ 2020-04-07  9:52 UTC (permalink / raw)
  To: Rob Miller
  Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin,
	Juan Quintela

Hi!

So, from the previous mails, it seems that monitoring the used ring
(and the packed descriptors) is a good first step in that direction,
as DPDK did. This way, the device does not need to worry about the
dirty page tracking using a bitmap and the PCI writes limitation, and
we can evaluate later the proposed alternatives:
* Alternate used descriptors in packed.
* vDPA interface for vDPA devices in a convenient format.

Any thoughts? Do you think that we should start with another way?

Thanks!


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-07  9:52                 ` Eugenio Perez Martin
@ 2020-04-07 10:27                   ` Rob Miller
  2020-04-07 16:31                     ` Eugenio Perez Martin
  2020-04-07 10:40                   ` Rob Miller
  2020-04-09 21:06                   ` Michael S. Tsirkin
  2 siblings, 1 reply; 24+ messages in thread
From: Rob Miller @ 2020-04-07 10:27 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin,
	Juan Quintela

[-- Attachment #1: Type: text/plain, Size: 979 bytes --]

Does this mean that SW takes over the datapath during LM?
If so, is there any infrastructure to "gracefully" do a hand-off from HW
mode (pci device is managing the rings) to SW mode, in that when switching
from HW->SW, HW is stalled & quiesced, then the SW takes from where HW
left off?

Rob Miller
rob.miller@broadcom.com
(919)721-3339




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-07 10:27                   ` Rob Miller
@ 2020-04-07 16:31                     ` Eugenio Perez Martin
  2020-04-08 10:10                       ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Eugenio Perez Martin @ 2020-04-07 16:31 UTC (permalink / raw)
  To: Rob Miller
  Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin,
	Juan Quintela

On Tue, Apr 7, 2020 at 12:28 PM Rob Miller <rob.miller@broadcom.com> wrote:
>
> Does this mean that SW takes over the datapath during LM?

"Takes over" sounds to me like solving the problem using failover to
switch to a software interface to do the packet forwarding during
migration [1]. In this solution, only the used ring needs to be monitored
or intercepted to tell qemu which memory regions were modified.

Not sure if this is what you had in mind.

[1] https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/VirtioNet.pdf

> If so, is there any infrastructure to "gracefully" do a hand-off from HW mode (pci device is managing the rings) to SW mode, in that when switching from HW->SW, HW is stalled & quiesced, then the SW takes from where HW left off?




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-07 16:31                     ` Eugenio Perez Martin
@ 2020-04-08 10:10                       ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2020-04-08 10:10 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Rob Miller, Virtio-Dev, Paolo Bonzini, Michael Tsirkin,
	Juan Quintela


----- Original Message -----
> On Tue, Apr 7, 2020 at 12:28 PM Rob Miller <rob.miller@broadcom.com> wrote:
> >
> > Does this mean that SW takes over the datapath during LM?
> 
> "Takes over" sounds to me like solving the problem using failover to
> switch to a software interface to do the packet forwarding during
> migration [1]. In this solution, only the used ring needs to be spied
> or intercepted to communicate qemu the memory regions modified.
> 
> Not sure if this is what you had in mind.
> 
> [1] https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/VirtioNet.pdf

Yes, so my understanding is that there are two ways of doing
software-assisted live migration:

1) Switch to a full software datapath. I think this is what you meant
   here; it works but may cause more performance degradation.

2) Used ring relay: qemu only takes over the used ring. When migration
   starts, qemu tells the hardware to use another (mediated) used ring;
   qemu can then inspect it, log the dirty pages from there, and relay
   the content to the used ring used by the guest. This can have better
   performance.
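Method 2) can be sketched for a split virtqueue as follows. The descriptor and used-ring layouts and the NEXT/WRITE flag values follow the virtio split-ring format; everything else (the function names, a toy ring of 8 entries, a 4 KiB page size, a little-endian host) is an assumption. Memory barriers, indirect descriptors and validation of guest-supplied indices are omitted, so treat it as a sketch rather than a working relay.

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8                  /* assumption: toy ring of 8 entries */
#define PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define VRING_DESC_F_NEXT  1   /* split-ring descriptor flags */
#define VRING_DESC_F_WRITE 2

struct vring_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used      { uint16_t flags; uint16_t idx; struct vring_used_elem ring[QSZ]; };

/* Mark [gpa, gpa + len) in a dirty bitmap (len > 0). */
static void log_dirty(uint64_t *bitmap, uint64_t gpa, uint64_t len)
{
    for (uint64_t pfn = gpa >> PAGE_SHIFT; pfn <= (gpa + len - 1) >> PAGE_SHIFT; pfn++)
        bitmap[pfn / 64] |= 1ULL << (pfn % 64);
}

/*
 * Relay new entries from the mediated used ring (written by the device)
 * to the guest's used ring, logging the device-writable buffers each
 * entry covers. *last_seen is the relay's private copy of shadow->idx.
 */
static void relay_used(const struct vring_desc *desc, const struct vring_used *shadow,
                       struct vring_used *guest, uint16_t *last_seen, uint64_t *bitmap)
{
    while (*last_seen != shadow->idx) {
        struct vring_used_elem e = shadow->ring[*last_seen % QSZ];
        uint32_t remaining = e.len;   /* bytes the device says it wrote */
        uint16_t i = (uint16_t)(e.id % QSZ);
        for (;;) {
            const struct vring_desc *d = &desc[i];
            if ((d->flags & VRING_DESC_F_WRITE) && remaining > 0) {
                uint32_t n = d->len < remaining ? d->len : remaining;
                if (n > 0)
                    log_dirty(bitmap, d->addr, n);
                remaining -= n;
            }
            if (!(d->flags & VRING_DESC_F_NEXT))
                break;
            i = d->next % QSZ;
        }
        guest->ring[guest->idx % QSZ] = e;   /* copy the entry first ... */
        guest->idx++;                        /* ... then publish it */
        (*last_seen)++;
    }
}
```

Only the used ring is mediated here; the device still reads descriptors and the avail ring directly, which is presumably where the performance advantage over method 1) comes from.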

DPDK chose method 2). You may refer to
https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/XiaoWang-DPDK-US-Summit-SW-assisted-VDPA-for-LM-v2.pdf


> 
> > If so, is there any infrastructure to "gracefully" do a hand-off from HW
> > mode (pci device is managing the rings) to SW mode, in that when switching
> > from HW->SW, HW is stalled & quiesced, then the SW takes from where HW
> > left off?

Yes, both methods require stopping and restarting the device.

It also means the device should support restoring the virtqueue
state. E.g. it accepts a last_avail_idx set by the driver, so that when
the device is started, it starts reading the avail ring at
last_avail_idx. This is why the vDPA bus supports set_vq_state().
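A toy model of that resume logic: while stopped, the device is handed the index of the next avail-ring slot it should consume (the role played by set_vq_state()), and on restart it processes only entries published after that point. The structure and function names are hypothetical; only the idea of restoring last_avail_idx comes from the discussion. The 16-bit index is allowed to wrap, as virtio ring indices do.

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8   /* assumption: ring size, a power of two */

struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[QSZ]; };

/* Toy device model; field and function names are hypothetical. */
struct toy_device {
    uint16_t last_avail_idx;   /* next avail slot the device will consume */
    unsigned consumed;         /* how many heads it has processed */
    uint16_t last_head;        /* most recent descriptor head seen */
};

/* Analogue of a set_vq_state() call made while the device is stopped. */
static void toy_set_vq_state(struct toy_device *dev, uint16_t last_avail_idx)
{
    dev->last_avail_idx = last_avail_idx;
}

/* On (re)start, consume only entries published after last_avail_idx. */
static void toy_process_avail(struct toy_device *dev, const struct vring_avail *avail)
{
    /* uint16_t arithmetic handles index wrap-around naturally */
    while (dev->last_avail_idx != avail->idx) {
        dev->last_head = avail->ring[dev->last_avail_idx % QSZ];
        dev->consumed++;
        dev->last_avail_idx++;
    }
}
```

If the restored index were wrong, the new datapath would either replay buffers the old one already consumed or skip some, which is why the hand-off has to stop the device first.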

Thanks





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-07  9:52                 ` Eugenio Perez Martin
  2020-04-07 10:27                   ` Rob Miller
@ 2020-04-07 10:40                   ` Rob Miller
  2020-04-08 10:00                     ` Jason Wang
  2020-04-09 21:06                   ` Michael S. Tsirkin
  2 siblings, 1 reply; 24+ messages in thread
From: Rob Miller @ 2020-04-07 10:40 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin,
	Juan Quintela

[-- Attachment #1: Type: text/plain, Size: 920 bytes --]

Another question on the vDPA vs. vendor-specific driver portion...

Are the subsystem vendor & device IDs expected to differ from the primary (Red
Hat) ones, since there has to be a way for a vendor-specific driver to
"see" its device?

Rob Miller
rob.miller@broadcom.com
(919)721-3339




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-07 10:40                   ` Rob Miller
@ 2020-04-08 10:00                     ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2020-04-08 10:00 UTC (permalink / raw)
  To: Rob Miller
  Cc: Eugenio Perez Martin, Virtio-Dev, Paolo Bonzini, Michael Tsirkin,
	Juan Quintela

----- Original Message -----
> another question on vDPA vs vendor specific driver portion...
> 
> Are the subsystem vendor & device IDs to be different from the primary (Red
> Hat) versions as there has to be a way for a vendor specific driver to
> "see" its device.

Yes, any kind of (PCI) device can be registered to the vDPA bus. A PCI
driver can do exact matching based on the subsystem IDs.

E.g. the IFCVF driver does:

static struct pci_device_id ifcvf_pci_ids[] = {
	{ PCI_DEVICE_SUB(IFCVF_VENDOR_ID,
		IFCVF_DEVICE_ID,
		IFCVF_SUBSYS_VENDOR_ID,
		IFCVF_SUBSYS_DEVICE_ID) },
	{ 0 },
};

which uses the Red Hat primary vendor ID but its own subsystem
vendor/device IDs.
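To illustrate why distinct subsystem IDs are enough for a vendor driver to "see" its device, here is a simplified, user-space sketch of ID-table matching with a wildcard value. It is not the Linux PCI core's code: the ANY_ID constant only mirrors the idea of PCI_ANY_ID, and the table format is an assumption modeled loosely on the IFCVF table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu   /* wildcard, similar in spirit to Linux's PCI_ANY_ID */

struct id_entry { uint32_t vendor, device, subvendor, subdevice; };
struct pci_ids  { uint16_t vendor, device, subvendor, subdevice; };

static bool field_matches(uint32_t want, uint16_t have)
{
    return want == ANY_ID || want == have;
}

/* Return the index of the first matching entry, or -1.
 * The table is terminated by an entry whose vendor is 0. */
static int match_id_table(const struct id_entry *table, const struct pci_ids *dev)
{
    for (int i = 0; table[i].vendor != 0; i++) {
        const struct id_entry *e = &table[i];
        if (field_matches(e->vendor, dev->vendor) &&
            field_matches(e->device, dev->device) &&
            field_matches(e->subvendor, dev->subvendor) &&
            field_matches(e->subdevice, dev->subdevice))
            return i;
    }
    return -1;
}
```

An entry that pins all four IDs (for example virtio vendor 0x1AF4, modern virtio-net device 0x1041, plus a vendor's own subsystem pair) matches only that vendor's functions, while a generic driver can keep wildcard subsystem fields.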

Thanks




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-07  9:52                 ` Eugenio Perez Martin
  2020-04-07 10:27                   ` Rob Miller
  2020-04-07 10:40                   ` Rob Miller
@ 2020-04-09 21:06                   ` Michael S. Tsirkin
  2020-04-10  2:40                     ` Jason Wang
  2 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-04-09 21:06 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Rob Miller, Virtio-Dev, Paolo Bonzini, Jason Wang, Juan Quintela

On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> Hi!
> 
> So, from the previous mails, it seems that monitoring the used ring
> (and the packed descriptors) is a good first step in that direction,
> as DPDK did. This way, the device does not need to worry about the
> dirty page tracking using a bitmap and the PCI writes limitation, and
> we can evaluate later the proposed alternatives:
> * Alternate used descriptors in packed.
> * vDPA interface for vDPA devices in a convenient format.
> 
> Any thoughts? Do you think that we should start with another way?
> 
> Thanks!

I am concerned that with software in data path, we'll hit RX queue
underruns, won't we?
Two ways to avoid underruns:
- dirty page tracking
- page faults

I'm working on a proposal for page faults now. If someone wants
to work on dirty tracking in addition, that's also an option.

-- 
MST




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-09 21:06                   ` Michael S. Tsirkin
@ 2020-04-10  2:40                     ` Jason Wang
  2020-04-13 12:15                       ` Eugenio Perez Martin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2020-04-10  2:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eugenio Perez Martin
  Cc: Rob Miller, Virtio-Dev, Paolo Bonzini, Juan Quintela


On 2020/4/10 上午5:06, Michael S. Tsirkin wrote:
> On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
>> Hi!
>>
>> So, from the previous mails, it seems that monitoring the used ring
>> (and the packed descriptors) is a good first step in that direction,
>> as DPDK did. This way, the device does not need to worry about the
>> dirty page tracking using a bitmap and the PCI writes limitation, and
>> we can evaluate later the proposed alternatives:
>> * Alternate used descriptors in packed.
>> * vDPA interface for vDPA devices in a convenient format.
>>
>> Any thoughts? Do you think that we should start with another way?
>>
>> Thanks!
> I am concerned that with software in data path, we'll hit RX queue
> underruns, won't we?


Do you mean it will lose some performance? If yes, I think so.


> Two ways to avoid underruns:
> - dirty page tracking
> - page faults


It looks to me like this would lead to even worse performance than the
software path? There will be lots of page faults during RX.

Another direction is to track dirty pages via the IOMMU. E.g. recent Intel
IOMMUs have EA and D bits, which could be used for tracking pages written
by devices, but not by the CPU.
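A sketch of the IOMMU-based idea, assuming leaf page-table entries carry a hardware-set dirty bit: the hypervisor periodically walks the entries that map guest memory for the device, records pages whose D bit is set, and clears the bit. The flat PTE array and the bit positions are assumptions for illustration, not the actual VT-d entry format, and a real implementation would also have to invalidate the IOTLB after clearing D bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical leaf-PTE layout; real bit positions depend on the IOMMU spec. */
#define PTE_PRESENT (1ULL << 0)
#define PTE_DIRTY   (1ULL << 9)

/*
 * Walk a flat array of leaf IOMMU PTEs (index == I/O page frame number),
 * record every page the device wrote since the last harvest in `bitmap`,
 * and clear the D bit so the next harvest sees only new writes.
 */
static size_t harvest_iommu_dirty(uint64_t *ptes, size_t npte, uint64_t *bitmap)
{
    const uint64_t want = PTE_PRESENT | PTE_DIRTY;
    size_t found = 0;
    for (size_t pfn = 0; pfn < npte; pfn++) {
        if ((ptes[pfn] & want) == want) {
            bitmap[pfn / 64] |= 1ULL << (pfn % 64);
            ptes[pfn] &= ~PTE_DIRTY;   /* real code: flush the IOTLB afterwards */
            found++;
        }
    }
    return found;
}
```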


>
> I'm working on a proposal for page faults now.


I guess it's better to have a transport independent method.


> If someone wants
> to work on dirty tracking in addition, that's also an option.


I remember Rob mentioned some challenges in implementing a dirty bitmap;
I wonder whether something like a queue-based interface would be better
(similar to what Peter did for KVM)?
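A queue-based interface, in the spirit of the dirty ring work for KVM mentioned above, could look roughly like this: the device appends the frame numbers of pages it dirtied to a ring, and the hypervisor drains the ring into its bitmap. This is a hypothetical single-producer/single-consumer layout, not KVM's actual dirty-ring ABI; memory barriers are omitted and a full ring simply rejects the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIRTY_RING_SIZE 4   /* power of two; tiny for illustration */

/* Hypothetical device-to-hypervisor dirty-page queue. */
struct dirty_ring {
    uint64_t pfn[DIRTY_RING_SIZE];
    uint32_t head;   /* advanced by the producer (device) */
    uint32_t tail;   /* advanced by the consumer (hypervisor) */
};

/* Device side: report one dirtied page. Returns false when the ring is
 * full, in which case a real device would have to pause DMA until drained. */
static bool dirty_ring_push(struct dirty_ring *r, uint64_t pfn)
{
    if (r->head - r->tail == DIRTY_RING_SIZE)
        return false;
    r->pfn[r->head % DIRTY_RING_SIZE] = pfn;
    r->head++;
    return true;
}

/* Hypervisor side: drain the ring into a dirty bitmap; returns entries consumed. */
static unsigned dirty_ring_harvest(struct dirty_ring *r, uint64_t *bitmap)
{
    unsigned n = 0;
    while (r->tail != r->head) {
        uint64_t pfn = r->pfn[r->tail % DIRTY_RING_SIZE];
        bitmap[pfn / 64] |= 1ULL << (pfn % 64);
        r->tail++;
        n++;
    }
    return n;
}
```

Unlike a bitmap, the ring may carry duplicate entries for the same page; that is harmless because harvesting is idempotent, but the ring size, not the guest size, bounds how much can be reported between harvests.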

Thanks




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-10  2:40                     ` Jason Wang
@ 2020-04-13 12:15                       ` Eugenio Perez Martin
  2020-04-13 13:30                         ` Rob Miller
  2020-04-13 13:55                         ` Jason Wang
  0 siblings, 2 replies; 24+ messages in thread
From: Eugenio Perez Martin @ 2020-04-13 12:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Rob Miller, Virtio-Dev, Paolo Bonzini,
	Juan Quintela

On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/4/10 上午5:06, Michael S. Tsirkin wrote:
> > On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> >> Hi!
> >>
> >> So, from the previous mails, it seems that monitoring the used ring
> >> (and the packed descriptors) is a good first step in that direction,
> >> as DPDK did. This way, the device does not need to worry about the
> >> dirty page tracking using a bitmap and the PCI writes limitation, and
> >> we can evaluate later the proposed alternatives:
> >> * Alternate used descriptors in packed.
> >> * vDPA interface for vDPA devices in a convenient format.
> >>
> >> Any thoughts? Do you think that we should start with another way?
> >>
> >> Thanks!
> > I am concerned that with software in data path, we'll hit RX queue
> > underruns, won't we?
>
>
> Do you mean it will lose some performance? If yes, I think so.
>
>
> > Two ways to avoid underruns:
> > - dirty page tracking
> > - page faults
>
>
> It looks to me this will lead even worse performance than software path?
> There will be lots of page faults during RX.
>
> Another direction is to track dirty pages via IOMMU. E.g recent Intel
> IOMMU has EA and D bit which could be used for tracking pages wrote by
> devices but not CPU.
>

So this could be added on top of the dirty tracking mechanism, couldn't
it? Or would it be easier to start the other way around, using a modern
IOMMU first and then extending to older, generic code?

>
> >
> > I'm working on a proposal for page faults now.
>
>
> I guess it's better to have a transport independent method.
>
>
> > If someone wants
> > to work on dirty tracking in addition, that's also an option.
>
>
> I remember Rob mention some challenges of implementing dirty bitmap, I
> wonder something like queue based interface would be better (similar to
> Peter did for KVM)?
>

I think the main challenge was having to set individual bits (a
read-modify-write) instead of doing plain writes over the PCI bus.

From a conversation with Juan, another solution could be a byte-map
DPT, where each page is represented by a byte instead of a bit. While I
find this solution simpler, I'm not sure about the performance
implications of tracking this way (memory/TLB pressure, etc.). Rob and
Juan, please correct me if I'm wrong or missed something.
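A minimal sketch of the two layouts, assuming 4 KiB pages and a log indexed by guest page frame number (both assumptions; the helper names are hypothetical). With a bit map, marking a page means updating one bit in a byte shared with seven other pages, i.e. a read-modify-write (shown here as a GCC/Clang atomic OR). With a byte map it is a single plain store, which a device could issue as an ordinary posted PCI write.

```c
#include <stdint.h>

#define DPT_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Bit map: one bit per page. Eight pages share a byte, so marking one
 * page is a read-modify-write of that byte (an atomic OR here). */
static void bitmap_mark_dirty(uint8_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> DPT_PAGE_SHIFT;

    __atomic_fetch_or(&bitmap[pfn / 8], (uint8_t)(1u << (pfn % 8)),
                      __ATOMIC_RELAXED);
}

/* Byte map: one byte per page. Marking is a plain store; writers for
 * different pages never touch the same byte, so nothing is read back. */
static void bytemap_mark_dirty(volatile uint8_t *bytemap, uint64_t gpa)
{
    bytemap[gpa >> DPT_PAGE_SHIFT] = 1;
}
```

Marking GPA 0x9000 (page 9) sets bit 1 of bitmap[1] in the first layout and bytemap[9] in the second.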

Compared to used ring interposition, both alternatives (byte mapping
and queue) need much more logic in the device driver, don't they?
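Used ring interposition, by contrast, leaves the device unchanged: the software layer relaying used entries logs whatever range the device reports as written. A minimal sketch, assuming a single-descriptor buffer, a 4 KiB log granularity and a plain bitmap log; log_used_buffer is a hypothetical name, not an existing QEMU or DPDK function. A real relay would also have to log the used ring pages it writes itself.

```c
#include <stdint.h>

#define LOG_PAGE_SHIFT 12 /* assumption: 4 KiB log granularity */

/* Hypothetical hook in the software layer that relays used-ring entries
 * from the device to the guest. The device reported writing `len` bytes
 * at guest physical address `gpa` (single-descriptor buffer assumed), so
 * every page overlapping [gpa, gpa + len) is marked in the bitmap. */
static void log_used_buffer(uint8_t *bitmap, uint64_t gpa, uint32_t len)
{
    if (len == 0)
        return; /* nothing was written */

    uint64_t first = gpa >> LOG_PAGE_SHIFT;
    uint64_t last = (gpa + len - 1) >> LOG_PAGE_SHIFT;

    for (uint64_t pfn = first; pfn <= last; pfn++)
        bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}
```

A 0x20-byte write at 0x0ff0 crosses a page boundary, so both page 0 and page 1 are logged.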

> Thanks
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-13 12:15                       ` Eugenio Perez Martin
@ 2020-04-13 13:30                         ` Rob Miller
  2020-04-13 13:49                           ` Jason Wang
  2020-04-13 13:49                           ` Jason Wang
  2020-04-13 13:55                         ` Jason Wang
  1 sibling, 2 replies; 24+ messages in thread
From: Rob Miller @ 2020-04-13 13:30 UTC (permalink / raw)
  To: Virtio-Dev


On Mon, Apr 13, 2020 at 8:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:

> On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2020/4/10 上午5:06, Michael S. Tsirkin wrote:
> > > On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> > >> Hi!
> > >>
> > >> So, from the previous mails, it seems that monitoring the used ring
> > >> (and the packed descriptors) is a good first step in that direction,
> > >> as DPDK did. This way, the device does not need to worry about the
> > >> dirty page tracking using a bitmap and the PCI writes limitation, and
> > >> we can evaluate later the proposed alternatives:
> > >> * Alternate used descriptors in packed.
> > >> * vDPA interface for vDPA devices in a convenient format.
> > >>
> > >> Any thoughts? Do you think that we should start with another way?
> > >>
> > >> Thanks!
> > > I am concerned that with software in data path, we'll hit RX queue
> > > underruns, won't we?
> >
> >
> > Do you mean it will lose some performance? If yes, I think so.
> >
> >
> > > Two ways to avoid underruns:
> > > - dirty page tracking
> > > - page faults
> >
> >
> > It looks to me this will lead even worse performance than software path?
> > There will be lots of page faults during RX.
> >
> > Another direction is to track dirty pages via IOMMU. E.g recent Intel
> > IOMMU has EA and D bit which could be used for tracking pages wrote by
> > devices but not CPU.
> >
>
> So this could be added on top of the dirty tracking mechanism, isn't?
> or would it be easier to start another-way around, and to start using
> modern IOMMU and then extend to old generic code?
>
> >
> > >
> > > I'm working on a proposal for page faults now.
> >
> >
> > I guess it's better to have a transport independent method.
> >
> >
> > > If someone wants
> > > to work on dirty tracking in addition, that's also an option.
> >
> >
> > I remember Rob mention some challenges of implementing dirty bitmap, I
> > wonder something like queue based interface would be better (similar to
> > Peter did for KVM)?
> >
>
> I think that the main challenge was to write bits instead of writes
> using PCI bus.
>
> From a conversation with Juan, another solution could be to do a byte
> map DPT, where a byte represents a page, not a bit. While I find this
> solution simpler, I'm not sure about the performance implications of
> track this way (memory/TLB pressure, etc). Rob and Juan, please
> correct me if I'm wrong or missed something.
>
RJM> You captured it correctly. We (Broadcom) discussed the idea of
using a byte (set to 0xFF) to indicate that 64KB were dirty, but this
could cause pages that did not need to be refreshed to be re-sent to the
remote, slowing down the migration. I like the idea of mapping a byte to
a page and teaching QEMU (or vDPA, or ...) how to interpret it.
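On the QEMU/vDPA side, "teaching it how to interpret" the map could be a fold from the device's byte map into the bitmap the migration code already consumes. A minimal sketch, assuming one byte per page where any nonzero value means dirty; bytemap_sync_to_bitmap is hypothetical. Each dirty byte is read and cleared with one atomic exchange so that, assuming cache-coherent DMA, a device write landing during the scan is kept for the next pass rather than lost.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a device-written byte map (one byte per page,
 * nonzero = dirty) into a bitmap (one bit per page). Returns the number
 * of dirty pages found. */
static size_t bytemap_sync_to_bitmap(uint8_t *bytemap, uint8_t *bitmap,
                                     size_t npages)
{
    size_t dirty = 0;

    for (size_t pfn = 0; pfn < npages; pfn++) {
        /* Cheap relaxed load first, so clean bytes are never written. */
        if (!__atomic_load_n(&bytemap[pfn], __ATOMIC_RELAXED))
            continue;
        /* Read-and-clear in one step: a device store that lands after
         * this point stays set and is picked up on the next pass. */
        if (__atomic_exchange_n(&bytemap[pfn], 0, __ATOMIC_ACQ_REL)) {
            bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
            dirty++;
        }
    }
    return dirty;
}
```

The scan touches every byte of the map on each pass, which is where the working-set cost of this layout shows up.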

>
> Compared to the used ring interposition, much more logic needs to be
> in the device driver for both alternatives (byte mapping and queue),
> isn't it?
>
RJM> Yes, but this would (should) lead to the best performance.

>
> > Thanks
> >
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-13 13:30                         ` Rob Miller
@ 2020-04-13 13:49                           ` Jason Wang
  2020-04-13 13:49                           ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2020-04-13 13:49 UTC (permalink / raw)
  To: Rob Miller, Virtio-Dev


On 2020/4/13 下午9:30, Rob Miller wrote:
>
>
> On Mon, Apr 13, 2020 at 8:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
>     On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
>     >
>     >
>     > On 2020/4/10 上午5:06, Michael S. Tsirkin wrote:
>     > > On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
>     > >> Hi!
>     > >>
>     > >> So, from the previous mails, it seems that monitoring the used ring
>     > >> (and the packed descriptors) is a good first step in that direction,
>     > >> as DPDK did. This way, the device does not need to worry about the
>     > >> dirty page tracking using a bitmap and the PCI writes limitation, and
>     > >> we can evaluate later the proposed alternatives:
>     > >> * Alternate used descriptors in packed.
>     > >> * vDPA interface for vDPA devices in a convenient format.
>     > >>
>     > >> Any thoughts? Do you think that we should start with another way?
>     > >>
>     > >> Thanks!
>     > > I am concerned that with software in data path, we'll hit RX queue
>     > > underruns, won't we?
>     >
>     >
>     > Do you mean it will lose some performance? If yes, I think so.
>     >
>     >
>     > > Two ways to avoid underruns:
>     > > - dirty page tracking
>     > > - page faults
>     >
>     >
>     > It looks to me this will lead even worse performance than software path?
>     > There will be lots of page faults during RX.
>     >
>     > Another direction is to track dirty pages via IOMMU. E.g recent Intel
>     > IOMMU has EA and D bit which could be used for tracking pages wrote by
>     > devices but not CPU.
>     >
>
>     So this could be added on top of the dirty tracking mechanism, isn't?
>     or would it be easier to start another-way around, and to start using
>     modern IOMMU and then extend to old generic code?
>
>     >
>     > >
>     > > I'm working on a proposal for page faults now.
>     >
>     >
>     > I guess it's better to have a transport independent method.
>     >
>     >
>     > > If someone wants
>     > > to work on dirty tracking in addition, that's also an option.
>     >
>     >
>     > I remember Rob mention some challenges of implementing dirty bitmap, I
>     > wonder something like queue based interface would be better (similar to
>     > Peter did for KVM)?
>     >
>
>     I think that the main challenge was to write bits instead of writes
>     using PCI bus.
>
>     From a conversation with Juan, another solution could be to do a byte
>     map DPT, where a byte represents a page, not a bit. While I find this
>     solution simpler, I'm not sure about the performance implications of
>     track this way (memory/TLB pressure, etc). Rob and Juan, please
>     correct me if I'm wrong or missed something.
>

It would lead to a larger working set, but measuring the actual impact
would require benchmarking.
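For a rough sense of scale (illustrative arithmetic, not a measurement): with guest memory size M and page size P, a bitmap needs M/(8P) bytes while a byte map needs M/P bytes, i.e. eight times more memory to scan and keep cache/TLB resident. For an assumed 16 GiB guest with 4 KiB pages:

```latex
\frac{M}{P} = \frac{2^{34}}{2^{12}} = 2^{22}\ \text{pages}, \qquad
\text{bitmap} = \frac{2^{22}}{8}\ \text{B} = 2^{19}\ \text{B} = 512\ \text{KiB}, \qquad
\text{byte map} = 2^{22}\ \text{B} = 4\ \text{MiB}.
```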


> RJM>] You captured it correctly. We (Broadcom) discussed the idea of
> using a byte (being 0xFF) which effectively indicate that 64KB were dirty,
> this would cause potentially refreshing pages to the remote that weren't
> needed to be refreshed, slowing down the migration. I like the idea of 
> mapping
> a byte to a page, and teach QEMU (or vDPA, or ...) how to interpret.


That's a way to go. I think we can try to propose an API and move the
discussion upstream.


>
>     Compared to the used ring interposition, much more logic needs to be
>     in the device driver for both alternatives (byte mapping and queue),
>     isn't it?
>
> RJM> Yes, but this would (should) lead to best performance.


I agree. I think we can start with software support and discuss the
software path in parallel.

Thanks


>
>     > Thanks
>     >
>




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-13 13:30                         ` Rob Miller
  2020-04-13 13:49                           ` Jason Wang
@ 2020-04-13 13:49                           ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2020-04-13 13:49 UTC (permalink / raw)
  To: Rob Miller, Virtio-Dev


On 2020/4/13 下午9:30, Rob Miller wrote:
>
>
> On Mon, Apr 13, 2020 at 8:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
>     On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
>     >
>     >
>     > On 2020/4/10 上午5:06, Michael S. Tsirkin wrote:
>     > > On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
>     > >> Hi!
>     > >>
>     > >> So, from the previous mails, it seems that monitoring the used ring
>     > >> (and the packed descriptors) is a good first step in that direction,
>     > >> as DPDK did. This way, the device does not need to worry about the
>     > >> dirty page tracking using a bitmap and the PCI writes limitation, and
>     > >> we can evaluate later the proposed alternatives:
>     > >> * Alternate used descriptors in packed.
>     > >> * vDPA interface for vDPA devices in a convenient format.
>     > >>
>     > >> Any thoughts? Do you think that we should start with another way?
>     > >>
>     > >> Thanks!
>     > > I am concerned that with software in data path, we'll hit RX queue
>     > > underruns, won't we?
>     >
>     >
>     > Do you mean it will lose some performance? If yes, I think so.
>     >
>     >
>     > > Two ways to avoid underruns:
>     > > - dirty page tracking
>     > > - page faults
>     >
>     >
>     > It looks to me this will lead even worse performance than software path?
>     > There will be lots of page faults during RX.
>     >
>     > Another direction is to track dirty pages via IOMMU. E.g recent Intel
>     > IOMMU has EA and D bit which could be used for tracking pages wrote by
>     > devices but not CPU.
>     >
>
>     So this could be added on top of the dirty tracking mechanism, isn't?
>     or would it be easier to start another-way around, and to start using
>     modern IOMMU and then extend to old generic code?
>
>     >
>     > >
>     > > I'm working on a proposal for page faults now.
>     >
>     >
>     > I guess it's better to have a transport independent method.
>     >
>     >
>     > > If someone wants
>     > > to work on dirty tracking in addition, that's also an option.
>     >
>     >
>     > I remember Rob mention some challenges of implementing dirty bitmap, I
>     > wonder something like queue based interface would be better (similar to
>     > Peter did for KVM)?
>     >
>
>     I think that the main challenge was to write bits instead of writes
>     using PCI bus.
>
>     From a conversation with Juan, another solution could be to do a byte
>     map DPT, where a byte represents a page, not a bit. While I find this
>     solution simpler, I'm not sure about the performance implications of
>     track this way (memory/TLB pressure, etc). Rob and Juan, please
>     correct me if I'm wrong or missed something.
>

It would lead to a larger working set, but measuring the actual impact
would require benchmarking.


> RJM>] You captured it correctly. We (Broadcom) discussed the idea of
> using a byte (being 0xFF) which effectively indicate that 64KB were dirty,
> this would cause potentially refreshing pages to the remote that weren't
> needed to be refreshed, slowing down the migration. I like the idea of 
> mapping
> a byte to a page, and teach QEMU (or vDPA, or ...) how to interpret.


That's a way to go. I think we can try to propose an API and move the
discussion upstream.


>
>     Compared to the used ring interposition, much more logic needs to be
>     in the device driver for both alternatives (byte mapping and queue),
>     isn't it?
>
> RJM> Yes, but this would (should) lead to best performance.


I agree. I think we can start with software support and discuss
hardware support in parallel.

Thanks


>
>     > Thanks
>     >
>




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-13 12:15                       ` Eugenio Perez Martin
  2020-04-13 13:30                         ` Rob Miller
@ 2020-04-13 13:55                         ` Jason Wang
  2020-04-16 10:55                           ` Eugenio Perez Martin
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Wang @ 2020-04-13 13:55 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Michael S. Tsirkin, Rob Miller, Virtio-Dev, Paolo Bonzini,
	Juan Quintela


On 2020/4/13 下午8:15, Eugenio Perez Martin wrote:
> On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2020/4/10 上午5:06, Michael S. Tsirkin wrote:
>>> On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
>>>> Hi!
>>>>
>>>> So, from the previous mails, it seems that monitoring the used ring
>>>> (and the packed descriptors) is a good first step in that direction,
>>>> as DPDK did. This way, the device does not need to worry about the
>>>> dirty page tracking using a bitmap and the PCI writes limitation, and
>>>> we can evaluate later the proposed alternatives:
>>>> * Alternate used descriptors in packed.
>>>> * vDPA interface for vDPA devices in a convenient format.
>>>>
>>>> Any thoughts? Do you think that we should start with another way?
>>>>
>>>> Thanks!
>>> I am concerned that with software in data path, we'll hit RX queue
>>> underruns, won't we?
>>
>> Do you mean it will lose some performance? If yes, I think so.
>>
>>
>>> Two ways to avoid underruns:
>>> - dirty page tracking
>>> - page faults
>>
>> It looks to me this will lead even worse performance than software path?
>> There will be lots of page faults during RX.
>>
>> Another direction is to track dirty pages via IOMMU. E.g recent Intel
>> IOMMU has EA and D bit which could be used for tracking pages wrote by
>> devices but not CPU.
>>
> So this could be added on top of the dirty tracking mechanism, isn't?


Yes, it requires support from the IOMMU API and driver.


> or would it be easier to start another-way around, and to start using
> modern IOMMU and then extend to old generic code?


If my understanding is correct, even with a modern IOMMU it still
requires a lot of work. So we need to start with software support and
hardware support first.


>
>>> I'm working on a proposal for page faults now.
>>
>> I guess it's better to have a transport independent method.
>>
>>
>>> If someone wants
>>> to work on dirty tracking in addition, that's also an option.
>>
>> I remember Rob mention some challenges of implementing dirty bitmap, I
>> wonder something like queue based interface would be better (similar to
>> Peter did for KVM)?
>>
> I think that the main challenge was to write bits instead of writes
> using PCI bus.


Yes, the ring interface works on descriptors, each of which is several
bytes wide, instead of on bits.
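A queue-based interface, loosely in the spirit of KVM's dirty ring, could look roughly like the sketch below. The layout is hypothetical, not a proposed virtio format: the device writes one 16-byte entry per dirtied page using ordinary stores, and a per-entry valid flag lets it detect a full ring without reading back a consumer index. Device and driver are simulated single-threaded in one process here.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIRTY_RING_SIZE   256u /* entries; hypothetical */
#define DIRTY_ENTRY_VALID 1u

/* Hypothetical 16-byte entry: one dirtied page per entry. */
struct dirty_ring_entry {
    uint64_t gpa;   /* guest physical address of the dirtied page */
    uint32_t flags; /* DIRTY_ENTRY_VALID while unharvested */
    uint32_t pad;
};

struct dirty_ring {
    struct dirty_ring_entry e[DIRTY_RING_SIZE];
    uint32_t prod; /* next slot the device writes */
    uint32_t cons; /* next slot the driver reads */
};

/* Device side: plain stores, then publish via the valid flag. A slot that
 * is still valid means the driver has fallen behind (ring full). */
static bool dirty_ring_push(struct dirty_ring *r, uint64_t gpa)
{
    struct dirty_ring_entry *ent = &r->e[r->prod % DIRTY_RING_SIZE];

    if (__atomic_load_n(&ent->flags, __ATOMIC_ACQUIRE) & DIRTY_ENTRY_VALID)
        return false;
    ent->gpa = gpa;
    __atomic_store_n(&ent->flags, DIRTY_ENTRY_VALID, __ATOMIC_RELEASE);
    r->prod++;
    return true;
}

/* Driver side: copy out up to `max` published addresses and hand each
 * slot back to the device by clearing its flag. */
static unsigned dirty_ring_harvest(struct dirty_ring *r, uint64_t *out,
                                   unsigned max)
{
    unsigned n = 0;

    while (n < max) {
        struct dirty_ring_entry *ent = &r->e[r->cons % DIRTY_RING_SIZE];

        if (!(__atomic_load_n(&ent->flags, __ATOMIC_ACQUIRE) &
              DIRTY_ENTRY_VALID))
            break;
        out[n++] = ent->gpa;
        __atomic_store_n(&ent->flags, 0, __ATOMIC_RELEASE);
        r->cons++;
    }
    return n;
}
```

When dirty_ring_push returns false, a real device would have to stall or notify the driver; that back-pressure is the main new design question such an interface adds compared with a bitmap or byte map.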

Thanks


>
> From a conversation with Juan, another solution could be to do a byte
> map DPT, where a byte represents a page, not a bit. While I find this
> solution simpler, I'm not sure about the performance implications of
> track this way (memory/TLB pressure, etc). Rob and Juan, please
> correct me if I'm wrong or missed something.
>
> Compared to the used ring interposition, much more logic needs to be
> in the device driver for both alternatives (byte mapping and queue),
> isn't it?
>
>> Thanks
>>




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [virtio-dev] Dirty Page Tracking (DPT)
  2020-04-13 13:55                         ` Jason Wang
@ 2020-04-16 10:55                           ` Eugenio Perez Martin
  0 siblings, 0 replies; 24+ messages in thread
From: Eugenio Perez Martin @ 2020-04-16 10:55 UTC (permalink / raw)
  To: Virtio-Dev
  Cc: Michael S. Tsirkin, Rob Miller, Paolo Bonzini, Juan Quintela,
	Jason Wang

Hi everyone.

As proposed in the previous virtio-networking meeting, I have summarized
in this document the different proposed requirements/architectures for
vDPA live migration, along with a first draft of the proposed actions.

Feedback is welcome.

Thanks!

https://docs.google.com/document/d/1-2kxRxce2CwttfsZMMoqHjyI_c-o_ZjdaHN9JHxja4M/edit?usp=sharing


On Mon, Apr 13, 2020 at 3:55 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/4/13 下午8:15, Eugenio Perez Martin wrote:
> > On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2020/4/10 上午5:06, Michael S. Tsirkin wrote:
> >>> On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> >>>> Hi!
> >>>>
> >>>> So, from the previous mails, it seems that monitoring the used ring
> >>>> (and the packed descriptors) is a good first step in that direction,
> >>>> as DPDK did. This way, the device does not need to worry about the
> >>>> dirty page tracking using a bitmap and the PCI writes limitation, and
> >>>> we can evaluate later the proposed alternatives:
> >>>> * Alternate used descriptors in packed.
> >>>> * vDPA interface for vDPA devices in a convenient format.
> >>>>
> >>>> Any thoughts? Do you think that we should start with another way?
> >>>>
> >>>> Thanks!
> >>> I am concerned that with software in data path, we'll hit RX queue
> >>> underruns, won't we?
> >>
> >> Do you mean it will lose some performance? If yes, I think so.
> >>
> >>
> >>> Two ways to avoid underruns:
> >>> - dirty page tracking
> >>> - page faults
> >>
> >> It looks to me this will lead even worse performance than software path?
> >> There will be lots of page faults during RX.
> >>
> >> Another direction is to track dirty pages via IOMMU. E.g recent Intel
> >> IOMMU has EA and D bit which could be used for tracking pages wrote by
> >> devices but not CPU.
> >>
> > So this could be added on top of the dirty tracking mechanism, isn't?
>
>
> Yes, it requires the support from IOMMU API and driver.
>
>
> > or would it be easier to start another-way around, and to start using
> > modern IOMMU and then extend to old generic code?
>
>
> If my understanding is correct, even for modern IOMMU it still require a
> lot of work. So we need to start from software support and hardware
> support first.
>
>
> >
> >>> I'm working on a proposal for page faults now.
> >>
> >> I guess it's better to have a transport independent method.
> >>
> >>
> >>> If someone wants
> >>> to work on dirty tracking in addition, that's also an option.
> >>
> >> I remember Rob mention some challenges of implementing dirty bitmap, I
> >> wonder something like queue based interface would be better (similar to
> >> Peter did for KVM)?
> >>
> > I think that the main challenge was to write bits instead of writes
> > using PCI bus.
>
>
> Yes, the ring interface will work one the descriptor which would be
> several bytes instead of bits.
>
> Thanks
>
>
> >
> > From a conversation with Juan, another solution could be to do a byte
> > map DPT, where a byte represents a page, not a bit. While I find this
> > solution simpler, I'm not sure about the performance implications of
> > track this way (memory/TLB pressure, etc). Rob and Juan, please
> > correct me if I'm wrong or missed something.
> >
> > Compared to the used ring interposition, much more logic needs to be
> > in the device driver for both alternatives (byte mapping and queue),
> > isn't it?
> >
> >> Thanks
> >>
>




^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-04-16 10:55 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-06 15:40 [virtio-dev] Dirty Page Tracking (DPT) Rob Miller
2020-03-09  7:38 ` Michael S. Tsirkin
2020-03-09  8:50   ` Jason Wang
2020-03-09 10:13     ` Michael S. Tsirkin
2020-03-10  3:22       ` Jason Wang
2020-03-10  6:24         ` Michael S. Tsirkin
2020-03-10  6:39           ` Jason Wang
2020-03-18 15:13             ` Rob Miller
2020-03-19  3:35               ` Jason Wang
2020-03-19 11:17               ` Paolo Bonzini
2020-04-07  9:52                 ` Eugenio Perez Martin
2020-04-07 10:27                   ` Rob Miller
2020-04-07 16:31                     ` Eugenio Perez Martin
2020-04-08 10:10                       ` Jason Wang
2020-04-07 10:40                   ` Rob Miller
2020-04-08 10:00                     ` Jason Wang
2020-04-09 21:06                   ` Michael S. Tsirkin
2020-04-10  2:40                     ` Jason Wang
2020-04-13 12:15                       ` Eugenio Perez Martin
2020-04-13 13:30                         ` Rob Miller
2020-04-13 13:49                           ` Jason Wang
2020-04-13 13:49                           ` Jason Wang
2020-04-13 13:55                         ` Jason Wang
2020-04-16 10:55                           ` Eugenio Perez Martin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.