* [virtio-dev] Dirty Page Tracking (DPT) @ 2020-03-06 15:40 Rob Miller 2020-03-09 7:38 ` Michael S. Tsirkin 0 siblings, 1 reply; 24+ messages in thread
From: Rob Miller @ 2020-03-06 15:40 UTC (permalink / raw)
To: Virtio-Dev

I understand that DPT isn't really at the forefront of the vDPA framework, but I wanted to understand if there are any initial thoughts on how this would work...

In the migration framework, in its simplest form, (I gather) it's QEMU via KVM that is reading the dirty page table, converting bits to page numbers, then flushing remote VM/copying local page(s)->remote VM, etc.

While this is fine for a VM (say VM1) dirtying its own memory, where the accesses are trapped in the kernel and the log is updated, I'm not sure what happens in the situation of vhost, where a remote VM (say VM2) is dirtying up VM1's memory since it can directly access it, during packet reception for example.

Whatever technique is employed to catch this, how would this differ from a HW-based Virtio device doing DMA directly into a VM's DDR, with respect to DPT? Is QEMU going to have a second place to query the dirty logs, i.e. the vDPA layer?

Further, I heard about a SW-based DPT within the vDPA framework for those devices that do not (yet) support DPT inherently in HW. How is this envisioned to work?

Finally, for those HW vendors that do support DPT in HW, a mapping of a bit -> page isn't really an option, since no one wants to do a byte-wide read-modify-write across the PCI bus; mapping a whole byte to a page is likely more desirable, since the HW can just do non-posted writes to the dirty page table. If byte-wise, then the QEMU/vDPA layer has to either fix up the mapping (from byte->bit) or have the capability to handle the granularity differences.

Thoughts?

Rob Miller
rob.miller@broadcom.com
(919)721-3339

^ permalink raw reply [flat|nested] 24+ messages in thread
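The byte-versus-bit granularity question raised in this message can be illustrated with a small sketch. It assumes a hypothetical device log with one byte per page (non-zero meaning dirty) and folds it into a one-bit-per-page bitmap of the kind a migration loop would scan. The function names and the bit order within each byte are assumptions made for illustration; none of this is an existing QEMU, KVM, or vDPA interface.

```python
def bytemap_to_bitmap(bytemap):
    """Fold a one-byte-per-page dirty map into a one-bit-per-page bitmap.

    Assumption: page N maps to bit (N % 8) of byte (N // 8), i.e. the
    least significant bit of each byte is the lowest-numbered page.
    """
    bitmap = bytearray((len(bytemap) + 7) // 8)
    for page, flag in enumerate(bytemap):
        if flag:  # any non-zero byte counts as "dirty"
            bitmap[page // 8] |= 1 << (page % 8)
    return bitmap


def dirty_pages(bitmap, num_pages):
    """Convert set bits back to page numbers, as a migration loop would."""
    return [p for p in range(num_pages) if bitmap[p // 8] & (1 << (p % 8))]


# Hypothetical device log covering 20 pages; the device has written
# a single byte for each of pages 0, 9 and 19.
bytemap = bytearray(20)
for page in (0, 9, 19):
    bytemap[page] = 1

bitmap = bytemap_to_bitmap(bytemap)
pages = dirty_pages(bitmap, len(bytemap))
```

The device side only ever issues plain byte stores, and the read-modify-write needed to set individual bits happens on the host CPU, which is the fix-up step the message attributes to the QEMU/vDPA layer.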
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-06 15:40 [virtio-dev] Dirty Page Tracking (DPT) Rob Miller @ 2020-03-09 7:38 ` Michael S. Tsirkin 2020-03-09 8:50 ` Jason Wang 0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-03-09 7:38 UTC (permalink / raw)
To: Rob Miller; +Cc: Virtio-Dev, Jason Wang

On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
> I understand that DPT isn't really on the forefront of the vDPA framework, but
> wanted to understand if there any initial thoughts on how this would work...

And judging by the next few chapters, you are actually talking about vhost pci, right?

> In the migration framework, in its simplest form, (I gather) its QEMU via KVM
> that is reading the dirty page table, converting bits to page numbers, then
> flushing remote VM/copying local page(s)->remote VM, etc.
>
> While this is fine for a VM (say VM1) dirtying its own memory and the accesses
> are trapped in the kernel as well as the log is being updated, I'm not sure
> what happens in the situation of vhost, where a remote VM (say VM2) is dirtying
> up VM1's memory since it can directly access it, during packet reception for
> example.
> Whatever technique is employed to catch this, how would this differ from a HW
> based Virtio device doing DMA directly into a VM's DDR, wrt to DPT? Is QEMU
> going to have a 2nd place to query the dirty logs - ie: the vDPA layer?

I don't think anyone has a good handle on vhost pci migration yet. But I think a reasonable way to handle that would be to activate dirty tracking in VM2's QEMU. And then VM2's QEMU would periodically copy the bits to the log - does this sound right?

> Further I heard about a SW based DPT within the vDPA framework for those
> devices that do not (yet) support DPT inherently in HW. How is this envisioned
> to work?

What I am aware of is simply switching to a software virtio for the duration of migration. The software can be pretty simple since the formats match: just copy available entries to the device ring, and for used entries, on seeing a used ring entry, mark the page dirty and then copy the used entry to the guest ring.

Another approach that I proposed, and which was prototyped at some point by Alex Duyck, is for the guest driver to touch the page in question before processing it within the guest, e.g. by an atomic xor with 0. Sounds attractive but didn't perform all that well.

> Finally, for those HW vendors that do support DPT in HW, a mapping of a bit ->
> page isn't really an option, since no one wants to do a byte wide
> read-modify-write across the PCI bus, but rather map a whole byte to page is
> likely more desirable - the HW can just do non-posted writes to the dirty page
> table. If byte wise, then the QEMU/vDPA layer has to either fix-up the mapping
> (from byte->bit) or have the capability to handle the granularity diffs.
>
> Thoughts?
>
> Rob Miller
> rob.miller@broadcom.com
> (919)721-3339

If using an IOMMU, DPT can also be done using either PRI or the dirty bit in a PTE. PRI is an interrupt so it can kick off a thread to set bits in the log I guess, but if it's the dirty bit then I don't think there's an interrupt. And a polling thread does not sound attractive. I guess we'll need a new interface to notify vDPA that QEMU is looking for dirty logs, and then vDPA can send them to QEMU in some way. Will probably be good enough to support vendor-specific logging interfaces, too. I don't actually have hardware which supports either, so actually coding it up is not yet practical.

Further, at my KVM Forum presentation I proposed a virtio-specific pagefault handling interface. If there's a wish to standardize and implement that, let me know and I will try to write this up in a more formal way.

-- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 24+ messages in thread
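The software relay described in this reply (copy available entries to the device ring; on a used entry, mark the page dirty, then copy the used entry to the guest ring) might look roughly like the sketch below. It is a toy model under stated assumptions: rings are plain queues, each descriptor is a single buffer rather than a chain, pages are 4 KiB, and the names `Desc` and `Relay` are hypothetical. It is not how vhost, DPDK, or any vDPA driver implements this.

```python
from collections import deque
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass
class Desc:
    id: int
    addr: int            # guest-physical address of the buffer
    length: int          # buffer length in bytes
    device_writes: bool  # True for buffers the device writes (e.g. RX)


class Relay:
    """Toy shadow-ring relay sitting between guest rings and device rings."""

    def __init__(self):
        self.guest_avail = deque()   # filled by the guest driver
        self.device_avail = deque()  # consumed by the device
        self.device_used = deque()   # (id, bytes_written) from the device
        self.guest_used = deque()    # what the guest driver finally sees
        self.inflight = {}           # id -> Desc, to look up buffer addresses
        self.dirty = set()           # dirty page frame numbers

    def forward_avail(self):
        # Formats match, so available entries are copied through unchanged.
        while self.guest_avail:
            desc = self.guest_avail.popleft()
            self.inflight[desc.id] = desc
            self.device_avail.append(desc)

    def forward_used(self):
        while self.device_used:
            desc_id, written = self.device_used.popleft()
            desc = self.inflight.pop(desc_id)
            # Mark pages dirty BEFORE the guest can observe the used entry.
            if desc.device_writes and written > 0:
                first = desc.addr // PAGE_SIZE
                last = (desc.addr + written - 1) // PAGE_SIZE
                self.dirty.update(range(first, last + 1))
            self.guest_used.append((desc_id, written))


relay = Relay()
relay.guest_avail.append(Desc(id=0, addr=0x1F00, length=2048, device_writes=True))
relay.forward_avail()

# Stand-in for the device: consume the buffer and report 512 bytes written.
relay.device_avail.popleft()
relay.device_used.append((0, 512))
relay.forward_used()
```

In this run the 512 bytes written at 0x1F00 straddle a page boundary, so pages 1 and 2 are both logged. The ordering inside `forward_used` is the point of the design: the log is updated before the used entry reaches the guest ring, so the migration thread never misses a page the guest has already consumed.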
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-09 7:38 ` Michael S. Tsirkin @ 2020-03-09 8:50 ` Jason Wang 2020-03-09 10:13 ` Michael S. Tsirkin 0 siblings, 1 reply; 24+ messages in thread From: Jason Wang @ 2020-03-09 8:50 UTC (permalink / raw) To: Michael S. Tsirkin, Rob Miller; +Cc: Virtio-Dev On 2020/3/9 下午3:38, Michael S. Tsirkin wrote: > On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote: >> I understand that DPT isn't really on the forefront of the vDPA framework, but >> wanted to understand if there any initial thoughts on how this would work... > And judging by the next few chapters, you are actually > talking about vhost pci, right? > >> In the migration framework, in its simplest form, (I gather) its QEMU via KVM >> that is reading the dirty page table, converting bits to page numbers, then >> flushing remote VM/copying local page(s)->remote VM, ect. >> >> While this is fine for a VM (say VM1) dirtying its own memory and the accesses >> are trapped in the kernel as well as the log is being updated, I'm not sure >> what happens in the situation of vhost, where a remote VM (say VM2) is dirtying >> up VM1's memory since it can directly access it, during packet reception for >> example. >> Whatever technique is employed to catch this, how would this differ from a HW >> based Virtio device doing DMA directly into a VM's DDR, wrt to DPT? Is QEMU >> going to have a 2nd place to query the dirty logs - ie: the vDPA layer? > I don't think anyone has a good handle at the vhost pci migration yet. > But I think a reasonable way to handle that would be to > activate dirty tracking in VM2's QEMU. > > And then VM2's QEMU would periodically copy the bits to the log - does > this sound right? > >> Further I heard about a SW based DPT within the vDPA framework for those >> devices that do not (yet) support DPT inherently in HW. How is this envisioned >> to work? > What I am aware of is simply switching to a software virtio > for the duration of migration. 
The software can be pretty simple > since the formats match: just copy available entries to device ring, > and for used entries, see a used ring entry, mark page > dirty and then copy used entry to guest ring. That looks more heavyweight than e.g just relay used ring (as what dpdk did) I believe? > > > Another approach that I proposed and was prototyped at some point by > Alex Duyck is guest driver touching the page in question before > processing it within guest e.g. by an atomic xor with 0. > Sounds attractive but didn't perform all that well. Intel posted i40e software solution that traps queue tail/head write. But I'm not sure it's good enough. https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/ > > >> Finally, for those HW vendors that do support DPT in HW, a mapping of a bit -> >> page isn't really an option, since no one wants to do a byte wide >> read-modify-write across the PCI bus, but rather map a whole byte to page is >> likely more desirable - the HW can just do non-posted writes to the dirty page >> table. If byte wise, then the QEMU/vDPA layer has to either fix-up the mapping >> (from byte->bit) or have the capability to handle the granularity diffs. >> >> Thoughts? >> >> Rob Miller >> rob.miller@broadcom.com >> (919)721-3339 > If using an IOMMU, DPT can also be done using either PRI or dirty bit in > a PTE. PRI is an interrupt so it can kick off a thread to set bits in > the log I guess, but if it's the dirty bit then I don't think there's an > interrupt. And a polling thread does not sound attractive. I guess > we'll need a new interface to notify VDPA that QEMU is looking for dirty > logs, and then VDPA can send them to QEMU in some way. Will probably be > good enough to support vendor specific logging interfaces, too. I don't > actually have hardware which supports either so actually coding it up is > not yet practical. Yes, both PRI and PTE dirty bit requires special hardware support. We can extend vDPA API to support both. 
For page fault, probably just an IOMMU page fault handler.

>
> Further, at my KVM forum presentation I proposed a virtio-specific
> pagefault handling interface. If there's a wish to standardize and
> implement that, let me know and I will try to write this up in a more
> formal way.

Besides pagefault, if we want virtio to be more like vhost, we also need to formalize the device state fetching, e.g. per-vq index etc.

Thanks

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-09 8:50 ` Jason Wang @ 2020-03-09 10:13 ` Michael S. Tsirkin 2020-03-10 3:22 ` Jason Wang 0 siblings, 1 reply; 24+ messages in thread From: Michael S. Tsirkin @ 2020-03-09 10:13 UTC (permalink / raw) To: Jason Wang; +Cc: Rob Miller, Virtio-Dev On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote: > > On 2020/3/9 下午3:38, Michael S. Tsirkin wrote: > > On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote: > > > I understand that DPT isn't really on the forefront of the vDPA framework, but > > > wanted to understand if there any initial thoughts on how this would work... > > And judging by the next few chapters, you are actually > > talking about vhost pci, right? > > > > > In the migration framework, in its simplest form, (I gather) its QEMU via KVM > > > that is reading the dirty page table, converting bits to page numbers, then > > > flushing remote VM/copying local page(s)->remote VM, ect. > > > > > > While this is fine for a VM (say VM1) dirtying its own memory and the accesses > > > are trapped in the kernel as well as the log is being updated, I'm not sure > > > what happens in the situation of vhost, where a remote VM (say VM2) is dirtying > > > up VM1's memory since it can directly access it, during packet reception for > > > example. > > > Whatever technique is employed to catch this, how would this differ from a HW > > > based Virtio device doing DMA directly into a VM's DDR, wrt to DPT? Is QEMU > > > going to have a 2nd place to query the dirty logs - ie: the vDPA layer? > > I don't think anyone has a good handle at the vhost pci migration yet. > > But I think a reasonable way to handle that would be to > > activate dirty tracking in VM2's QEMU. > > > > And then VM2's QEMU would periodically copy the bits to the log - does > > this sound right? > > > > > Further I heard about a SW based DPT within the vDPA framework for those > > > devices that do not (yet) support DPT inherently in HW. 
How is this envisioned > > > to work? > > What I am aware of is simply switching to a software virtio > > for the duration of migration. The software can be pretty simple > > since the formats match: just copy available entries to device ring, > > and for used entries, see a used ring entry, mark page > > dirty and then copy used entry to guest ring. > > > That looks more heavyweight than e.g just relay used ring (as what dpdk did) > I believe? That works for used but not for the packed ring. > > > > > > > Another approach that I proposed and was prototyped at some point by > > Alex Duyck is guest driver touching the page in question before > > processing it within guest e.g. by an atomic xor with 0. > > Sounds attractive but didn't perform all that well. > > > Intel posted i40e software solution that traps queue tail/head write. But > I'm not sure it's good enough. > > https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/ DMA unmap time seems more generic to me. But again I suspect the main issue is the same - it's handled on the data path blocking packet RX until dirty tracking is handled. Hardware solutions by comparison queue writes and make progress, dirty page is handled by the migration CPU. > > > > > > > > Finally, for those HW vendors that do support DPT in HW, a mapping of a bit -> > > > page isn't really an option, since no one wants to do a byte wide > > > read-modify-write across the PCI bus, but rather map a whole byte to page is > > > likely more desirable - the HW can just do non-posted writes to the dirty page > > > table. If byte wise, then the QEMU/vDPA layer has to either fix-up the mapping > > > (from byte->bit) or have the capability to handle the granularity diffs. > > > > > > Thoughts? > > > > > > Rob Miller > > > rob.miller@broadcom.com > > > (919)721-3339 > > If using an IOMMU, DPT can also be done using either PRI or dirty bit in > > a PTE. 
PRI is an interrupt so it can kick off a thread to set bits in
> > the log I guess, but if it's the dirty bit then I don't think there's an
> > interrupt. And a polling thread does not sound attractive. I guess
> > we'll need a new interface to notify VDPA that QEMU is looking for dirty
> > logs, and then VDPA can send them to QEMU in some way. Will probably be
> > good enough to support vendor specific logging interfaces, too. I don't
> > actually have hardware which supports either so actually coding it up is
> > not yet practical.
>
>
> Yes, both PRI and PTE dirty bit requires special hardware support. We can
> extend vDPA API to support both. For page fault, probably just a IOMMU page
> fault handler.
>
>
> >
> > Further, at my KVM forum presentation I proposed a virtio-specific
> > pagefault handling interface. If there's a wish to standardize and
> > implement that, let me know and I will try to write this up in a more
> > formal way.
>
>
> Besides pagefault, if we want virtio to be more like vhost, we need also
> formalize the device state fetching. E.g per vq index etc.
>
> Thanks

Yes, that would clearly be in scope for the spec. I would not even start with a guest/host interface. I would start by just listing the state that needs to be migrated for each device. It would also be useful to list, for each device, how to make two devices compatible migration-wise. We can do that in a non-normative section. Again, the big blocker here is lack of manpower.

-- MST

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-09 10:13 ` Michael S. Tsirkin @ 2020-03-10 3:22 ` Jason Wang 2020-03-10 6:24 ` Michael S. Tsirkin 0 siblings, 1 reply; 24+ messages in thread From: Jason Wang @ 2020-03-10 3:22 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Rob Miller, Virtio-Dev On 2020/3/9 下午6:13, Michael S. Tsirkin wrote: > On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote: >> On 2020/3/9 下午3:38, Michael S. Tsirkin wrote: >>> On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote: >>>> I understand that DPT isn't really on the forefront of the vDPA framework, but >>>> wanted to understand if there any initial thoughts on how this would work... >>> And judging by the next few chapters, you are actually >>> talking about vhost pci, right? >>> >>>> In the migration framework, in its simplest form, (I gather) its QEMU via KVM >>>> that is reading the dirty page table, converting bits to page numbers, then >>>> flushing remote VM/copying local page(s)->remote VM, ect. >>>> >>>> While this is fine for a VM (say VM1) dirtying its own memory and the accesses >>>> are trapped in the kernel as well as the log is being updated, I'm not sure >>>> what happens in the situation of vhost, where a remote VM (say VM2) is dirtying >>>> up VM1's memory since it can directly access it, during packet reception for >>>> example. >>>> Whatever technique is employed to catch this, how would this differ from a HW >>>> based Virtio device doing DMA directly into a VM's DDR, wrt to DPT? Is QEMU >>>> going to have a 2nd place to query the dirty logs - ie: the vDPA layer? >>> I don't think anyone has a good handle at the vhost pci migration yet. >>> But I think a reasonable way to handle that would be to >>> activate dirty tracking in VM2's QEMU. >>> >>> And then VM2's QEMU would periodically copy the bits to the log - does >>> this sound right? 
>>> >>>> Further I heard about a SW based DPT within the vDPA framework for those >>>> devices that do not (yet) support DPT inherently in HW. How is this envisioned >>>> to work? >>> What I am aware of is simply switching to a software virtio >>> for the duration of migration. The software can be pretty simple >>> since the formats match: just copy available entries to device ring, >>> and for used entries, see a used ring entry, mark page >>> dirty and then copy used entry to guest ring. >> >> That looks more heavyweight than e.g just relay used ring (as what dpdk did) >> I believe? > That works for used but not for the packed ring. For packed ring, we can relay the descriptor ring? > >>> >>> Another approach that I proposed and was prototyped at some point by >>> Alex Duyck is guest driver touching the page in question before >>> processing it within guest e.g. by an atomic xor with 0. >>> Sounds attractive but didn't perform all that well. >> >> Intel posted i40e software solution that traps queue tail/head write. But >> I'm not sure it's good enough. >> >> https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/ > > DMA unmap time seems more generic to me. But again I suspect > the main issue is the same - it's handled on the data path > blocking packet RX until dirty tracking is handled. > > Hardware solutions by comparison queue writes and make > progress, dirty page is handled by the migration CPU. > > >>> >>>> Finally, for those HW vendors that do support DPT in HW, a mapping of a bit -> >>>> page isn't really an option, since no one wants to do a byte wide >>>> read-modify-write across the PCI bus, but rather map a whole byte to page is >>>> likely more desirable - the HW can just do non-posted writes to the dirty page >>>> table. If byte wise, then the QEMU/vDPA layer has to either fix-up the mapping >>>> (from byte->bit) or have the capability to handle the granularity diffs. >>>> >>>> Thoughts? 
>>>> >>>> Rob Miller >>>> rob.miller@broadcom.com >>>> (919)721-3339 >>> If using an IOMMU, DPT can also be done using either PRI or dirty bit in >>> a PTE. PRI is an interrupt so it can kick off a thread to set bits in >>> the log I guess, but if it's the dirty bit then I don't think there's an >>> interrupt. And a polling thread does not sound attractive. I guess >>> we'll need a new interface to notify VDPA that QEMU is looking for dirty >>> logs, and then VDPA can send them to QEMU in some way. Will probably be >>> good enough to support vendor specific logging interfaces, too. I don't >>> actually have hardware which supports either so actually coding it up is >>> not yet practical. >> >> Yes, both PRI and PTE dirty bit requires special hardware support. We can >> extend vDPA API to support both. For page fault, probably just a IOMMU page >> fault handler. >> >> >>> Further, at my KVM forum presentaiton I proposed a virtio-specific >>> pagefault handling interface. If there's a wish to standardize and >>> implement that, let me know and I will try to write this up in a more >>> formal way. >> >> Besides pagefault, if we want virito to be more like vhost, we need also >> formalize the device state feching. E.g per vq index etc. >> >> Thanks > Yes that would clearly be in-scope for the spec. I would not start > with a guest/host interface even. I would start by just listing what > the state that needs to be migrated is, for each device. And it would > also be useful to list, for each device, how to make two devices > compatible migration wise. We can do that in a non-normative section. > Again the big blocker here is lack of manpower. Yes. Thanks > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-10 3:22 ` Jason Wang @ 2020-03-10 6:24 ` Michael S. Tsirkin 2020-03-10 6:39 ` Jason Wang 0 siblings, 1 reply; 24+ messages in thread From: Michael S. Tsirkin @ 2020-03-10 6:24 UTC (permalink / raw) To: Jason Wang; +Cc: Rob Miller, Virtio-Dev On Tue, Mar 10, 2020 at 11:22:00AM +0800, Jason Wang wrote: > > On 2020/3/9 下午6:13, Michael S. Tsirkin wrote: > > On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote: > > > On 2020/3/9 下午3:38, Michael S. Tsirkin wrote: > > > > On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote: > > > > > I understand that DPT isn't really on the forefront of the vDPA framework, but > > > > > wanted to understand if there any initial thoughts on how this would work... > > > > And judging by the next few chapters, you are actually > > > > talking about vhost pci, right? > > > > > > > > > In the migration framework, in its simplest form, (I gather) its QEMU via KVM > > > > > that is reading the dirty page table, converting bits to page numbers, then > > > > > flushing remote VM/copying local page(s)->remote VM, ect. > > > > > > > > > > While this is fine for a VM (say VM1) dirtying its own memory and the accesses > > > > > are trapped in the kernel as well as the log is being updated, I'm not sure > > > > > what happens in the situation of vhost, where a remote VM (say VM2) is dirtying > > > > > up VM1's memory since it can directly access it, during packet reception for > > > > > example. > > > > > Whatever technique is employed to catch this, how would this differ from a HW > > > > > based Virtio device doing DMA directly into a VM's DDR, wrt to DPT? Is QEMU > > > > > going to have a 2nd place to query the dirty logs - ie: the vDPA layer? > > > > I don't think anyone has a good handle at the vhost pci migration yet. > > > > But I think a reasonable way to handle that would be to > > > > activate dirty tracking in VM2's QEMU. 
> > > > > > > > And then VM2's QEMU would periodically copy the bits to the log - does > > > > this sound right? > > > > > > > > > Further I heard about a SW based DPT within the vDPA framework for those > > > > > devices that do not (yet) support DPT inherently in HW. How is this envisioned > > > > > to work? > > > > What I am aware of is simply switching to a software virtio > > > > for the duration of migration. The software can be pretty simple > > > > since the formats match: just copy available entries to device ring, > > > > and for used entries, see a used ring entry, mark page > > > > dirty and then copy used entry to guest ring. > > > > > > That looks more heavyweight than e.g just relay used ring (as what dpdk did) > > > I believe? > > That works for used but not for the packed ring. > > > For packed ring, we can relay the descriptor ring? Yes, and thus one must relay both available and used descriptors. It's an interesting tradeoff. Packed ring at least was not designed with multiple actors in mind. If this becomes a thing (and that's a big if) it might make sense to support temporarily reporting used entries in a separate buffer, while migration is in progress. Also if doing this, it looks like we can then support used ring resize too, and thus it might also make sense to use this to support sharing a used ring between multiple available rings - this way a single CPU can handle multiple used rings efficiently. > > > > > > > > > > > Another approach that I proposed and was prototyped at some point by > > > > Alex Duyck is guest driver touching the page in question before > > > > processing it within guest e.g. by an atomic xor with 0. > > > > Sounds attractive but didn't perform all that well. > > > > > > Intel posted i40e software solution that traps queue tail/head write. But > > > I'm not sure it's good enough. > > > > > > https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/ > > > > DMA unmap time seems more generic to me. 
But again I suspect > > the main issue is the same - it's handled on the data path > > blocking packet RX until dirty tracking is handled. > > > > Hardware solutions by comparison queue writes and make > > progress, dirty page is handled by the migration CPU. > > > > > > > > > > > > > Finally, for those HW vendors that do support DPT in HW, a mapping of a bit -> > > > > > page isn't really an option, since no one wants to do a byte wide > > > > > read-modify-write across the PCI bus, but rather map a whole byte to page is > > > > > likely more desirable - the HW can just do non-posted writes to the dirty page > > > > > table. If byte wise, then the QEMU/vDPA layer has to either fix-up the mapping > > > > > (from byte->bit) or have the capability to handle the granularity diffs. > > > > > > > > > > Thoughts? > > > > > > > > > > Rob Miller > > > > > rob.miller@broadcom.com > > > > > (919)721-3339 > > > > If using an IOMMU, DPT can also be done using either PRI or dirty bit in > > > > a PTE. PRI is an interrupt so it can kick off a thread to set bits in > > > > the log I guess, but if it's the dirty bit then I don't think there's an > > > > interrupt. And a polling thread does not sound attractive. I guess > > > > we'll need a new interface to notify VDPA that QEMU is looking for dirty > > > > logs, and then VDPA can send them to QEMU in some way. Will probably be > > > > good enough to support vendor specific logging interfaces, too. I don't > > > > actually have hardware which supports either so actually coding it up is > > > > not yet practical. > > > > > > Yes, both PRI and PTE dirty bit requires special hardware support. We can > > > extend vDPA API to support both. For page fault, probably just a IOMMU page > > > fault handler. > > > > > > > > > > Further, at my KVM forum presentaiton I proposed a virtio-specific > > > > pagefault handling interface. 
If there's a wish to standardize and > > > > implement that, let me know and I will try to write this up in a more > > > > formal way. > > > > > > Besides pagefault, if we want virito to be more like vhost, we need also > > > formalize the device state feching. E.g per vq index etc. > > > > > > Thanks > > Yes that would clearly be in-scope for the spec. I would not start > > with a guest/host interface even. I would start by just listing what > > the state that needs to be migrated is, for each device. And it would > > also be useful to list, for each device, how to make two devices > > compatible migration wise. We can do that in a non-normative section. > > Again the big blocker here is lack of manpower. > > > Yes. > > Thanks > > > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-10 6:24 ` Michael S. Tsirkin @ 2020-03-10 6:39 ` Jason Wang 2020-03-18 15:13 ` Rob Miller 0 siblings, 1 reply; 24+ messages in thread From: Jason Wang @ 2020-03-10 6:39 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Rob Miller, Virtio-Dev On 2020/3/10 下午2:24, Michael S. Tsirkin wrote: > On Tue, Mar 10, 2020 at 11:22:00AM +0800, Jason Wang wrote: >> On 2020/3/9 下午6:13, Michael S. Tsirkin wrote: >>> On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote: >>>> On 2020/3/9 下午3:38, Michael S. Tsirkin wrote: >>>>> On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote: >>>>>> I understand that DPT isn't really on the forefront of the vDPA framework, but >>>>>> wanted to understand if there any initial thoughts on how this would work... >>>>> And judging by the next few chapters, you are actually >>>>> talking about vhost pci, right? >>>>> >>>>>> In the migration framework, in its simplest form, (I gather) its QEMU via KVM >>>>>> that is reading the dirty page table, converting bits to page numbers, then >>>>>> flushing remote VM/copying local page(s)->remote VM, ect. >>>>>> >>>>>> While this is fine for a VM (say VM1) dirtying its own memory and the accesses >>>>>> are trapped in the kernel as well as the log is being updated, I'm not sure >>>>>> what happens in the situation of vhost, where a remote VM (say VM2) is dirtying >>>>>> up VM1's memory since it can directly access it, during packet reception for >>>>>> example. >>>>>> Whatever technique is employed to catch this, how would this differ from a HW >>>>>> based Virtio device doing DMA directly into a VM's DDR, wrt to DPT? Is QEMU >>>>>> going to have a 2nd place to query the dirty logs - ie: the vDPA layer? >>>>> I don't think anyone has a good handle at the vhost pci migration yet. >>>>> But I think a reasonable way to handle that would be to >>>>> activate dirty tracking in VM2's QEMU. 
>>>>>
>>>>> And then VM2's QEMU would periodically copy the bits to the log - does
>>>>> this sound right?
>>>>>
>>>>>> Further I heard about a SW based DPT within the vDPA framework for those
>>>>>> devices that do not (yet) support DPT inherently in HW. How is this envisioned
>>>>>> to work?
>>>>> What I am aware of is simply switching to a software virtio
>>>>> for the duration of migration. The software can be pretty simple
>>>>> since the formats match: just copy available entries to device ring,
>>>>> and for used entries, see a used ring entry, mark page
>>>>> dirty and then copy used entry to guest ring.
>>>> That looks more heavyweight than e.g just relay used ring (as what dpdk did)
>>>> I believe?
>>> That works for used but not for the packed ring.
>> For packed ring, we can relay the descriptor ring?
> Yes, and thus one must relay both available and used descriptors.
>

Yes.

> It's an interesting tradeoff. Packed ring at least was not designed
> with multiple actors in mind.

Yes.

> If this becomes a thing (and that's a big if) it might make sense to
> support temporarily reporting used entries in a separate buffer, while
> migration is in progress. Also if doing this, it looks like we can then
> support used ring resize too, and thus it might also make sense to use
> this to support sharing a used ring between multiple available rings -
> this way a single CPU can handle multiple used rings efficiently.

Right, that's something similar to the two-ring model I proposed in the past.

Thanks

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-10 6:39 ` Jason Wang @ 2020-03-18 15:13 ` Rob Miller 2020-03-19 3:35 ` Jason Wang 2020-03-19 11:17 ` Paolo Bonzini 0 siblings, 2 replies; 24+ messages in thread From: Rob Miller @ 2020-03-18 15:13 UTC (permalink / raw) To: Virtio-Dev [-- Attachment #1: Type: text/plain, Size: 4680 bytes --]

In trying to understand DPT more fully, I ran across an article on how physical RAM works within QEMU and noticed the statement below. My current understanding, based on that statement, is that DPT is automatic inside QEMU. I understand that this scheme is not employed in all hypervisors, but I'm wondering whether others, because of VM migration, have a similar scheme.

Dirty memory tracking

When the guest CPU or device DMA stores to guest RAM this needs to be noticed by several users:

1. The live migration feature relies on tracking dirty memory pages so they can be resent if they change during live migration.
2. TCG relies on tracking self-modifying code so it can recompile changed instructions.
3. Graphics card emulation relies on tracking dirty video memory to redraw only scanlines that have changed.

There are dirty memory bitmaps for each of these users in ram_list because dirty memory tracking can be enabled or disabled independently for each of these users.

http://blog.vmsplice.net/2016/01/qemu-internals-how-guest-physical-ram.html

Rob Miller
rob.miller@broadcom.com
(919)721-3339

[-- Attachment #2: Type: text/html, Size: 7417 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-18 15:13 ` Rob Miller @ 2020-03-19 3:35 ` Jason Wang 2020-03-19 11:17 ` Paolo Bonzini 1 sibling, 0 replies; 24+ messages in thread From: Jason Wang @ 2020-03-19 3:35 UTC (permalink / raw) To: Rob Miller, Virtio-Dev

On 2020/3/18 11:13 PM, Rob Miller wrote:
> In trying to more fully understand DPT, I ran across an article
> regarding how Physical RAM works within QEMU and noticed the statement
> below. My current understanding, based upon the statement, is that DPT
> is automatic inside QEMU. I can understand that this scheme is not
> employed in all hypervisors, but i'm wondering if others, b/c of VM
> migration, do have a similar scheme.
>
> Dirty memory tracking
>
> When the guest CPU or device DMA stores to guest RAM this needs to be
> noticed by several users:
>
> 1. The live migration feature relies on tracking dirty memory pages
>    so they can be resent if they change during live migration.
> 2. TCG relies on tracking self-modifying code so it can recompile
>    changed instructions.
> 3. Graphics card emulation relies on tracking dirty video memory to
>    redraw only scanlines that have changed.
>
> There are dirty memory bitmaps for each of these users in ram_list
> because dirty memory tracking can be enabled or disabled independently
> for each of these users.
>
> http://blog.vmsplice.net/2016/01/qemu-internals-how-guest-physical-ram.html

Hi Rob:

My understanding is that DPT is a must for every hypervisor that wants to support live migration.

Besides tracking dirty pages itself, QEMU can also sync dirty pages from external users such as:

- KVM: which can write-protect pages and track dirty pages through #PF
- vhost: a software virtio backend that can track the used ring and thereby know which pages were modified
- VFIO: the work on syncing dirty pages from hardware is ongoing.

For vDPA, we have two ways to do that:

- Pure software solution: the QEMU vhost-vdpa backend takes over the ring (the used ring for split, for example). It can then know which parts of guest memory were modified by vDPA and report the dirty pages through QEMU internal helpers.
- Hardware solution: when the hardware supports dirty page tracking, the vDPA bus needs to be extended to allow hardware to report dirty pages (bitmap or other), and QEMU can sync them from vhost.

Thanks

^ permalink raw reply [flat|nested] 24+ messages in thread
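Editorial sketch: the software approach Jason describes boils down to turning "the device wrote `len` bytes at this guest address" into set bits in a per-page dirty log. The C fragment below is a minimal illustration of that step only. The names `dirty_log`, `dirty_log_mark` and `dirty_log_test`, and the 4 KiB page size, are assumptions made for the example; they are not QEMU or vhost helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define DPT_PAGE_SHIFT 12 /* assumption: 4 KiB guest pages */
#define DPT_WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical dirty log: one bit per guest page. */
struct dirty_log {
    unsigned long *bitmap;
    uint64_t npages;
};

/* Mark every page touched by a write of `len` bytes at guest physical
 * address `gpa`, e.g. as learned from a used ring entry. */
static inline void dirty_log_mark(struct dirty_log *log, uint64_t gpa,
                                  uint32_t len)
{
    uint64_t first, last, pfn;

    if (len == 0)
        return; /* nothing was written */
    first = gpa >> DPT_PAGE_SHIFT;
    last = (gpa + len - 1) >> DPT_PAGE_SHIFT;
    assert(last < log->npages);
    for (pfn = first; pfn <= last; pfn++)
        log->bitmap[pfn / DPT_WORD_BITS] |= 1UL << (pfn % DPT_WORD_BITS);
}

/* Return 1 if page `pfn` is marked dirty, 0 otherwise. */
static inline int dirty_log_test(const struct dirty_log *log, uint64_t pfn)
{
    return (int)((log->bitmap[pfn / DPT_WORD_BITS] >>
                  (pfn % DPT_WORD_BITS)) & 1UL);
}
```

A write of 0x20 bytes at 0x1ff0 straddles a page boundary, so it marks pages 1 and 2. A real backend would additionally have to walk descriptor chains and translate addresses, which this sketch leaves out.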
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-18 15:13 ` Rob Miller 2020-03-19 3:35 ` Jason Wang @ 2020-03-19 11:17 ` Paolo Bonzini 2020-04-07 9:52 ` Eugenio Perez Martin 1 sibling, 1 reply; 24+ messages in thread From: Paolo Bonzini @ 2020-03-19 11:17 UTC (permalink / raw) To: Rob Miller, Virtio-Dev

The sentence below refers to emulated device DMA. When emulated devices inside QEMU perform DMA, the access goes through functions that keep the dirty page bitmap up to date. Likewise for CPU emulation performed by QEMU, which is not an issue if you are using KVM or other hypervisors supported by QEMU.

Whenever external code touches memory (which includes all the cases mentioned by Jason), it has to provide an interface for QEMU to read the dirty page bitmaps and synchronize them at appropriate points.

Paolo

On 18/03/20 16:13, Rob Miller wrote:
> In trying to more fully understand DPT, I ran across an article
> regarding how Physical RAM works within QEMU and noticed the statement
> below. My current understanding, based upon the statement, is that DPT
> is automatic inside QEMU. I can understand that this scheme is not
> employed in all hypervisors, but i'm wondering if others, b/c of VM
> migration, do have a similar scheme.
>
> Dirty memory tracking
>
> When the guest CPU or device DMA stores to guest RAM this needs to be
> noticed by several users:
>

^ permalink raw reply [flat|nested] 24+ messages in thread
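Editorial sketch: the "read the dirty page bitmaps and synchronize them" step Paolo mentions could look roughly like the fetch-and-clear merge below. This is only an illustration using C11 atomics; `dirty_sync` and its parameters are hypothetical names, not a QEMU interface, and the sketch assumes the external tracker and the consumer share a bitmap with the same one-bit-per-page layout.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Merge an externally maintained dirty bitmap (`src`) into the
 * migration bitmap (`dst`). Each source word is fetched and cleared in
 * one atomic step, so bits set concurrently by the external writer are
 * either harvested now or left for the next sync, never lost.
 * Returns the number of dirty bits harvested from `src`. */
static inline size_t dirty_sync(unsigned long *dst,
                                _Atomic unsigned long *src,
                                size_t nwords)
{
    size_t harvested = 0;

    for (size_t i = 0; i < nwords; i++) {
        unsigned long w = atomic_exchange(&src[i], 0UL);

        dst[i] |= w;
        for (; w != 0; w &= w - 1) /* count the set bits in this word */
            harvested++;
    }
    return harvested;
}
```

The "appropriate points" would be, for example, the start of each migration pass. When and how often to sync is a policy decision this sketch does not address.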
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-03-19 11:17 ` Paolo Bonzini @ 2020-04-07 9:52 ` Eugenio Perez Martin 2020-04-07 10:27 ` Rob Miller ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Eugenio Perez Martin @ 2020-04-07 9:52 UTC (permalink / raw) To: Rob Miller Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin, Juan Quintela

Hi!

So, from the previous mails, it seems that monitoring the used ring (and the packed descriptors) is a good first step in that direction, as DPDK did. This way, the device does not need to worry about dirty page tracking using a bitmap and the PCI write limitation, and we can evaluate the proposed alternatives later:

* Alternate used descriptors in packed.
* vDPA interface for vDPA devices in a convenient format.

Any thoughts? Do you think that we should start with another way?

Thanks!

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-07 9:52 ` Eugenio Perez Martin @ 2020-04-07 10:27 ` Rob Miller 2020-04-07 16:31 ` Eugenio Perez Martin 2020-04-07 10:40 ` Rob Miller 2020-04-09 21:06 ` Michael S. Tsirkin 2 siblings, 1 reply; 24+ messages in thread From: Rob Miller @ 2020-04-07 10:27 UTC (permalink / raw) To: Eugenio Perez Martin Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin, Juan Quintela [-- Attachment #1: Type: text/plain, Size: 979 bytes --] Does this mean that SW takes over the datapath during LM? If so, is there any infrastructure to "gracefully" do a hand-off from HW mode (pci device is managing the rings) to SW mode, in that when switching from HW->SW, HW is stalled & quiesced, then the SW takes from where HW left off? Rob Miller rob.miller@broadcom.com (919)721-3339 On Tue, Apr 7, 2020 at 5:53 AM Eugenio Perez Martin <eperezma@redhat.com> wrote: > Hi! > > So, from the previous mails, it seems that monitoring the used ring > (and the packed descriptors) is a good first step in that direction, > as DPDK did. This way, the device does not need to worry about the > dirty page tracking using a bitmap and the PCI writes limitation, and > we can evaluate later the proposed alternatives: > * Alternate used descriptors in packed. > * vDPA interface for vDPA devices in a convenient format. > > Any thoughts? Do you think that we should start with another way? > > Thanks! > > [-- Attachment #2: Type: text/html, Size: 1521 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-07 10:27 ` Rob Miller @ 2020-04-07 16:31 ` Eugenio Perez Martin 2020-04-08 10:10 ` Jason Wang 0 siblings, 1 reply; 24+ messages in thread From: Eugenio Perez Martin @ 2020-04-07 16:31 UTC (permalink / raw) To: Rob Miller Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin, Juan Quintela

On Tue, Apr 7, 2020 at 12:28 PM Rob Miller <rob.miller@broadcom.com> wrote:
>
> Does this mean that SW takes over the datapath during LM?

"Takes over" sounds to me like solving the problem by using failover to switch to a software interface that does the packet forwarding during migration [1]. In this solution, only the used ring needs to be spied on or intercepted in order to tell QEMU which memory regions were modified.

Not sure if this is what you had in mind.

[1] https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/VirtioNet.pdf

> If so, is there any infrastructure to "gracefully" do a hand-off from HW mode (pci device is managing the rings) to SW mode, in that when switching from HW->SW, HW is stalled & quiesced, then the SW takes from where HW left off?

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-07 16:31 ` Eugenio Perez Martin @ 2020-04-08 10:10 ` Jason Wang 0 siblings, 0 replies; 24+ messages in thread From: Jason Wang @ 2020-04-08 10:10 UTC (permalink / raw) To: Eugenio Perez Martin Cc: Rob Miller, Virtio-Dev, Paolo Bonzini, Michael Tsirkin, Juan Quintela

----- Original Message -----
> On Tue, Apr 7, 2020 at 12:28 PM Rob Miller <rob.miller@broadcom.com> wrote:
> >
> > Does this mean that SW takes over the datapath during LM?
>
> "Takes over" sounds to me like solving the problem using failover to
> switch to a software interface to do the packet forwarding during
> migration [1]. In this solution, only the used ring needs to be spied
> or intercepted to communicate qemu the memory regions modified.
>
> Not sure if this is what you had in mind.
>
> [1] https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/VirtioNet.pdf

Yes, so my understanding is that there are two ways of doing software-assisted live migration:

1) Switch to a full software datapath. I think this is what you meant here. This works but may cause more performance degradation.

2) Used ring relay. QEMU only takes over the used ring: when migration starts, QEMU tells the hardware to use another (mediated) used ring. QEMU can then inspect it, log the dirty pages from there, and relay the content to the used ring used by the guest. This can have better performance.

DPDK chose method 2). You may refer to https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/XiaoWang-DPDK-US-Summit-SW-assisted-VDPA-for-LM-v2.pdf

> > If so, is there any infrastructure to "gracefully" do a hand-off from HW
> > mode (pci device is managing the rings) to SW mode, in that when switching
> > from HW->SW, HW is stalled & quiesced, then the SW takes from where HW
> > left off?

Yes, both methods require stopping and restarting the device. It also means the device should support recovery of the virtqueue state. E.g. it supports a last_avail_idx set by the driver; when the device is started, it can then begin reading the avail ring at last_avail_idx. This is why the vDPA bus supports set_vq_state().

Thanks

^ permalink raw reply [flat|nested] 24+ messages in thread
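Editorial sketch: method 2), the used ring relay, can be pictured as the loop below. It is a deliberately simplified illustration for a split ring, not QEMU or DPDK code. Everything named here (`struct relay`, `relay_used`, `RELAY_QSZ`, the 64-page toy bitmap) is hypothetical, and the sketch assumes one device-writable buffer per descriptor head, native endianness, and no memory barriers.

```c
#include <assert.h>
#include <stdint.h>

#define RELAY_QSZ        8  /* hypothetical queue size */
#define RELAY_PAGE_SHIFT 12 /* assumption: 4 KiB guest pages */

/* Simplified split-ring used element: head descriptor id and the
 * number of bytes the device wrote. */
struct used_elem { uint32_t id; uint32_t len; };
struct used_ring { uint16_t idx; struct used_elem ring[RELAY_QSZ]; };

struct relay {
    struct used_ring *shadow;    /* mediated ring the device writes to */
    struct used_ring *guest;     /* ring the guest driver reads */
    uint16_t last_seen;          /* next shadow entry to relay */
    uint64_t buf_gpa[RELAY_QSZ]; /* buffer address per head id */
    uint64_t *dirty;             /* toy bitmap covering 64 pages */
};

static inline void relay_mark_dirty(uint64_t *bitmap, uint64_t gpa,
                                    uint32_t len)
{
    uint64_t pfn, last;

    if (len == 0)
        return; /* e.g. a TX completion: nothing was written */
    last = (gpa + len - 1) >> RELAY_PAGE_SHIFT;
    assert(last < 64);
    for (pfn = gpa >> RELAY_PAGE_SHIFT; pfn <= last; pfn++)
        *bitmap |= UINT64_C(1) << pfn;
}

/* Relay new entries from the shadow ring to the guest ring, logging
 * the pages the device wrote before the guest can see each entry.
 * Returns the number of entries relayed. */
static inline unsigned relay_used(struct relay *r)
{
    unsigned n = 0;

    while (r->last_seen != r->shadow->idx) {
        struct used_elem e = r->shadow->ring[r->last_seen % RELAY_QSZ];

        assert(e.id < RELAY_QSZ);
        relay_mark_dirty(r->dirty, r->buf_gpa[e.id], e.len);
        r->guest->ring[r->guest->idx % RELAY_QSZ] = e;
        r->guest->idx++; /* real code needs a write barrier first */
        r->last_seen++;
        n++;
    }
    return n;
}
```

The available ring is untouched in this scheme, which is why it is lighter than a full software datapath. The stop and restart Jason mentions is what lets the device resume from last_avail_idx once the mediated ring is installed.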
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-07 9:52 ` Eugenio Perez Martin 2020-04-07 10:27 ` Rob Miller @ 2020-04-07 10:40 ` Rob Miller 2020-04-08 10:00 ` Jason Wang 2020-04-09 21:06 ` Michael S. Tsirkin 2 siblings, 1 reply; 24+ messages in thread From: Rob Miller @ 2020-04-07 10:40 UTC (permalink / raw) To: Eugenio Perez Martin Cc: Virtio-Dev, Paolo Bonzini, Jason Wang, Michael Tsirkin, Juan Quintela [-- Attachment #1: Type: text/plain, Size: 920 bytes --] another question on vDPA vs vendor specific driver portion... Are the subsystem vendor & device IDs to be different from the primary (Red Hat) versions as there has to be a way for a vendor specific driver to "see" its device. Rob Miller rob.miller@broadcom.com (919)721-3339 On Tue, Apr 7, 2020 at 5:53 AM Eugenio Perez Martin <eperezma@redhat.com> wrote: > Hi! > > So, from the previous mails, it seems that monitoring the used ring > (and the packed descriptors) is a good first step in that direction, > as DPDK did. This way, the device does not need to worry about the > dirty page tracking using a bitmap and the PCI writes limitation, and > we can evaluate later the proposed alternatives: > * Alternate used descriptors in packed. > * vDPA interface for vDPA devices in a convenient format. > > Any thoughts? Do you think that we should start with another way? > > Thanks! > > [-- Attachment #2: Type: text/html, Size: 1471 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-07 10:40 ` Rob Miller @ 2020-04-08 10:00 ` Jason Wang 0 siblings, 0 replies; 24+ messages in thread From: Jason Wang @ 2020-04-08 10:00 UTC (permalink / raw) To: Rob Miller Cc: Eugenio Perez Martin, Virtio-Dev, Paolo Bonzini, Michael Tsirkin, Juan Quintela

----- Original Message -----
> another question on vDPA vs vendor specific driver portion...
>
> Are the subsystem vendor & device IDs to be different from the primary (Red
> Hat) versions as there has to be a way for a vendor specific driver to
> "see" its device.

Yes, any kind of (PCI) device can be registered on the vDPA bus. A PCI driver supports exact matching based on the subsystem ID. E.g. the IFCVF driver does:

	static struct pci_device_id ifcvf_pci_ids[] = {
		/* match on primary and subsystem vendor/device IDs */
		{ PCI_DEVICE_SUB(IFCVF_VENDOR_ID,
				 IFCVF_DEVICE_ID,
				 IFCVF_SUBSYS_VENDOR_ID,
				 IFCVF_SUBSYS_DEVICE_ID) },
		{ 0 },
	};

This uses the Red Hat primary vendor ID but the vendor's own subsystem vendor/device IDs.

Thanks

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-07 9:52 ` Eugenio Perez Martin 2020-04-07 10:27 ` Rob Miller 2020-04-07 10:40 ` Rob Miller @ 2020-04-09 21:06 ` Michael S. Tsirkin 2020-04-10 2:40 ` Jason Wang 2 siblings, 1 reply; 24+ messages in thread From: Michael S. Tsirkin @ 2020-04-09 21:06 UTC (permalink / raw) To: Eugenio Perez Martin Cc: Rob Miller, Virtio-Dev, Paolo Bonzini, Jason Wang, Juan Quintela

On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> Hi!
>
> So, from the previous mails, it seems that monitoring the used ring
> (and the packed descriptors) is a good first step in that direction,
> as DPDK did. This way, the device does not need to worry about the
> dirty page tracking using a bitmap and the PCI writes limitation, and
> we can evaluate later the proposed alternatives:
> * Alternate used descriptors in packed.
> * vDPA interface for vDPA devices in a convenient format.
>
> Any thoughts? Do you think that we should start with another way?
>
> Thanks!

I am concerned that with software in data path, we'll hit RX queue underruns, won't we?

Two ways to avoid underruns:
- dirty page tracking
- page faults

I'm working on a proposal for page faults now. If someone wants to work on dirty tracking in addition, that's also an option.

--
MST

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-09 21:06 ` Michael S. Tsirkin @ 2020-04-10 2:40 ` Jason Wang 2020-04-13 12:15 ` Eugenio Perez Martin 0 siblings, 1 reply; 24+ messages in thread From: Jason Wang @ 2020-04-10 2:40 UTC (permalink / raw) To: Michael S. Tsirkin, Eugenio Perez Martin Cc: Rob Miller, Virtio-Dev, Paolo Bonzini, Juan Quintela

On 2020/4/10 5:06 AM, Michael S. Tsirkin wrote:
> On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
>> Hi!
>>
>> So, from the previous mails, it seems that monitoring the used ring
>> (and the packed descriptors) is a good first step in that direction,
>> as DPDK did. This way, the device does not need to worry about the
>> dirty page tracking using a bitmap and the PCI writes limitation, and
>> we can evaluate later the proposed alternatives:
>> * Alternate used descriptors in packed.
>> * vDPA interface for vDPA devices in a convenient format.
>>
>> Any thoughts? Do you think that we should start with another way?
>>
>> Thanks!
> I am concerned that with software in data path, we'll hit RX queue
> underruns, won't we?

Do you mean it will lose some performance? If yes, I think so.

> Two ways to avoid underruns:
> - dirty page tracking
> - page faults

It looks to me like this will lead to even worse performance than the software path? There will be lots of page faults during RX.

Another direction is to track dirty pages via the IOMMU. E.g. recent Intel IOMMUs have EA and D bits, which could be used for tracking pages written by devices but not by the CPU.

>
> I'm working on a proposal for page faults now.

I guess it's better to have a transport-independent method.

> If someone wants
> to work on dirty tracking in addition, that's also an option.

I remember Rob mentioning some challenges in implementing a dirty bitmap. I wonder whether something like a queue-based interface would be better (similar to what Peter did for KVM)?

Thanks

^ permalink raw reply [flat|nested] 24+ messages in thread
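Editorial sketch: a "queue-based interface" for dirty tracking, as Jason floats it, would have the device append page frame numbers to a ring instead of setting bits in a bitmap. The fragment below is only a guess at the general shape, written as a single-threaded producer/consumer ring. `struct dirty_ring`, `dirty_ring_push` and `dirty_ring_pop` are hypothetical names and do not describe the KVM interface or any virtio proposal. A real design would also need memory barriers and a defined policy for the full-ring case.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIRTY_RING_SIZE 4 /* hypothetical; must be a power of two */

struct dirty_ring {
    uint64_t pfn[DIRTY_RING_SIZE];
    uint32_t head; /* next slot the producer (device) fills */
    uint32_t tail; /* next slot the consumer (hypervisor) reads */
};

/* Producer side: record that page `pfn` was written. Returns false
 * when the ring is full; the producer would then have to stall or fall
 * back to another mechanism until entries are harvested. */
static inline bool dirty_ring_push(struct dirty_ring *r, uint64_t pfn)
{
    if (r->head - r->tail == DIRTY_RING_SIZE)
        return false;
    r->pfn[r->head % DIRTY_RING_SIZE] = pfn;
    r->head++;
    return true;
}

/* Consumer side: harvest one dirty page number, if any is pending. */
static inline bool dirty_ring_pop(struct dirty_ring *r, uint64_t *pfn)
{
    if (r->head == r->tail)
        return false;
    *pfn = r->pfn[r->tail % DIRTY_RING_SIZE];
    r->tail++;
    return true;
}
```

Compared with a bitmap, the device only issues plain sequential writes, at the cost of possible duplicate entries for a page written more than once and of back-pressure when the ring fills.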
* Re: [virtio-dev] Dirty Page Tracking (DPT) 2020-04-10 2:40 ` Jason Wang @ 2020-04-13 12:15 ` Eugenio Perez Martin 2020-04-13 13:30 ` Rob Miller 2020-04-13 13:55 ` Jason Wang 0 siblings, 2 replies; 24+ messages in thread From: Eugenio Perez Martin @ 2020-04-13 12:15 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, Rob Miller, Virtio-Dev, Paolo Bonzini, Juan Quintela

On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
> > Two ways to avoid underruns:
> > - dirty page tracking
> > - page faults
>
> It looks to me this will lead even worse performance than software path?
> There will be lots of page faults during RX.
>
> Another direction is to track dirty pages via IOMMU. E.g recent Intel
> IOMMU has EA and D bit which could be used for tracking pages wrote by
> devices but not CPU.
>

So this could be added on top of the dirty tracking mechanism, couldn't it? Or would it be easier to go the other way around: start with a modern IOMMU and then extend to the older generic code?

> > I'm working on a proposal for page faults now.
>
> I guess it's better to have a transport independent method.
>
> > If someone wants
> > to work on dirty tracking in addition, that's also an option.
>
> I remember Rob mention some challenges of implementing dirty bitmap, I
> wonder something like queue based interface would be better (similar to
> Peter did for KVM)?
>

I think that the main challenge was having to write individual bits, rather than plain writes, over the PCI bus.

From a conversation with Juan, another solution could be a byte map DPT, where a byte, not a bit, represents a page. While I find this solution simpler, I'm not sure about the performance implications of tracking this way (memory/TLB pressure, etc). Rob and Juan, please correct me if I'm wrong or missed something.

Compared to the used ring interposition, much more logic needs to be in the device driver for both alternatives (byte mapping and queue), doesn't it?

> Thanks
>

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [virtio-dev] Dirty Page Tracking (DPT)
From: Rob Miller @ 2020-04-13 13:30 UTC
To: Virtio-Dev

On Mon, Apr 13, 2020 at 8:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:

> On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2020/4/10 5:06 AM, Michael S. Tsirkin wrote:
> > > On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> > >> Hi!
> > >>
> > >> So, from the previous mails, it seems that monitoring the used ring
> > >> (and the packed descriptors) is a good first step in that direction,
> > >> as DPDK did. This way, the device does not need to worry about the
> > >> dirty page tracking using a bitmap and the PCI writes limitation, and
> > >> we can evaluate later the proposed alternatives:
> > >> * Alternate used descriptors in packed.
> > >> * vDPA interface for vDPA devices in a convenient format.
> > >>
> > >> Any thoughts? Do you think that we should start with another way?
> > >>
> > >> Thanks!
> > > I am concerned that with software in data path, we'll hit RX queue
> > > underruns, won't we?
> >
> >
> > Do you mean it will lose some performance? If yes, I think so.
> >
> >
> > > Two ways to avoid underruns:
> > > - dirty page tracking
> > > - page faults
> >
> >
> > It looks to me this will lead even worse performance than software path?
> > There will be lots of page faults during RX.
> >
> > Another direction is to track dirty pages via IOMMU. E.g recent Intel
> > IOMMU has EA and D bit which could be used for tracking pages wrote by
> > devices but not CPU.
> >
>
> So this could be added on top of the dirty tracking mechanism, isn't?
> or would it be easier to start another-way around, and to start using
> modern IOMMU and then extend to old generic code?
>
> >
> > > I'm working on a proposal for page faults now.
> >
> >
> > I guess it's better to have a transport independent method.
> >
> >
> > > If someone wants
> > > to work on dirty tracking in addition, that's also an option.
> >
> >
> > I remember Rob mention some challenges of implementing dirty bitmap, I
> > wonder something like queue based interface would be better (similar to
> > Peter did for KVM)?
> >
>
> I think that the main challenge was to write bits instead of writes
> using PCI bus.
>
> From a conversation with Juan, another solution could be to do a byte
> map DPT, where a byte represents a page, not a bit. While I find this
> solution simpler, I'm not sure about the performance implications of
> track this way (memory/TLB pressure, etc). Rob and Juan, please
> correct me if I'm wrong or missed something.
>

RJM> You captured it correctly. We (Broadcom) discussed the idea of using a byte (set to 0xFF), which effectively indicates that 64KB were dirty. This could cause pages that did not need refreshing to be sent to the remote, slowing down the migration. I like the idea of mapping a byte to a page and teaching QEMU (or vDPA, or ...) how to interpret it.

>
> Compared to the used ring interposition, much more logic needs to be
> in the device driver for both alternatives (byte mapping and queue),
> isn't it?
>

RJM> Yes, but this would (should) lead to the best performance.

>
> > Thanks
> >
>
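The over-reporting cost of the whole-byte write can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: a one-bit-per-page bitmap numbered from the least significant bit, and hypothetical helpers `mark_bit`, `mark_whole_byte` and `dirty_pages`. It contrasts exact bit marking, which needs a read-modify-write, with blindly writing 0xFF to the containing byte, which marks all eight pages covered by that byte.

```python
def mark_bit(bitmap, page):
    """Exact tracking: read-modify-write of the byte holding the page's bit."""
    bitmap[page // 8] |= 1 << (page % 8)


def mark_whole_byte(bitmap, page):
    """Coarse tracking: one blind write of 0xFF, no read required."""
    bitmap[page // 8] = 0xFF


def dirty_pages(bitmap):
    """List the page numbers whose bit is set."""
    return [i * 8 + bit
            for i, byte in enumerate(bitmap)
            for bit in range(8)
            if (byte >> bit) & 1]


touched = [3, 20]  # pages the device actually wrote
exact = bytearray(4)
coarse = bytearray(4)
for page in touched:
    mark_bit(exact, page)
    mark_whole_byte(coarse, page)

# Pages that would be re-sent although the device never wrote them.
extra = len(dirty_pages(coarse)) - len(dirty_pages(exact))
print(dirty_pages(exact), len(dirty_pages(coarse)), extra)
```

In this example two written pages turn into sixteen reported pages, so fourteen pages would be copied to the destination unnecessarily; the amount of memory that represents depends on the page size.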
* Re: [virtio-dev] Dirty Page Tracking (DPT)
From: Jason Wang @ 2020-04-13 13:49 UTC
To: Rob Miller, Virtio-Dev

On 2020/4/13 9:30 PM, Rob Miller wrote:
>
>
> On Mon, Apr 13, 2020 at 8:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> > On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > On 2020/4/10 5:06 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> > > >> Hi!
> > > >>
> > > >> So, from the previous mails, it seems that monitoring the used ring
> > > >> (and the packed descriptors) is a good first step in that direction,
> > > >> as DPDK did. This way, the device does not need to worry about the
> > > >> dirty page tracking using a bitmap and the PCI writes limitation, and
> > > >> we can evaluate later the proposed alternatives:
> > > >> * Alternate used descriptors in packed.
> > > >> * vDPA interface for vDPA devices in a convenient format.
> > > >>
> > > >> Any thoughts? Do you think that we should start with another way?
> > > >>
> > > >> Thanks!
> > > > I am concerned that with software in data path, we'll hit RX queue
> > > > underruns, won't we?
> > >
> > >
> > > Do you mean it will lose some performance? If yes, I think so.
> > >
> > >
> > > > Two ways to avoid underruns:
> > > > - dirty page tracking
> > > > - page faults
> > >
> > >
> > > It looks to me this will lead even worse performance than software path?
> > > There will be lots of page faults during RX.
> > >
> > > Another direction is to track dirty pages via IOMMU. E.g recent Intel
> > > IOMMU has EA and D bit which could be used for tracking pages wrote by
> > > devices but not CPU.
> > >
> >
> > So this could be added on top of the dirty tracking mechanism, isn't?
> > or would it be easier to start another-way around, and to start using
> > modern IOMMU and then extend to old generic code?
> >
> > >
> > > > I'm working on a proposal for page faults now.
> > >
> > >
> > > I guess it's better to have a transport independent method.
> > >
> > >
> > > > If someone wants
> > > > to work on dirty tracking in addition, that's also an option.
> > >
> > >
> > > I remember Rob mention some challenges of implementing dirty bitmap, I
> > > wonder something like queue based interface would be better (similar to
> > > Peter did for KVM)?
> > >
> >
> > I think that the main challenge was to write bits instead of writes
> > using PCI bus.
> >
> > From a conversation with Juan, another solution could be to do a byte
> > map DPT, where a byte represents a page, not a bit. While I find this
> > solution simpler, I'm not sure about the performance implications of
> > track this way (memory/TLB pressure, etc). Rob and Juan, please
> > correct me if I'm wrong or missed something.
> >

It would lead to a large working set, but the actual impact may require a benchmark.

> RJM>] You captured it correctly. We (Broadcom) discussed the idea of
> using a byte (being 0xFF) which effectively indicate that 64KB were dirty,
> this would cause potentially refreshing pages to the remote that weren't
> needed to be refreshed, slowing down the migration. I like the idea of
> mapping a byte to a page, and teach QEMU (or vDPA, or ...) how to interpret.

That's a way to go. I think we can try to propose an API and move the discussion upstream.

> >
> > Compared to the used ring interposition, much more logic needs to be
> > in the device driver for both alternatives (byte mapping and queue),
> > isn't it?
> >
> RJM> Yes, but this would (should) lead to best performance.

I agree. I think we can start from software support and discuss the hardware support in parallel.

Thanks

> >
> > > Thanks
> > >
> >
>
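As a rough illustration of what "starting from software support" could mean, the following Python sketch derives dirty pages from used buffers, in the spirit of the used ring interposition mentioned earlier in the thread. It is a sketch under assumptions, not the DPDK or QEMU implementation: pages are assumed to be 4 KiB, and the software relay is assumed to have already resolved each used element into a hypothetical `(guest_addr, written_len)` pair for a device-writable buffer.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_for_buffer(addr, written_len):
    """Page frame numbers touched by a write of written_len bytes at addr."""
    if written_len == 0:
        return range(0)
    first = addr // PAGE_SIZE
    last = (addr + written_len - 1) // PAGE_SIZE
    return range(first, last + 1)


def log_used_buffers(used, dirty):
    """Add every page covered by the used buffers to the dirty set."""
    for addr, written_len in used:
        dirty.update(pages_for_buffer(addr, written_len))
    return dirty


# Three used buffers: one inside a page, one crossing a page boundary,
# and one the device did not write to at all.
used = [(0x1000, 1500), (0x2F00, 512), (0x8000, 0)]
dirty = log_used_buffers(used, set())
print(sorted(dirty))  # [1, 2, 3]
```

The device itself does no tracking here; the cost is that software sits in the data path, which is the source of the RX underrun concern raised in the thread.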
* Re: [virtio-dev] Dirty Page Tracking (DPT)
From: Jason Wang @ 2020-04-13 13:55 UTC
To: Eugenio Perez Martin
Cc: Michael S. Tsirkin, Rob Miller, Virtio-Dev, Paolo Bonzini, Juan Quintela

On 2020/4/13 8:15 PM, Eugenio Perez Martin wrote:
> On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2020/4/10 5:06 AM, Michael S. Tsirkin wrote:
>>> On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
>>>> Hi!
>>>>
>>>> So, from the previous mails, it seems that monitoring the used ring
>>>> (and the packed descriptors) is a good first step in that direction,
>>>> as DPDK did. This way, the device does not need to worry about the
>>>> dirty page tracking using a bitmap and the PCI writes limitation, and
>>>> we can evaluate later the proposed alternatives:
>>>> * Alternate used descriptors in packed.
>>>> * vDPA interface for vDPA devices in a convenient format.
>>>>
>>>> Any thoughts? Do you think that we should start with another way?
>>>>
>>>> Thanks!
>>> I am concerned that with software in data path, we'll hit RX queue
>>> underruns, won't we?
>>
>> Do you mean it will lose some performance? If yes, I think so.
>>
>>
>>> Two ways to avoid underruns:
>>> - dirty page tracking
>>> - page faults
>>
>> It looks to me this will lead even worse performance than software path?
>> There will be lots of page faults during RX.
>>
>> Another direction is to track dirty pages via IOMMU. E.g recent Intel
>> IOMMU has EA and D bit which could be used for tracking pages wrote by
>> devices but not CPU.
>>
> So this could be added on top of the dirty tracking mechanism, isn't?

Yes, it requires support from the IOMMU API and driver.

> or would it be easier to start another-way around, and to start using
> modern IOMMU and then extend to old generic code?

If my understanding is correct, even for a modern IOMMU it still requires a lot of work. So we need to start from software support and hardware support first.

>
>>> I'm working on a proposal for page faults now.
>>
>> I guess it's better to have a transport independent method.
>>
>>
>>> If someone wants
>>> to work on dirty tracking in addition, that's also an option.
>>
>> I remember Rob mention some challenges of implementing dirty bitmap, I
>> wonder something like queue based interface would be better (similar to
>> Peter did for KVM)?
>>
> I think that the main challenge was to write bits instead of writes
> using PCI bus.

Yes, the ring interface will work on descriptors, which would be several bytes instead of bits.

Thanks

>
> From a conversation with Juan, another solution could be to do a byte
> map DPT, where a byte represents a page, not a bit. While I find this
> solution simpler, I'm not sure about the performance implications of
> track this way (memory/TLB pressure, etc). Rob and Juan, please
> correct me if I'm wrong or missed something.
>
> Compared to the used ring interposition, much more logic needs to be
> in the device driver for both alternatives (byte mapping and queue),
> isn't it?
>
>> Thanks
>>
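The queue based interface mentioned above can be pictured with a short Python sketch. It is a generic single-producer, single-consumer ring written for illustration only; it does not reproduce the layout of the KVM dirty ring or any virtio structure, and `DirtyRing`, `push` and `harvest` are hypothetical names. The point it shows is that the producer logs a dirty page by writing a whole multi-byte entry (a page frame number), so no bit-level read-modify-write is needed.

```python
class DirtyRing:
    """Fixed-size ring of dirty page frame numbers (illustrative only)."""

    def __init__(self, size):
        self.entries = [0] * size
        self.size = size
        self.head = 0  # next slot the consumer will read
        self.tail = 0  # next slot the producer will write

    def push(self, pfn):
        """Producer side: log one dirty page with a plain entry write."""
        if self.tail - self.head == self.size:
            return False  # ring full: producer must wait for a harvest
        self.entries[self.tail % self.size] = pfn
        self.tail += 1
        return True

    def harvest(self):
        """Consumer side: collect and release all pending entries."""
        pending = []
        while self.head != self.tail:
            pending.append(self.entries[self.head % self.size])
            self.head += 1
        return pending


ring = DirtyRing(2)
results = [ring.push(7), ring.push(9), ring.push(11)]
first_batch = ring.harvest()
retry = ring.push(11)
second_batch = ring.harvest()
print(results, first_batch, retry, second_batch)
```

The trade-off in such a design is the full-ring case: when the consumer does not harvest fast enough, the producer has to stall or fall back to some other mechanism, which a bitmap or byte map never needs to do.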
* Re: [virtio-dev] Dirty Page Tracking (DPT)
From: Eugenio Perez Martin @ 2020-04-16 10:55 UTC
To: Virtio-Dev
Cc: Michael S. Tsirkin, Rob Miller, Paolo Bonzini, Juan Quintela, Jason Wang

Hi everyone.

As proposed in the previous virtio-networking meeting, I have summarized in this document the different proposed requirements/architectures for vDPA live migration, along with a starting draft of the proposed actions. Feedback is welcome.

Thanks!

https://docs.google.com/document/d/1-2kxRxce2CwttfsZMMoqHjyI_c-o_ZjdaHN9JHxja4M/edit?usp=sharing

On Mon, Apr 13, 2020 at 3:55 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/4/13 8:15 PM, Eugenio Perez Martin wrote:
> > On Fri, Apr 10, 2020 at 4:40 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2020/4/10 5:06 AM, Michael S. Tsirkin wrote:
> >>> On Tue, Apr 07, 2020 at 11:52:46AM +0200, Eugenio Perez Martin wrote:
> >>>> Hi!
> >>>>
> >>>> So, from the previous mails, it seems that monitoring the used ring
> >>>> (and the packed descriptors) is a good first step in that direction,
> >>>> as DPDK did. This way, the device does not need to worry about the
> >>>> dirty page tracking using a bitmap and the PCI writes limitation, and
> >>>> we can evaluate later the proposed alternatives:
> >>>> * Alternate used descriptors in packed.
> >>>> * vDPA interface for vDPA devices in a convenient format.
> >>>>
> >>>> Any thoughts? Do you think that we should start with another way?
> >>>>
> >>>> Thanks!
> >>> I am concerned that with software in data path, we'll hit RX queue
> >>> underruns, won't we?
> >>
> >> Do you mean it will lose some performance? If yes, I think so.
> >>
> >>
> >>> Two ways to avoid underruns:
> >>> - dirty page tracking
> >>> - page faults
> >>
> >> It looks to me this will lead even worse performance than software path?
> >> There will be lots of page faults during RX.
> >>
> >> Another direction is to track dirty pages via IOMMU. E.g recent Intel
> >> IOMMU has EA and D bit which could be used for tracking pages wrote by
> >> devices but not CPU.
> >>
> > So this could be added on top of the dirty tracking mechanism, isn't?
>
>
> Yes, it requires the support from IOMMU API and driver.
>
>
> > or would it be easier to start another-way around, and to start using
> > modern IOMMU and then extend to old generic code?
>
>
> If my understanding is correct, even for modern IOMMU it still require a
> lot of work. So we need to start from software support and hardware
> support first.
>
>
> >
> >>> I'm working on a proposal for page faults now.
> >>
> >> I guess it's better to have a transport independent method.
> >>
> >>
> >>> If someone wants
> >>> to work on dirty tracking in addition, that's also an option.
> >>
> >> I remember Rob mention some challenges of implementing dirty bitmap, I
> >> wonder something like queue based interface would be better (similar to
> >> Peter did for KVM)?
> >>
> > I think that the main challenge was to write bits instead of writes
> > using PCI bus.
>
>
> Yes, the ring interface will work one the descriptor which would be
> several bytes instead of bits.
>
> Thanks
>
>
> >
> > From a conversation with Juan, another solution could be to do a byte
> > map DPT, where a byte represents a page, not a bit. While I find this
> > solution simpler, I'm not sure about the performance implications of
> > track this way (memory/TLB pressure, etc). Rob and Juan, please
> > correct me if I'm wrong or missed something.
> >
> > Compared to the used ring interposition, much more logic needs to be
> > in the device driver for both alternatives (byte mapping and queue),
> > isn't it?
> >
> >> Thanks
> >>
>
Thread overview: 24+ messages
2020-03-06 15:40 [virtio-dev] Dirty Page Tracking (DPT) Rob Miller
2020-03-09  7:38 ` Michael S. Tsirkin
2020-03-09  8:50 ` Jason Wang
2020-03-09 10:13 ` Michael S. Tsirkin
2020-03-10  3:22 ` Jason Wang
2020-03-10  6:24 ` Michael S. Tsirkin
2020-03-10  6:39 ` Jason Wang
2020-03-18 15:13 ` Rob Miller
2020-03-19  3:35 ` Jason Wang
2020-03-19 11:17 ` Paolo Bonzini
2020-04-07  9:52 ` Eugenio Perez Martin
2020-04-07 10:27 ` Rob Miller
2020-04-07 16:31 ` Eugenio Perez Martin
2020-04-08 10:10 ` Jason Wang
2020-04-07 10:40 ` Rob Miller
2020-04-08 10:00 ` Jason Wang
2020-04-09 21:06 ` Michael S. Tsirkin
2020-04-10  2:40 ` Jason Wang
2020-04-13 12:15 ` Eugenio Perez Martin
2020-04-13 13:30 ` Rob Miller
2020-04-13 13:49 ` Jason Wang
2020-04-13 13:49 ` Jason Wang
2020-04-13 13:55 ` Jason Wang
2020-04-16 10:55 ` Eugenio Perez Martin