From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: kvm@vger.kernel.org, catalin.marinas@arm.com,
andrew.jones@linux.dev, dmatlack@google.com, will@kernel.org,
shan.gavin@gmail.com, bgardon@google.com, kvmarm@lists.linux.dev,
pbonzini@redhat.com, zhenyzha@redhat.com, shuah@kernel.org,
kvmarm@lists.cs.columbia.edu, ajones@ventanamicro.com
Subject: Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
Date: Mon, 07 Nov 2022 09:45:53 +0000 [thread overview]
Message-ID: <864jvbqer2.wl-maz@kernel.org> (raw)
In-Reply-To: <0f685682-ba39-53d4-766c-cc2b44ad48dc@redhat.com>
On Sun, 06 Nov 2022 21:40:49 +0000,
Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Marc,
>
> On 11/6/22 11:43 PM, Marc Zyngier wrote:
> > On Fri, 04 Nov 2022 23:40:45 +0000,
> > Gavin Shan <gshan@redhat.com> wrote:
> >>
> >> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> >> enabled. It's conflicting with that ring-based dirty page tracking always
> >> requires a running VCPU context.
> >>
> >> Introduce a new flavor of dirty ring that requires the use of both VCPU
> >> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> >> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> >> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> >> the VM to the target.
> >>
> >> Use an additional capability to advertise this behavior. The newly added
> >> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> >> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> >> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> >>
> >> Suggested-by: Marc Zyngier <maz@kernel.org>
> >> Suggested-by: Peter Xu <peterx@redhat.com>
> >> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> >> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> Acked-by: Peter Xu <peterx@redhat.com>
> >> ---
> >> Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
> >> include/linux/kvm_dirty_ring.h | 7 +++++
> >> include/linux/kvm_host.h | 1 +
> >> include/uapi/linux/kvm.h | 1 +
> >> virt/kvm/Kconfig | 8 ++++++
> >> virt/kvm/dirty_ring.c | 10 +++++++
> >> virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++-------
> >> 7 files changed, 93 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> index eee9f857a986..2ec32bd41792 100644
> >> --- a/Documentation/virt/kvm/api.rst
> >> +++ b/Documentation/virt/kvm/api.rst
> >> @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
> >> needs to kick the vcpu out of KVM_RUN using a signal. The resulting
> >> vmexit ensures that all dirty GFNs are flushed to the dirty rings.
> >> -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the
> >> corresponding
> >> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> >> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
> >> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> >> -machine will switch to ring-buffer dirty page tracking and further
> >> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> >> -
> >> NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
> >> should be exposed by weakly ordered architecture, in order to indicate
> >> the additional memory ordering requirements imposed on userspace when
> >> @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
> >> expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >> to userspace.
> >> +After using the dirty rings, the userspace needs to detect the
> >> capability
> >
> > using? or enabling? What comes after suggest the latter.
> >
>
> s/using/enabling in next revision :)
>
> >> +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> >> +need to be backed by per-slot bitmaps. With this capability advertised
> >> +and supported, it means the architecture can dirty guest pages without
> >
> > If it is advertised, it is supported, right?
> >
>
> Yes, s/advertised and supported/advertised in next revision.
>
> >> +vcpu/ring context, so that some of the dirty information will still be
> >> +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >> +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >> +has been enabled.
> >> +
> >> +Note that the bitmap here is only a backup of the ring structure, and
> >> +normally should only contain a very small amount of dirty pages, which
> >
> > I don't think we can claim this. It is whatever amount of memory is
> > dirtied outside of a vcpu context, and we shouldn't make any claim
> > regarding the number of dirty pages.
> >
>
> It's the pre-requisite to use the backup bitmap. Otherwise, the guest
> will experience long down-time during migration, as mentioned by Peter
> in another thread. So it's appropriate to mention the limit of dirty
> pages here.
See my alternative wording for this in the other sub-thread.
>
> >> +needs to be transferred during VM downtime. Collecting the dirty bitmap
> >> +should be the very last thing that the VMM does before transmitting state
> >> +to the target VM. VMM needs to ensure that the dirty state is final and
> >> +avoid missing dirty pages from another ioctl ordered after the bitmap
> >> +collection.
> >> +
> >> +To collect dirty bits in the backup bitmap, the userspace can use the
> >> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> >> +and its behavior is undefined since collecting the dirty bitmap always
> >> +happens in the last phase of VM's migration.
> >
> > It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> > you have multiple devices that dirty the memory, such as multiple
> > ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> > multiple time? This doesn't seem like a reasonable restriction, and I
> > really dislike the idea of undefined behaviour here.
> >
>
> It was actually documenting the expected QEMU's usage. With QEMU
> excluded, KVM_CLEAR_DIRTY_LOG can be used as usual. Undefined behavior
> seems not precise here. We can improve it like below, to avoid talking
> about 'undefined behaviour'.
>
> To collect dirty bits in the backup bitmap, the userspace can use the
> same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> since collecting the dirty bitmap always happens in the last phase of
> VM's migration.
That's better, but the "shouldn't be needed" wording makes things
ambiguous, and we shouldn't mention migration at all (this is not the
only purpose of this API). I'd suggest this:
To collect dirty bits in the backup bitmap, userspace can use the
same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as
long as all the generation of the dirty bits is done in a single
pass.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu,
kvm@vger.kernel.org, shuah@kernel.org, catalin.marinas@arm.com,
andrew.jones@linux.dev, ajones@ventanamicro.com,
bgardon@google.com, dmatlack@google.com, will@kernel.org,
suzuki.poulose@arm.com, alexandru.elisei@arm.com,
pbonzini@redhat.com, peterx@redhat.com, seanjc@google.com,
oliver.upton@linux.dev, zhenyzha@redhat.com,
shan.gavin@gmail.com
Subject: Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
Date: Mon, 07 Nov 2022 09:45:53 +0000 [thread overview]
Message-ID: <864jvbqer2.wl-maz@kernel.org> (raw)
Message-ID: <20221107094553.neWz3xbnlwoT8crZliOlaZ_Ib1VPcGBTOds_qQSd4HM@z> (raw)
In-Reply-To: <0f685682-ba39-53d4-766c-cc2b44ad48dc@redhat.com>
On Sun, 06 Nov 2022 21:40:49 +0000,
Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Marc,
>
> On 11/6/22 11:43 PM, Marc Zyngier wrote:
> > On Fri, 04 Nov 2022 23:40:45 +0000,
> > Gavin Shan <gshan@redhat.com> wrote:
> >>
> >> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> >> enabled. It's conflicting with that ring-based dirty page tracking always
> >> requires a running VCPU context.
> >>
> >> Introduce a new flavor of dirty ring that requires the use of both VCPU
> >> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> >> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> >> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> >> the VM to the target.
> >>
> >> Use an additional capability to advertise this behavior. The newly added
> >> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> >> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> >> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> >>
> >> Suggested-by: Marc Zyngier <maz@kernel.org>
> >> Suggested-by: Peter Xu <peterx@redhat.com>
> >> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> >> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> Acked-by: Peter Xu <peterx@redhat.com>
> >> ---
> >> Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
> >> include/linux/kvm_dirty_ring.h | 7 +++++
> >> include/linux/kvm_host.h | 1 +
> >> include/uapi/linux/kvm.h | 1 +
> >> virt/kvm/Kconfig | 8 ++++++
> >> virt/kvm/dirty_ring.c | 10 +++++++
> >> virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++-------
> >> 7 files changed, 93 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> index eee9f857a986..2ec32bd41792 100644
> >> --- a/Documentation/virt/kvm/api.rst
> >> +++ b/Documentation/virt/kvm/api.rst
> >> @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
> >> needs to kick the vcpu out of KVM_RUN using a signal. The resulting
> >> vmexit ensures that all dirty GFNs are flushed to the dirty rings.
> >> -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the
> >> corresponding
> >> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> >> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
> >> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> >> -machine will switch to ring-buffer dirty page tracking and further
> >> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> >> -
> >> NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
> >> should be exposed by weakly ordered architecture, in order to indicate
> >> the additional memory ordering requirements imposed on userspace when
> >> @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
> >> expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >> to userspace.
> >> +After using the dirty rings, the userspace needs to detect the
> >> capability
> >
> > using? or enabling? What comes after suggest the latter.
> >
>
> s/using/enabling in next revision :)
>
> >> +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> >> +need to be backed by per-slot bitmaps. With this capability advertised
> >> +and supported, it means the architecture can dirty guest pages without
> >
> > If it is advertised, it is supported, right?
> >
>
> Yes, s/advertised and supported/advertised in next revision.
>
> >> +vcpu/ring context, so that some of the dirty information will still be
> >> +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >> +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >> +has been enabled.
> >> +
> >> +Note that the bitmap here is only a backup of the ring structure, and
> >> +normally should only contain a very small amount of dirty pages, which
> >
> > I don't think we can claim this. It is whatever amount of memory is
> > dirtied outside of a vcpu context, and we shouldn't make any claim
> > regarding the number of dirty pages.
> >
>
> It's the pre-requisite to use the backup bitmap. Otherwise, the guest
> will experience long down-time during migration, as mentioned by Peter
> in another thread. So it's appropriate to mention the limit of dirty
> pages here.
See my alternative wording for this in the other sub-thread.
>
> >> +needs to be transferred during VM downtime. Collecting the dirty bitmap
> >> +should be the very last thing that the VMM does before transmitting state
> >> +to the target VM. VMM needs to ensure that the dirty state is final and
> >> +avoid missing dirty pages from another ioctl ordered after the bitmap
> >> +collection.
> >> +
> >> +To collect dirty bits in the backup bitmap, the userspace can use the
> >> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> >> +and its behavior is undefined since collecting the dirty bitmap always
> >> +happens in the last phase of VM's migration.
> >
> > It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> > you have multiple devices that dirty the memory, such as multiple
> > ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> > multiple time? This doesn't seem like a reasonable restriction, and I
> > really dislike the idea of undefined behaviour here.
> >
>
> It was actually documenting the expected QEMU's usage. With QEMU
> excluded, KVM_CLEAR_DIRTY_LOG can be used as usual. Undefined behavior
> seems not precise here. We can improve it like below, to avoid talking
> about 'undefined behaviour'.
>
> To collect dirty bits in the backup bitmap, the userspace can use the
> same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> since collecting the dirty bitmap always happens in the last phase of
> VM's migration.
That's better, but the "shouldn't be needed" wording makes things
ambiguous, and we shouldn't mention migration at all (this is not the
only purpose of this API). I'd suggest this:
To collect dirty bits in the backup bitmap, userspace can use the
same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as
long as all the generation of the dirty bits is done in a single
pass.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2022-11-07 9:46 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-04 23:40 [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 1/7] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 2/7] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-06 15:43 ` Marc Zyngier
2022-11-06 15:43 ` Marc Zyngier
2022-11-06 16:22 ` Peter Xu
2022-11-06 16:22 ` Peter Xu
2022-11-06 20:12 ` Marc Zyngier
2022-11-06 20:12 ` Marc Zyngier
2022-11-06 21:06 ` Peter Xu
2022-11-06 21:06 ` Peter Xu
2022-11-06 21:23 ` Gavin Shan
2022-11-06 21:23 ` Gavin Shan
2022-11-07 9:38 ` Marc Zyngier
2022-11-07 9:38 ` Marc Zyngier
2022-11-07 14:29 ` Peter Xu
2022-11-07 14:29 ` Peter Xu
2022-11-07 9:21 ` Marc Zyngier
2022-11-07 9:21 ` Marc Zyngier
2022-11-07 14:59 ` Peter Xu
2022-11-07 14:59 ` Peter Xu
2022-11-07 15:30 ` Marc Zyngier
2022-11-07 15:30 ` Marc Zyngier
2022-11-06 21:40 ` Gavin Shan
2022-11-06 21:40 ` Gavin Shan
2022-11-07 9:45 ` Marc Zyngier [this message]
2022-11-07 9:45 ` Marc Zyngier
2022-11-07 10:45 ` Gavin Shan
2022-11-07 10:45 ` Gavin Shan
2022-11-07 11:33 ` Marc Zyngier
2022-11-07 11:33 ` Marc Zyngier
2022-11-07 23:53 ` Gavin Shan
2022-11-07 23:53 ` Gavin Shan
2022-11-07 16:05 ` Sean Christopherson
2022-11-07 16:05 ` Sean Christopherson
2022-11-08 0:44 ` Gavin Shan
2022-11-08 0:44 ` Gavin Shan
2022-11-08 1:13 ` Oliver Upton
2022-11-08 1:13 ` Oliver Upton
2022-11-08 3:30 ` Gavin Shan
2022-11-08 3:30 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-06 15:50 ` Marc Zyngier
2022-11-06 15:50 ` Marc Zyngier
2022-11-06 21:46 ` Gavin Shan
2022-11-06 21:46 ` Gavin Shan
2022-11-07 9:47 ` Marc Zyngier
2022-11-07 9:47 ` Marc Zyngier
2022-11-07 10:47 ` Gavin Shan
2022-11-07 10:47 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 6/7] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 7/7] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-06 16:08 ` [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
2022-11-06 16:08 ` Marc Zyngier
2022-11-06 21:50 ` Gavin Shan
2022-11-06 21:50 ` Gavin Shan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=864jvbqer2.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=ajones@ventanamicro.com \
--cc=andrew.jones@linux.dev \
--cc=bgardon@google.com \
--cc=catalin.marinas@arm.com \
--cc=dmatlack@google.com \
--cc=gshan@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=kvmarm@lists.linux.dev \
--cc=pbonzini@redhat.com \
--cc=shan.gavin@gmail.com \
--cc=shuah@kernel.org \
--cc=will@kernel.org \
--cc=zhenyzha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.