From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: shuah@kernel.org, kvm@vger.kernel.org, catalin.marinas@arm.com,
andrew.jones@linux.dev, dmatlack@google.com,
shan.gavin@gmail.com, bgardon@google.com, kvmarm@lists.linux.dev,
pbonzini@redhat.com, zhenyzha@redhat.com, will@kernel.org,
kvmarm@lists.cs.columbia.edu, ajones@ventanamicro.com
Subject: Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
Date: Mon, 07 Nov 2022 11:33:17 +0000 [thread overview]
Message-ID: <861qqfq9s2.wl-maz@kernel.org> (raw)
In-Reply-To: <ec281dc5-baa2-3e18-8e83-089322db551a@redhat.com>
On Mon, 07 Nov 2022 10:45:34 +0000,
Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Marc, Peter, Oliver and Sean,
>
> On 11/5/22 7:40 AM, Gavin Shan wrote:
> > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > enabled. It's conflicting with that ring-based dirty page tracking always
> > requires a running VCPU context.
> >
> > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > the VM to the target.
> >
> > Use an additional capability to advertise this behavior. The newly added
> > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> >
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > Acked-by: Peter Xu <peterx@redhat.com>
> > ---
> > Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
> > include/linux/kvm_dirty_ring.h | 7 +++++
> > include/linux/kvm_host.h | 1 +
> > include/uapi/linux/kvm.h | 1 +
> > virt/kvm/Kconfig | 8 ++++++
> > virt/kvm/dirty_ring.c | 10 +++++++
> > virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++-------
> > 7 files changed, 93 insertions(+), 16 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index eee9f857a986..2ec32bd41792 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
> > needs to kick the vcpu out of KVM_RUN using a signal. The resulting
> > vmexit ensures that all dirty GFNs are flushed to the dirty rings.
> > -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
> > -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> > -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
> > -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> > -machine will switch to ring-buffer dirty page tracking and further
> > -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> > -
> > NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
> > should be exposed by weakly ordered architecture, in order to indicate
> > the additional memory ordering requirements imposed on userspace when
> > @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
> > expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > to userspace.
> > +After using the dirty rings, the userspace needs to detect the
> > capability
> > +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> > +need to be backed by per-slot bitmaps. With this capability advertised
> > +and supported, it means the architecture can dirty guest pages without
> > +vcpu/ring context, so that some of the dirty information will still be
> > +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> > +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > +has been enabled.
> > +
> > +Note that the bitmap here is only a backup of the ring structure, and
> > +normally should only contain a very small amount of dirty pages, which
> > +needs to be transferred during VM downtime. Collecting the dirty bitmap
> > +should be the very last thing that the VMM does before transmitting state
> > +to the target VM. VMM needs to ensure that the dirty state is final and
> > +avoid missing dirty pages from another ioctl ordered after the bitmap
> > +collection.
> > +
> > +To collect dirty bits in the backup bitmap, the userspace can use the
> > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> > +and its behavior is undefined since collecting the dirty bitmap always
> > +happens in the last phase of VM's migration.
> > +
> > +NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> > +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> > +KVM device "kvm-arm-vgic-its" during VM's migration.
> > +
>
> In order to speed up the review and reduce unnecessary respins. After
> collecting comments on PATCH[v8 3/7] from Marc and Peter, I would change
> above description as below. Could you please confirm it looks good to you?
>
> In the 4th paragraph, the words starting from "Collecting the dirty bitmap..."
> to the end, was previously suggested by Oliver, even Marc suggested to avoid
> mentioning "migration".
>
> After enabling the dirty rings, the userspace needs to detect the
> capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring
> structures need to be backed by per-slot bitmaps. With this capability
s/need/can/. If there was a *need*, it should happen automatically
without user intervention.
> advertised, it means the architecture can dirty guest pages without
> vcpu/ring context, so that some of the dirty information will still be
> maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> hasn't been enabled, or any memslot has been existing.
>
> Note that the bitmap here is only a backup of the ring structure. The
> use of the ring and bitmap combination is only beneficial if there is
> only a very small amount of memory that is dirtied out of vcpu/ring
> context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
> be considered.
>
> To collect dirty bits in the backup bitmap, userspace can use the same
> KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
> the generation of the dirty bits is done in a single pass. Collecting
> the dirty bitmap should be the very last thing that the VMM does before
> transmitting state to the target VM. VMM needs to ensure that the dirty
> state is final and avoid missing dirty pages from another ioctl ordered
> after the bitmap collection.
I would replace "transmitting state to the target VM" with
"considering the state as complete", as I still object to casting this
API into the migration mold. People use this stuff more far more than
migration (checkpointing, for example).
>
> NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> KVM device "kvm-arm-vgic-its" during VM's migration.
Same remark about migration.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
catalin.marinas@arm.com, andrew.jones@linux.dev, will@kernel.org,
shan.gavin@gmail.com, bgardon@google.com, dmatlack@google.com,
pbonzini@redhat.com, zhenyzha@redhat.com, shuah@kernel.org,
kvmarm@lists.cs.columbia.edu, ajones@ventanamicro.com
Subject: Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
Date: Mon, 07 Nov 2022 11:33:17 +0000 [thread overview]
Message-ID: <861qqfq9s2.wl-maz@kernel.org> (raw)
Message-ID: <20221107113317.6QIbiDQzGzix19_avBdlvxgpkZuvFPz5_3fqnflz-9s@z> (raw)
In-Reply-To: <ec281dc5-baa2-3e18-8e83-089322db551a@redhat.com>
On Mon, 07 Nov 2022 10:45:34 +0000,
Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Marc, Peter, Oliver and Sean,
>
> On 11/5/22 7:40 AM, Gavin Shan wrote:
> > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > enabled. It's conflicting with that ring-based dirty page tracking always
> > requires a running VCPU context.
> >
> > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > the VM to the target.
> >
> > Use an additional capability to advertise this behavior. The newly added
> > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> >
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > Acked-by: Peter Xu <peterx@redhat.com>
> > ---
> > Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
> > include/linux/kvm_dirty_ring.h | 7 +++++
> > include/linux/kvm_host.h | 1 +
> > include/uapi/linux/kvm.h | 1 +
> > virt/kvm/Kconfig | 8 ++++++
> > virt/kvm/dirty_ring.c | 10 +++++++
> > virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++-------
> > 7 files changed, 93 insertions(+), 16 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index eee9f857a986..2ec32bd41792 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
> > needs to kick the vcpu out of KVM_RUN using a signal. The resulting
> > vmexit ensures that all dirty GFNs are flushed to the dirty rings.
> > -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
> > -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> > -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
> > -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> > -machine will switch to ring-buffer dirty page tracking and further
> > -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> > -
> > NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
> > should be exposed by weakly ordered architecture, in order to indicate
> > the additional memory ordering requirements imposed on userspace when
> > @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
> > expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > to userspace.
> > +After using the dirty rings, the userspace needs to detect the
> > capability
> > +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> > +need to be backed by per-slot bitmaps. With this capability advertised
> > +and supported, it means the architecture can dirty guest pages without
> > +vcpu/ring context, so that some of the dirty information will still be
> > +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> > +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > +has been enabled.
> > +
> > +Note that the bitmap here is only a backup of the ring structure, and
> > +normally should only contain a very small amount of dirty pages, which
> > +needs to be transferred during VM downtime. Collecting the dirty bitmap
> > +should be the very last thing that the VMM does before transmitting state
> > +to the target VM. VMM needs to ensure that the dirty state is final and
> > +avoid missing dirty pages from another ioctl ordered after the bitmap
> > +collection.
> > +
> > +To collect dirty bits in the backup bitmap, the userspace can use the
> > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> > +and its behavior is undefined since collecting the dirty bitmap always
> > +happens in the last phase of VM's migration.
> > +
> > +NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> > +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> > +KVM device "kvm-arm-vgic-its" during VM's migration.
> > +
>
> In order to speed up the review and reduce unnecessary respins. After
> collecting comments on PATCH[v8 3/7] from Marc and Peter, I would change
> above description as below. Could you please confirm it looks good to you?
>
> In the 4th paragraph, the words starting from "Collecting the dirty bitmap..."
> to the end, was previously suggested by Oliver, even Marc suggested to avoid
> mentioning "migration".
>
> After enabling the dirty rings, the userspace needs to detect the
> capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring
> structures need to be backed by per-slot bitmaps. With this capability
s/need/can/. If there was a *need*, it should happen automatically
without user intervention.
> advertised, it means the architecture can dirty guest pages without
> vcpu/ring context, so that some of the dirty information will still be
> maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> hasn't been enabled, or any memslot has been existing.
>
> Note that the bitmap here is only a backup of the ring structure. The
> use of the ring and bitmap combination is only beneficial if there is
> only a very small amount of memory that is dirtied out of vcpu/ring
> context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
> be considered.
>
> To collect dirty bits in the backup bitmap, userspace can use the same
> KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
> the generation of the dirty bits is done in a single pass. Collecting
> the dirty bitmap should be the very last thing that the VMM does before
> transmitting state to the target VM. VMM needs to ensure that the dirty
> state is final and avoid missing dirty pages from another ioctl ordered
> after the bitmap collection.
I would replace "transmitting state to the target VM" with
"considering the state as complete", as I still object to casting this
API into the migration mold. People use this stuff more far more than
migration (checkpointing, for example).
>
> NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> KVM device "kvm-arm-vgic-its" during VM's migration.
Same remark about migration.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2022-11-07 11:33 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-04 23:40 [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 1/7] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 2/7] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-06 15:43 ` Marc Zyngier
2022-11-06 15:43 ` Marc Zyngier
2022-11-06 16:22 ` Peter Xu
2022-11-06 16:22 ` Peter Xu
2022-11-06 20:12 ` Marc Zyngier
2022-11-06 20:12 ` Marc Zyngier
2022-11-06 21:06 ` Peter Xu
2022-11-06 21:06 ` Peter Xu
2022-11-06 21:23 ` Gavin Shan
2022-11-06 21:23 ` Gavin Shan
2022-11-07 9:38 ` Marc Zyngier
2022-11-07 9:38 ` Marc Zyngier
2022-11-07 14:29 ` Peter Xu
2022-11-07 14:29 ` Peter Xu
2022-11-07 9:21 ` Marc Zyngier
2022-11-07 9:21 ` Marc Zyngier
2022-11-07 14:59 ` Peter Xu
2022-11-07 14:59 ` Peter Xu
2022-11-07 15:30 ` Marc Zyngier
2022-11-07 15:30 ` Marc Zyngier
2022-11-06 21:40 ` Gavin Shan
2022-11-06 21:40 ` Gavin Shan
2022-11-07 9:45 ` Marc Zyngier
2022-11-07 9:45 ` Marc Zyngier
2022-11-07 10:45 ` Gavin Shan
2022-11-07 10:45 ` Gavin Shan
2022-11-07 11:33 ` Marc Zyngier [this message]
2022-11-07 11:33 ` Marc Zyngier
2022-11-07 23:53 ` Gavin Shan
2022-11-07 23:53 ` Gavin Shan
2022-11-07 16:05 ` Sean Christopherson
2022-11-07 16:05 ` Sean Christopherson
2022-11-08 0:44 ` Gavin Shan
2022-11-08 0:44 ` Gavin Shan
2022-11-08 1:13 ` Oliver Upton
2022-11-08 1:13 ` Oliver Upton
2022-11-08 3:30 ` Gavin Shan
2022-11-08 3:30 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-06 15:50 ` Marc Zyngier
2022-11-06 15:50 ` Marc Zyngier
2022-11-06 21:46 ` Gavin Shan
2022-11-06 21:46 ` Gavin Shan
2022-11-07 9:47 ` Marc Zyngier
2022-11-07 9:47 ` Marc Zyngier
2022-11-07 10:47 ` Gavin Shan
2022-11-07 10:47 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 6/7] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 7/7] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-11-04 23:40 ` Gavin Shan
2022-11-06 16:08 ` [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
2022-11-06 16:08 ` Marc Zyngier
2022-11-06 21:50 ` Gavin Shan
2022-11-06 21:50 ` Gavin Shan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=861qqfq9s2.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=ajones@ventanamicro.com \
--cc=andrew.jones@linux.dev \
--cc=bgardon@google.com \
--cc=catalin.marinas@arm.com \
--cc=dmatlack@google.com \
--cc=gshan@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=kvmarm@lists.linux.dev \
--cc=pbonzini@redhat.com \
--cc=shan.gavin@gmail.com \
--cc=shuah@kernel.org \
--cc=will@kernel.org \
--cc=zhenyzha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.