All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oliver Upton <oliver.upton@linux.dev>
To: Gavin Shan <gshan@redhat.com>
Cc: shuah@kernel.org, catalin.marinas@arm.com, kvm@vger.kernel.org,
	maz@kernel.org, andrew.jones@linux.dev, dmatlack@google.com,
	shan.gavin@gmail.com, bgardon@google.com, kvmarm@lists.linux.dev,
	pbonzini@redhat.com, zhenyzha@redhat.com, will@kernel.org,
	kvmarm@lists.cs.columbia.edu, ajones@ventanamicro.com
Subject: Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
Date: Fri, 4 Nov 2022 01:06:06 +0000	[thread overview]
Message-ID: <Y2RlfkyQMCtD6Rbh@google.com> (raw)
In-Reply-To: <d5b86a73-e030-7ce3-e5f3-301f4f505323@redhat.com>

On Fri, Nov 04, 2022 at 08:12:21AM +0800, Gavin Shan wrote:
> Hi Oliver,
> 
> On 11/4/22 7:32 AM, Oliver Upton wrote:
> > On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
> > > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > > enabled. It's conflicting with that ring-based dirty page tracking always
> > > requires a running VCPU context.
> > > 
> > > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > > the VM to the target.
> > > 
> > > Use an additional capability to advertise this behavior. The newly added
> > > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> > 
> > Whatever ordering requirements we settle on between these capabilities
> > needs to be documented as well.
> > 
> > [...]
> > 
> 
> It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
> 
>   After using the dirty rings, the userspace needs to detect the capability
>   of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>   need to be backed by per-slot bitmaps. With this capability advertised
>   and supported, it means the architecture can dirty guest pages without
>   vcpu/ring context, so that some of the dirty information will still be
>   maintained in the bitmap structure.
> 
> The description may be not obvious about the ordering. For this, I can
> add the following sentence at end of the section.
> 
>   The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
>   until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
> 
> > > @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >   			return -EINVAL;
> > >   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> > > +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> > > +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> > > +		    !kvm->dirty_ring_size)
> > 
> > I believe this ordering requirement is problematic, as it piles on top
> > of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
> > creation.
> > 
> > Example:
> >   - Enable KVM_CAP_DIRTY_LOG_RING
> >   - Create some memslots w/ dirty logging enabled (note that the bitmap
> >     is _not_ allocated)
> >   - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >   - Save ITS tables and get a NULL dereference in
> >     mark_page_dirty_in_slot():
> > 
> >                  if (vcpu && kvm->dirty_ring_size)
> >                          kvm_dirty_ring_push(&vcpu->dirty_ring,
> >                                              slot, rel_gfn);
> >                  else
> > ------->		set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > 
> > Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
> > enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
> > 
> > You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
> > DIRTY_LOG_RING has already been enabled, but the better approach would
> > be to explicitly check kvm_memslots_empty() such that the real
> > dependency is obvious. Peter, hadn't you mentioned something about
> > checking against memslots in an earlier revision?
> > 
> 
> The userspace (QEMU) needs to ensure that no dirty bitmap is created
> before the capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled.
> It's unknown by QEMU that vgic/its is used when KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> is enabled.

I'm not worried about what QEMU (or any particular VMM for that matter)
does with the UAPI. The problem is this patch provides a trivial way for
userspace to cause a NULL dereference in the kernel. Imposing ordering
between the cap and memslot creation avoids the problem altogether.

So, looking at your example:

>    kvm_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL        // Where KVM_CAP_DIRTY_LOG_RING is enabled
>    board_initialization                           // Where QEMU knows if vgic/its is used

Is it possible that QEMU could hoist enabling RING_WITH_BITMAP here?
Based on your description QEMU has decided to use the vGIC ITS but
hasn't yet added any memslots.

>      add_memory_slots
>    kvm_post_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>    :
>    start_migration
>      enable_dirty_page_tracking
>        create_dirty_bitmap                       // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled

Just to make sure we're on the same page, there's two issues:

 (1) If DIRTY_LOG_RING is enabled before memslot creation and
     RING_WITH_BITMAP is enabled after memslots have been created w/
     dirty logging enabled, memslot->dirty_bitmap == NULL and the
     kernel will fault when attempting to save the ITS tables.

 (2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
     enabled after memslots have been created w/ dirty logging enabled,
     memslot->dirty_bitmap != NULL and that memory is wasted until the
     memslot is freed.

I don't expect you to fix #2, though I've mentioned it because using the
same approach to #1 and #2 would be nice.

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Oliver Upton <oliver.upton@linux.dev>
To: Gavin Shan <gshan@redhat.com>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu, andrew.jones@linux.dev,
	ajones@ventanamicro.com, maz@kernel.org, bgardon@google.com,
	catalin.marinas@arm.com, dmatlack@google.com, will@kernel.org,
	pbonzini@redhat.com, peterx@redhat.com, seanjc@google.com,
	james.morse@arm.com, shuah@kernel.org, suzuki.poulose@arm.com,
	alexandru.elisei@arm.com, zhenyzha@redhat.com,
	shan.gavin@gmail.com
Subject: Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
Date: Fri, 4 Nov 2022 01:06:06 +0000	[thread overview]
Message-ID: <Y2RlfkyQMCtD6Rbh@google.com> (raw)
Message-ID: <20221104010606.L23mzb0YnYwwtyFlaepinjp20Rh8zk0_UGGijUv24Ac@z> (raw)
In-Reply-To: <d5b86a73-e030-7ce3-e5f3-301f4f505323@redhat.com>

On Fri, Nov 04, 2022 at 08:12:21AM +0800, Gavin Shan wrote:
> Hi Oliver,
> 
> On 11/4/22 7:32 AM, Oliver Upton wrote:
> > On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
> > > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > > enabled. It's conflicting with that ring-based dirty page tracking always
> > > requires a running VCPU context.
> > > 
> > > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > > the VM to the target.
> > > 
> > > Use an additional capability to advertise this behavior. The newly added
> > > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> > 
> > Whatever ordering requirements we settle on between these capabilities
> > needs to be documented as well.
> > 
> > [...]
> > 
> 
> It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
> 
>   After using the dirty rings, the userspace needs to detect the capability
>   of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>   need to be backed by per-slot bitmaps. With this capability advertised
>   and supported, it means the architecture can dirty guest pages without
>   vcpu/ring context, so that some of the dirty information will still be
>   maintained in the bitmap structure.
> 
> The description may be not obvious about the ordering. For this, I can
> add the following sentence at end of the section.
> 
>   The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
>   until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
> 
> > > @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >   			return -EINVAL;
> > >   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> > > +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> > > +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> > > +		    !kvm->dirty_ring_size)
> > 
> > I believe this ordering requirement is problematic, as it piles on top
> > of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
> > creation.
> > 
> > Example:
> >   - Enable KVM_CAP_DIRTY_LOG_RING
> >   - Create some memslots w/ dirty logging enabled (note that the bitmap
> >     is _not_ allocated)
> >   - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >   - Save ITS tables and get a NULL dereference in
> >     mark_page_dirty_in_slot():
> > 
> >                  if (vcpu && kvm->dirty_ring_size)
> >                          kvm_dirty_ring_push(&vcpu->dirty_ring,
> >                                              slot, rel_gfn);
> >                  else
> > ------->		set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > 
> > Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
> > enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
> > 
> > You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
> > DIRTY_LOG_RING has already been enabled, but the better approach would
> > be to explicitly check kvm_memslots_empty() such that the real
> > dependency is obvious. Peter, hadn't you mentioned something about
> > checking against memslots in an earlier revision?
> > 
> 
> The userspace (QEMU) needs to ensure that no dirty bitmap is created
> before the capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled.
> It's unknown by QEMU that vgic/its is used when KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> is enabled.

I'm not worried about what QEMU (or any particular VMM for that matter)
does with the UAPI. The problem is this patch provides a trivial way for
userspace to cause a NULL dereference in the kernel. Imposing ordering
between the cap and memslot creation avoids the problem altogether.

So, looking at your example:

>    kvm_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL        // Where KVM_CAP_DIRTY_LOG_RING is enabled
>    board_initialization                           // Where QEMU knows if vgic/its is used

Is it possible that QEMU could hoist enabling RING_WITH_BITMAP here?
Based on your description QEMU has decided to use the vGIC ITS but
hasn't yet added any memslots.

>      add_memory_slots
>    kvm_post_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>    :
>    start_migration
>      enable_dirty_page_tracking
>        create_dirty_bitmap                       // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled

Just to make sure we're on the same page, there's two issues:

 (1) If DIRTY_LOG_RING is enabled before memslot creation and
     RING_WITH_BITMAP is enabled after memslots have been created w/
     dirty logging enabled, memslot->dirty_bitmap == NULL and the
     kernel will fault when attempting to save the ITS tables.

 (2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
     enabled after memslots have been created w/ dirty logging enabled,
     memslot->dirty_bitmap != NULL and that memory is wasted until the
     memslot is freed.

I don't expect you to fix #2, though I've mentioned it because using the
same approach to #1 and #2 would be nice.

--
Thanks,
Oliver

  reply	other threads:[~2022-11-04  1:06 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31  0:36 [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31  0:36 ` Gavin Shan
2022-10-31  0:36 ` [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-11-01 19:39   ` Sean Christopherson
2022-11-01 19:39     ` Sean Christopherson
2022-11-02 14:29     ` Peter Xu
2022-11-02 14:29       ` Peter Xu
2022-11-02 15:58       ` Marc Zyngier
2022-11-02 15:58         ` Marc Zyngier
2022-11-02 16:11         ` Sean Christopherson
2022-11-02 16:11           ` Sean Christopherson
2022-11-02 16:44           ` Marc Zyngier
2022-11-02 16:44             ` Marc Zyngier
2022-11-03  0:44             ` Gavin Shan
2022-11-03  0:44               ` Gavin Shan
2022-11-02 16:23         ` Peter Xu
2022-11-02 16:23           ` Peter Xu
2022-11-02 16:33           ` Sean Christopherson
2022-11-02 16:33             ` Sean Christopherson
2022-11-02 16:43             ` Peter Xu
2022-11-02 16:43               ` Peter Xu
2022-11-02 16:48           ` Marc Zyngier
2022-11-02 16:48             ` Marc Zyngier
2022-11-02 14:31     ` Marc Zyngier
2022-11-02 14:31       ` Marc Zyngier
2022-10-31  0:36 ` [PATCH v7 2/9] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-10-31  0:36 ` [PATCH v7 3/9] KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-10-31  9:18   ` Oliver Upton
2022-10-31  9:18     ` Oliver Upton
2022-10-31  0:36 ` [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-11-03 19:33   ` Peter Xu
2022-11-03 19:33     ` Peter Xu
2022-11-03 23:32   ` Oliver Upton
2022-11-03 23:32     ` Oliver Upton
2022-11-04  0:12     ` Gavin Shan
2022-11-04  0:12       ` Gavin Shan
2022-11-04  1:06       ` Oliver Upton [this message]
2022-11-04  1:06         ` Oliver Upton
2022-11-04  6:57         ` Gavin Shan
2022-11-04  6:57           ` Gavin Shan
2022-11-04 20:12           ` Oliver Upton
2022-11-04 20:12             ` Oliver Upton
2022-11-04 21:57             ` Gavin Shan
2022-11-04 21:57               ` Gavin Shan
2022-11-04 22:23               ` Oliver Upton
2022-11-04 22:23                 ` Oliver Upton
2022-10-31  0:36 ` [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-10-31  9:08   ` Oliver Upton
2022-10-31  9:08     ` Oliver Upton
2022-10-31 23:08     ` Gavin Shan
2022-10-31 23:08       ` Gavin Shan
2022-11-02 17:18       ` Marc Zyngier
2022-11-02 17:18         ` Marc Zyngier
2022-10-31  0:36 ` [PATCH v7 6/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-10-31  0:36 ` [PATCH v7 7/9] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-10-31  0:36 ` [PATCH v7 8/9] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-10-31  0:36 ` [PATCH v7 9/9] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-10-31  0:36   ` Gavin Shan
2022-10-31 17:23 ` (subset) [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
2022-10-31 17:23   ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2RlfkyQMCtD6Rbh@google.com \
    --to=oliver.upton@linux.dev \
    --cc=ajones@ventanamicro.com \
    --cc=andrew.jones@linux.dev \
    --cc=bgardon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=gshan@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=kvmarm@lists.linux.dev \
    --cc=maz@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=shan.gavin@gmail.com \
    --cc=shuah@kernel.org \
    --cc=will@kernel.org \
    --cc=zhenyzha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.