kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oliver Upton <oliver.upton@linux.dev>
To: Gavin Shan <gshan@redhat.com>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu, andrew.jones@linux.dev,
	ajones@ventanamicro.com, maz@kernel.org, bgardon@google.com,
	catalin.marinas@arm.com, dmatlack@google.com, will@kernel.org,
	pbonzini@redhat.com, peterx@redhat.com, seanjc@google.com,
	james.morse@arm.com, shuah@kernel.org, suzuki.poulose@arm.com,
	alexandru.elisei@arm.com, zhenyzha@redhat.com,
	shan.gavin@gmail.com
Subject: Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
Date: Fri, 4 Nov 2022 01:06:06 +0000	[thread overview]
Message-ID: <Y2RlfkyQMCtD6Rbh@google.com> (raw)
In-Reply-To: <d5b86a73-e030-7ce3-e5f3-301f4f505323@redhat.com>

On Fri, Nov 04, 2022 at 08:12:21AM +0800, Gavin Shan wrote:
> Hi Oliver,
> 
> On 11/4/22 7:32 AM, Oliver Upton wrote:
> > On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
> > > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > > enabled. It's conflicting with that ring-based dirty page tracking always
> > > requires a running VCPU context.
> > > 
> > > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > > the VM to the target.
> > > 
> > > Use an additional capability to advertise this behavior. The newly added
> > > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> > 
> > Whatever ordering requirements we settle on between these capabilities
> > needs to be documented as well.
> > 
> > [...]
> > 
> 
> It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
> 
>   After using the dirty rings, the userspace needs to detect the capability
>   of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>   need to be backed by per-slot bitmaps. With this capability advertised
>   and supported, it means the architecture can dirty guest pages without
>   vcpu/ring context, so that some of the dirty information will still be
>   maintained in the bitmap structure.
> 
> The description may be not obvious about the ordering. For this, I can
> add the following sentence at end of the section.
> 
>   The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
>   until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
> 
> > > @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >   			return -EINVAL;
> > >   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> > > +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> > > +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> > > +		    !kvm->dirty_ring_size)
> > 
> > I believe this ordering requirement is problematic, as it piles on top
> > of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
> > creation.
> > 
> > Example:
> >   - Enable KVM_CAP_DIRTY_LOG_RING
> >   - Create some memslots w/ dirty logging enabled (note that the bitmap
> >     is _not_ allocated)
> >   - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >   - Save ITS tables and get a NULL dereference in
> >     mark_page_dirty_in_slot():
> > 
> >                  if (vcpu && kvm->dirty_ring_size)
> >                          kvm_dirty_ring_push(&vcpu->dirty_ring,
> >                                              slot, rel_gfn);
> >                  else
> > ------->		set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > 
> > Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
> > enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
> > 
> > You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
> > DIRTY_LOG_RING has already been enabled, but the better approach would
> > be to explicitly check kvm_memslots_empty() such that the real
> > dependency is obvious. Peter, hadn't you mentioned something about
> > checking against memslots in an earlier revision?
> > 
> 
> The userspace (QEMU) needs to ensure that no dirty bitmap is created
> before the capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled.
> It's unknown by QEMU that vgic/its is used when KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> is enabled.

I'm not worried about what QEMU (or any particular VMM for that matter)
does with the UAPI. The problem is this patch provides a trivial way for
userspace to cause a NULL dereference in the kernel. Imposing ordering
between the cap and memslot creation avoids the problem altogether.

So, looking at your example:

>    kvm_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL        // Where KVM_CAP_DIRTY_LOG_RING is enabled
>    board_initialization                           // Where QEMU knows if vgic/its is used

Is it possible that QEMU could hoist enabling RING_WITH_BITMAP here?
Based on your description QEMU has decided to use the vGIC ITS but
hasn't yet added any memslots.

>      add_memory_slots
>    kvm_post_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>    :
>    start_migration
>      enable_dirty_page_tracking
>        create_dirty_bitmap                       // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled

Just to make sure we're on the same page, there's two issues:

 (1) If DIRTY_LOG_RING is enabled before memslot creation and
     RING_WITH_BITMAP is enabled after memslots have been created w/
     dirty logging enabled, memslot->dirty_bitmap == NULL and the
     kernel will fault when attempting to save the ITS tables.

 (2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
     enabled after memslots have been created w/ dirty logging enabled,
     memslot->dirty_bitmap != NULL and that memory is wasted until the
     memslot is freed.

I don't expect you to fix #2, though I've mentioned it because using the
same approach to #1 and #2 would be nice.

--
Thanks,
Oliver

  reply	other threads:[~2022-11-04  1:06 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31  0:36 [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31  0:36 ` [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Gavin Shan
2022-11-01 19:39   ` Sean Christopherson
2022-11-02 14:29     ` Peter Xu
2022-11-02 15:58       ` Marc Zyngier
2022-11-02 16:11         ` Sean Christopherson
2022-11-02 16:44           ` Marc Zyngier
2022-11-03  0:44             ` Gavin Shan
2022-11-02 16:23         ` Peter Xu
2022-11-02 16:33           ` Sean Christopherson
2022-11-02 16:43             ` Peter Xu
2022-11-02 16:48           ` Marc Zyngier
2022-11-02 14:31     ` Marc Zyngier
2022-10-31  0:36 ` [PATCH v7 2/9] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-31  0:36 ` [PATCH v7 3/9] KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them Gavin Shan
2022-10-31  9:18   ` Oliver Upton
2022-10-31  0:36 ` [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap Gavin Shan
2022-11-03 19:33   ` Peter Xu
2022-11-03 23:32   ` Oliver Upton
2022-11-04  0:12     ` Gavin Shan
2022-11-04  1:06       ` Oliver Upton [this message]
2022-11-04  6:57         ` Gavin Shan
2022-11-04 20:12           ` Oliver Upton
2022-11-04 21:57             ` Gavin Shan
2022-11-04 22:23               ` Oliver Upton
2022-10-31  0:36 ` [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring Gavin Shan
2022-10-31  9:08   ` Oliver Upton
2022-10-31 23:08     ` Gavin Shan
2022-11-02 17:18       ` Marc Zyngier
2022-10-31  0:36 ` [PATCH v7 6/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31  0:36 ` [PATCH v7 7/9] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-31  0:36 ` [PATCH v7 8/9] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-10-31  0:36 ` [PATCH v7 9/9] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-10-31 17:23 ` (subset) [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2RlfkyQMCtD6Rbh@google.com \
    --to=oliver.upton@linux.dev \
    --cc=ajones@ventanamicro.com \
    --cc=alexandru.elisei@arm.com \
    --cc=andrew.jones@linux.dev \
    --cc=bgardon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=gshan@redhat.com \
    --cc=james.morse@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=kvmarm@lists.linux.dev \
    --cc=maz@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=seanjc@google.com \
    --cc=shan.gavin@gmail.com \
    --cc=shuah@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=zhenyzha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).