All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: shuah@kernel.org, kvm@vger.kernel.org, catalin.marinas@arm.com,
	andrew.jones@linux.dev, dmatlack@google.com,
	shan.gavin@gmail.com, bgardon@google.com, kvmarm@lists.linux.dev,
	pbonzini@redhat.com, zhenyzha@redhat.com, will@kernel.org,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
Date: Thu, 27 Oct 2022 09:29:58 +0100	[thread overview]
Message-ID: <877d0lhdo9.wl-maz@kernel.org> (raw)
In-Reply-To: <Y1ghIKrAsRFwSFsO@google.com>

On Tue, 25 Oct 2022 18:47:12 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, Oct 25, 2022, Marc Zyngier wrote:
> > On Tue, 25 Oct 2022 01:24:19 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > > That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
> > > > might end up collecting dirty information without a vCPU.  KVM is still
> > > > technically prescribing a solution to userspace, but only because there's only
> > > > one solution.
> > > 
> > > I was trying to allude to something like this by flat-out requiring
> > > ring + bitmap on arm64.
> > 
> > And I claim that this is wrong. It may suit a particular use case, but
> > that's definitely not a universal truth.
> 
> Agreed, KVM should not unconditionally require a dirty bitmap for arm64.
> 
> > > Otherwise, we'd either need to:
> > > 
> > >  (1) Document the features that explicitly depend on ring + bitmap (i.e.
> > >  GIC ITS, whatever else may come) such that userspace sets up the
> > >  correct configuration based on what its using. The combined likelihood
> > >  of both KVM and userspace getting this right seems low.
> > 
> > But what is there to get wrong? Absolutely nothing.
> 
> I strongly disagree.  On x86, we've had two bugs escape where KVM
> attempted to mark a page dirty without an active vCPU.
> 
>   2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") 
>   42dcbe7d8bac ("KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU")
> 
> Call us incompetent, but I have zero confidence that KVM will never
> unintentionally add a path that invokes mark_page_dirty_in_slot()
> without a running vCPU.

Well, maybe it is time that KVM acknowledges there is a purpose to
dirtying memory outside of a vcpu context, and that if a write happens
in a vcpu context, this vcpu must be explicitly passed down rather
than obtained from kvm_get_running_vcpu(). Yes, this requires some
heavy surgery.

> By completely dropping the rule that KVM must have an active vCPU on
> architectures that support ring+bitmap, those types of bugs will go
> silently unnoticed, and will manifest as guest data corruption after
> live migration.

The elephant in the room is still userspace writing to its view of the
guest memory for device emulation. Do they get it right? I doubt it.

> And ideally such bugs would detected without relying on userspace to
> enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
> time and was only found when mark_page_dirty_in_slot() started
> WARNing.
> 
> I'm ok if arm64 wants to let userspace shoot itself in the foot with
> the ITS, but I'm not ok dropping the protections in the common
> mark_page_dirty_in_slot().
> 
> One somewhat gross idea would be to let architectures override the
> "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> in kvm->arch in its kvm_write_guest_lock() to note that an expected
> write without a vCPU is in-progress:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8c5c69ba47a7..d1da8914f749 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
>  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> +               return;
> +
> +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>                 return;
>  #endif
>  
> @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>                 unsigned long rel_gfn = gfn - memslot->base_gfn;
>                 u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> -               if (kvm->dirty_ring_size)
> +               if (kvm->dirty_ring_size && vcpu)
>                         kvm_dirty_ring_push(&vcpu->dirty_ring,
>                                             slot, rel_gfn);
> -               else
> +               else if (memslot->dirty_bitmap)
>                         set_bit_le(rel_gfn, memslot->dirty_bitmap);
>         }
>  }

I think this is equally wrong. Writes occur from both CPUs and devices
*concurrently*, and I don't see why KVM should keep ignoring this
pretty obvious fact.

Yes, your patch papers over the problem, and it can probably work if
the kvm->arch flag only gets set in the ITS saving code, which is
already exclusive of vcpus running.

But in the long run, with dirty bits being collected from the IOMMU
page tables or directly from devices, we will need a way to reconcile
the dirty tracking. The above doesn't quite cut it, unfortunately.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>,
	Gavin Shan <gshan@redhat.com>,
	kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu,
	kvm@vger.kernel.org, peterx@redhat.com, will@kernel.org,
	catalin.marinas@arm.com, bgardon@google.com, shuah@kernel.org,
	andrew.jones@linux.dev, dmatlack@google.com, pbonzini@redhat.com,
	zhenyzha@redhat.com, james.morse@arm.com, suzuki.poulose@arm.com,
	alexandru.elisei@arm.com, shan.gavin@gmail.com
Subject: Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
Date: Thu, 27 Oct 2022 09:29:58 +0100	[thread overview]
Message-ID: <877d0lhdo9.wl-maz@kernel.org> (raw)
Message-ID: <20221027082958.UBpwhBdApE7LpFCLoCaGT1tN5377zNy8JInk-EeG8sY@z> (raw)
In-Reply-To: <Y1ghIKrAsRFwSFsO@google.com>

On Tue, 25 Oct 2022 18:47:12 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, Oct 25, 2022, Marc Zyngier wrote:
> > On Tue, 25 Oct 2022 01:24:19 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > > That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
> > > > might end up collecting dirty information without a vCPU.  KVM is still
> > > > technically prescribing a solution to userspace, but only because there's only
> > > > one solution.
> > > 
> > > I was trying to allude to something like this by flat-out requiring
> > > ring + bitmap on arm64.
> > 
> > And I claim that this is wrong. It may suit a particular use case, but
> > that's definitely not a universal truth.
> 
> Agreed, KVM should not unconditionally require a dirty bitmap for arm64.
> 
> > > Otherwise, we'd either need to:
> > > 
> > >  (1) Document the features that explicitly depend on ring + bitmap (i.e.
> > >  GIC ITS, whatever else may come) such that userspace sets up the
> > >  correct configuration based on what its using. The combined likelihood
> > >  of both KVM and userspace getting this right seems low.
> > 
> > But what is there to get wrong? Absolutely nothing.
> 
> I strongly disagree.  On x86, we've had two bugs escape where KVM
> attempted to mark a page dirty without an active vCPU.
> 
>   2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") 
>   42dcbe7d8bac ("KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU")
> 
> Call us incompetent, but I have zero confidence that KVM will never
> unintentionally add a path that invokes mark_page_dirty_in_slot()
> without a running vCPU.

Well, maybe it is time that KVM acknowledges there is a purpose to
dirtying memory outside of a vcpu context, and that if a write happens
in a vcpu context, this vcpu must be explicitly passed down rather
than obtained from kvm_get_running_vcpu(). Yes, this requires some
heavy surgery.

> By completely dropping the rule that KVM must have an active vCPU on
> architectures that support ring+bitmap, those types of bugs will go
> silently unnoticed, and will manifest as guest data corruption after
> live migration.

The elephant in the room is still userspace writing to its view of the
guest memory for device emulation. Do they get it right? I doubt it.

> And ideally such bugs would detected without relying on userspace to
> enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
> time and was only found when mark_page_dirty_in_slot() started
> WARNing.
> 
> I'm ok if arm64 wants to let userspace shoot itself in the foot with
> the ITS, but I'm not ok dropping the protections in the common
> mark_page_dirty_in_slot().
> 
> One somewhat gross idea would be to let architectures override the
> "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> in kvm->arch in its kvm_write_guest_lock() to note that an expected
> write without a vCPU is in-progress:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8c5c69ba47a7..d1da8914f749 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
>  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> +               return;
> +
> +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>                 return;
>  #endif
>  
> @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>                 unsigned long rel_gfn = gfn - memslot->base_gfn;
>                 u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> -               if (kvm->dirty_ring_size)
> +               if (kvm->dirty_ring_size && vcpu)
>                         kvm_dirty_ring_push(&vcpu->dirty_ring,
>                                             slot, rel_gfn);
> -               else
> +               else if (memslot->dirty_bitmap)
>                         set_bit_le(rel_gfn, memslot->dirty_bitmap);
>         }
>  }

I think this is equally wrong. Writes occur from both CPUs and devices
*concurrently*, and I don't see why KVM should keep ignoring this
pretty obvious fact.

Yes, your patch papers over the problem, and it can probably work if
the kvm->arch flag only gets set in the ITS saving code, which is
already exclusive of vcpus running.

But in the long run, with dirty bits being collected from the IOMMU
page tables or directly from devices, we will need a way to reconcile
the dirty tracking. The above doesn't quite cut it, unfortunately.

	M.

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2022-10-27  8:30 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-11  6:14 [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:14 ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-20 22:42   ` Sean Christopherson
2022-10-20 22:42     ` Sean Christopherson
2022-10-21  5:54     ` Gavin Shan
2022-10-21  5:54       ` Gavin Shan
2022-10-21 15:25       ` Sean Christopherson
2022-10-21 15:25         ` Sean Christopherson
2022-10-21 23:03         ` Gavin Shan
2022-10-21 23:03           ` Gavin Shan
2022-10-21 23:48           ` Sean Christopherson
2022-10-21 23:48             ` Sean Christopherson
2022-10-22  0:16             ` Gavin Shan
2022-10-22  0:16               ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 2/8] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-18 16:07   ` Peter Xu
2022-10-18 16:07     ` Peter Xu
2022-10-18 22:20     ` Gavin Shan
2022-10-18 22:20       ` Gavin Shan
2022-10-20 18:58       ` Oliver Upton
2022-10-20 18:58         ` Oliver Upton
2022-10-20 23:44   ` Sean Christopherson
2022-10-20 23:44     ` Sean Christopherson
2022-10-21  8:06     ` Marc Zyngier
2022-10-21  8:06       ` Marc Zyngier
2022-10-21 16:05       ` Sean Christopherson
2022-10-21 16:05         ` Sean Christopherson
2022-10-22  8:27         ` Gavin Shan
2022-10-22  8:27           ` Gavin Shan
2022-10-22 10:54           ` Marc Zyngier
2022-10-22 10:54             ` Marc Zyngier
2022-10-22 10:33         ` Marc Zyngier
2022-10-22 10:33           ` Marc Zyngier
2022-10-24 23:50           ` Sean Christopherson
2022-10-24 23:50             ` Sean Christopherson
2022-10-25  0:08             ` Sean Christopherson
2022-10-25  0:08               ` Sean Christopherson
2022-10-25  0:24             ` Oliver Upton
2022-10-25  0:24               ` Oliver Upton
2022-10-25  7:31               ` Marc Zyngier
2022-10-25  7:31                 ` Marc Zyngier
2022-10-25 17:47                 ` Sean Christopherson
2022-10-25 17:47                   ` Sean Christopherson
2022-10-27  8:29                   ` Marc Zyngier [this message]
2022-10-27  8:29                     ` Marc Zyngier
2022-10-27 17:44                     ` Sean Christopherson
2022-10-27 17:44                       ` Sean Christopherson
2022-10-27 18:30                       ` Marc Zyngier
2022-10-27 18:30                         ` Marc Zyngier
2022-10-27 19:09                         ` Sean Christopherson
2022-10-27 19:09                           ` Sean Christopherson
2022-10-28  6:43                         ` Gavin Shan
2022-10-28  6:43                           ` Gavin Shan
2022-10-28 16:51                           ` Sean Christopherson
2022-10-28 16:51                             ` Sean Christopherson
2022-10-31  3:37                             ` Gavin Shan
2022-10-31  3:37                               ` Gavin Shan
2022-10-31  9:08                             ` Marc Zyngier
2022-10-31  9:08                               ` Marc Zyngier
2022-10-31 22:48                               ` Gavin Shan
2022-10-31 22:48                                 ` Gavin Shan
2022-10-25  7:22             ` Marc Zyngier
2022-10-25  7:22               ` Marc Zyngier
2022-10-21 10:13     ` Gavin Shan
2022-10-21 10:13       ` Gavin Shan
2022-10-21 23:20       ` Sean Christopherson
2022-10-21 23:20         ` Sean Christopherson
2022-10-22  0:33         ` Gavin Shan
2022-10-22  0:33           ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 4/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 5/8] KVM: selftests: Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP if possible Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 6/8] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 7/8] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 8/8] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:23 ` [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:23   ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877d0lhdo9.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=andrew.jones@linux.dev \
    --cc=bgardon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=kvmarm@lists.linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shan.gavin@gmail.com \
    --cc=shuah@kernel.org \
    --cc=will@kernel.org \
    --cc=zhenyzha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.