All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: shuah@kernel.org, kvm@vger.kernel.org, catalin.marinas@arm.com,
	andrew.jones@linux.dev, dmatlack@google.com,
	shan.gavin@gmail.com, bgardon@google.com, kvmarm@lists.linux.dev,
	pbonzini@redhat.com, zhenyzha@redhat.com, will@kernel.org,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
Date: Tue, 25 Oct 2022 08:22:32 +0100	[thread overview]
Message-ID: <87a65kgyfb.wl-maz@kernel.org> (raw)
In-Reply-To: <Y1ckxYst3tc0LCqb@google.com>

On Tue, 25 Oct 2022 00:50:29 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Sat, Oct 22, 2022, Marc Zyngier wrote:
> > On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > On Fri, Oct 21, 2022, Marc Zyngier wrote:
> > > > Because dirtying memory outside of a vcpu context makes it
> > > > incredibly awkward to handle a "ring full" condition?
> > > 
> > > Kicking all vCPUs with the soft-full request isn't _that_ awkward.
> > > It's certainly sub-optimal, but if inserting into the per-VM ring is
> > > relatively rare, then in practice it's unlikely to impact guest
> > > performance.
> > 
> > But there is *nothing* to kick here. The kernel is dirtying pages,
> > devices are dirtying pages (DMA), and there is no context associated
> > with that. Which is why a finite ring is the wrong abstraction.
> 
> I don't follow.  If there's a VM, KVM can always kick all vCPUs.
> Again, might be far from optimal, but it's an option.  If there's
> literally no VM, then KVM isn't involved at all and there's no "ring
> vs. bitmap" decision.

The key word is *device*. No vcpu is involved here. Actually, we
actively prevent save/restore of the ITS while vcpus are running. How
could you even expect to snapshot a consistent state if the interrupt
state is changing under your feet?

> 
> > > Would it be possible to require a dirty bitmap when an ITS is
> > > created?  That would allow treating the above condition as a KVM
> > > bug.
> > 
> > No. This should be optional. Everything about migration should be
> > absolutely optional (I run plenty of concurrent VMs on sub-2GB
> > systems). You want to migrate a VM that has an ITS or will collect
> > dirty bits originating from a SMMU with HTTU, you enable the dirty
> > bitmap. You want to have *vcpu* based dirty rings, you enable them.
> > 
> > In short, there shouldn't be any reason for the two are either
> > mandatory or conflated. Both should be optional, independent, because
> > they cover completely disjoined use cases. *userspace* should be in
> > charge of deciding this.
> 
> I agree about userspace being in control, what I want to avoid is
> letting userspace put KVM into a bad state without any indication
> from KVM that the setup is wrong until something actually dirties a
> page.

I can't see how that can result in a bad state for KVM itself. All you
lack is a way for userspace to *track* the dirtying. Just like we
don't have a way to track the dirtying of a page from the VMM.

> Specifically, if mark_page_dirty_in_slot() is invoked without a
> running vCPU, on a memslot with dirty tracking enabled but without a
> dirty bitmap, then the migration is doomed.

Yup, and that's a luser error. Too bad. Userspace can still transfer
all the memory, and all will be fine.

> Dropping the dirty page isn't a sane response as that'd all but
> guaranatee memory corruption in the guest.

Again, user error. Userspace can readily write over all the guest
memory (virtio), and no amount of KVM-side tracking will help. What
are you going to do about it?

At the end of the day, what are you trying to do? All the dirty
tracking muck (bitmap and ring) is only a way for userspace to track
dirty pages more easily and accelerate the transfer. If userspace
doesn't tell KVM to track these writes, tough luck. If the author of a
VMM doesn't understand that, then maybe they shouldn't be in charge of
the VMM. Worse case, they can still transfer the whole thing, no harm
done.

> At best, KVM could kick all vCPUs out to userspace
> with a new exit reason, but that's not a very good experience for
> userspace as either the VM is unexpectedly unmigratable or the VM
> ends up being killed (or I suppose userspace could treat the exit as
> a per-VM dirty ring of size '1').

Can we please stop the exit nonsense? There is no vcpu involved
here. This is a device (emulated or not) writing to memory, triggered
by an ioctl from userspace. If you're thinking vcpu, you have the
wrong end of the stick.

Memory gets dirtied system wide, not just by CPUs, and no amount of
per-vcpu resource is going to solve this problem. VM-based rings can
help with if they provide a way to recover from an overflow. But that
obviously doesn't work here as we can't checkpoint and restart the
saving process on overflow.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Gavin Shan <gshan@redhat.com>,
	kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu,
	kvm@vger.kernel.org, peterx@redhat.com, will@kernel.org,
	catalin.marinas@arm.com, bgardon@google.com, shuah@kernel.org,
	andrew.jones@linux.dev, dmatlack@google.com, pbonzini@redhat.com,
	zhenyzha@redhat.com, james.morse@arm.com, suzuki.poulose@arm.com,
	alexandru.elisei@arm.com, oliver.upton@linux.dev,
	shan.gavin@gmail.com
Subject: Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
Date: Tue, 25 Oct 2022 08:22:32 +0100	[thread overview]
Message-ID: <87a65kgyfb.wl-maz@kernel.org> (raw)
Message-ID: <20221025072232.BNUffPQUuFncV1JH5T21hhRaN3Luv9N_G7kro_IiqxU@z> (raw)
In-Reply-To: <Y1ckxYst3tc0LCqb@google.com>

On Tue, 25 Oct 2022 00:50:29 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Sat, Oct 22, 2022, Marc Zyngier wrote:
> > On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > On Fri, Oct 21, 2022, Marc Zyngier wrote:
> > > > Because dirtying memory outside of a vcpu context makes it
> > > > incredibly awkward to handle a "ring full" condition?
> > > 
> > > Kicking all vCPUs with the soft-full request isn't _that_ awkward.
> > > It's certainly sub-optimal, but if inserting into the per-VM ring is
> > > relatively rare, then in practice it's unlikely to impact guest
> > > performance.
> > 
> > But there is *nothing* to kick here. The kernel is dirtying pages,
> > devices are dirtying pages (DMA), and there is no context associated
> > with that. Which is why a finite ring is the wrong abstraction.
> 
> I don't follow.  If there's a VM, KVM can always kick all vCPUs.
> Again, might be far from optimal, but it's an option.  If there's
> literally no VM, then KVM isn't involved at all and there's no "ring
> vs. bitmap" decision.

The key word is *device*. No vcpu is involved here. Actually, we
actively prevent save/restore of the ITS while vcpus are running. How
could you even expect to snapshot a consistent state if the interrupt
state is changing under your feet?

> 
> > > Would it be possible to require a dirty bitmap when an ITS is
> > > created?  That would allow treating the above condition as a KVM
> > > bug.
> > 
> > No. This should be optional. Everything about migration should be
> > absolutely optional (I run plenty of concurrent VMs on sub-2GB
> > systems). You want to migrate a VM that has an ITS or will collect
> > dirty bits originating from a SMMU with HTTU, you enable the dirty
> > bitmap. You want to have *vcpu* based dirty rings, you enable them.
> > 
> > In short, there shouldn't be any reason for the two are either
> > mandatory or conflated. Both should be optional, independent, because
> > they cover completely disjoined use cases. *userspace* should be in
> > charge of deciding this.
> 
> I agree about userspace being in control, what I want to avoid is
> letting userspace put KVM into a bad state without any indication
> from KVM that the setup is wrong until something actually dirties a
> page.

I can't see how that can result in a bad state for KVM itself. All you
lack is a way for userspace to *track* the dirtying. Just like we
don't have a way to track the dirtying of a page from the VMM.

> Specifically, if mark_page_dirty_in_slot() is invoked without a
> running vCPU, on a memslot with dirty tracking enabled but without a
> dirty bitmap, then the migration is doomed.

Yup, and that's a luser error. Too bad. Userspace can still transfer
all the memory, and all will be fine.

> Dropping the dirty page isn't a sane response as that'd all but
> guaranatee memory corruption in the guest.

Again, user error. Userspace can readily write over all the guest
memory (virtio), and no amount of KVM-side tracking will help. What
are you going to do about it?

At the end of the day, what are you trying to do? All the dirty
tracking muck (bitmap and ring) is only a way for userspace to track
dirty pages more easily and accelerate the transfer. If userspace
doesn't tell KVM to track these writes, tough luck. If the author of a
VMM doesn't understand that, then maybe they shouldn't be in charge of
the VMM. Worse case, they can still transfer the whole thing, no harm
done.

> At best, KVM could kick all vCPUs out to userspace
> with a new exit reason, but that's not a very good experience for
> userspace as either the VM is unexpectedly unmigratable or the VM
> ends up being killed (or I suppose userspace could treat the exit as
> a per-VM dirty ring of size '1').

Can we please stop the exit nonsense? There is no vcpu involved
here. This is a device (emulated or not) writing to memory, triggered
by an ioctl from userspace. If you're thinking vcpu, you have the
wrong end of the stick.

Memory gets dirtied system wide, not just by CPUs, and no amount of
per-vcpu resource is going to solve this problem. VM-based rings can
help with if they provide a way to recover from an overflow. But that
obviously doesn't work here as we can't checkpoint and restart the
saving process on overflow.

	M.

-- 
Without deviation from the norm, progress is not possible.

  parent reply	other threads:[~2022-10-25  7:23 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-11  6:14 [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:14 ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-20 22:42   ` Sean Christopherson
2022-10-20 22:42     ` Sean Christopherson
2022-10-21  5:54     ` Gavin Shan
2022-10-21  5:54       ` Gavin Shan
2022-10-21 15:25       ` Sean Christopherson
2022-10-21 15:25         ` Sean Christopherson
2022-10-21 23:03         ` Gavin Shan
2022-10-21 23:03           ` Gavin Shan
2022-10-21 23:48           ` Sean Christopherson
2022-10-21 23:48             ` Sean Christopherson
2022-10-22  0:16             ` Gavin Shan
2022-10-22  0:16               ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 2/8] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-18 16:07   ` Peter Xu
2022-10-18 16:07     ` Peter Xu
2022-10-18 22:20     ` Gavin Shan
2022-10-18 22:20       ` Gavin Shan
2022-10-20 18:58       ` Oliver Upton
2022-10-20 18:58         ` Oliver Upton
2022-10-20 23:44   ` Sean Christopherson
2022-10-20 23:44     ` Sean Christopherson
2022-10-21  8:06     ` Marc Zyngier
2022-10-21  8:06       ` Marc Zyngier
2022-10-21 16:05       ` Sean Christopherson
2022-10-21 16:05         ` Sean Christopherson
2022-10-22  8:27         ` Gavin Shan
2022-10-22  8:27           ` Gavin Shan
2022-10-22 10:54           ` Marc Zyngier
2022-10-22 10:54             ` Marc Zyngier
2022-10-22 10:33         ` Marc Zyngier
2022-10-22 10:33           ` Marc Zyngier
2022-10-24 23:50           ` Sean Christopherson
2022-10-24 23:50             ` Sean Christopherson
2022-10-25  0:08             ` Sean Christopherson
2022-10-25  0:08               ` Sean Christopherson
2022-10-25  0:24             ` Oliver Upton
2022-10-25  0:24               ` Oliver Upton
2022-10-25  7:31               ` Marc Zyngier
2022-10-25  7:31                 ` Marc Zyngier
2022-10-25 17:47                 ` Sean Christopherson
2022-10-25 17:47                   ` Sean Christopherson
2022-10-27  8:29                   ` Marc Zyngier
2022-10-27  8:29                     ` Marc Zyngier
2022-10-27 17:44                     ` Sean Christopherson
2022-10-27 17:44                       ` Sean Christopherson
2022-10-27 18:30                       ` Marc Zyngier
2022-10-27 18:30                         ` Marc Zyngier
2022-10-27 19:09                         ` Sean Christopherson
2022-10-27 19:09                           ` Sean Christopherson
2022-10-28  6:43                         ` Gavin Shan
2022-10-28  6:43                           ` Gavin Shan
2022-10-28 16:51                           ` Sean Christopherson
2022-10-28 16:51                             ` Sean Christopherson
2022-10-31  3:37                             ` Gavin Shan
2022-10-31  3:37                               ` Gavin Shan
2022-10-31  9:08                             ` Marc Zyngier
2022-10-31  9:08                               ` Marc Zyngier
2022-10-31 22:48                               ` Gavin Shan
2022-10-31 22:48                                 ` Gavin Shan
2022-10-25  7:22             ` Marc Zyngier [this message]
2022-10-25  7:22               ` Marc Zyngier
2022-10-21 10:13     ` Gavin Shan
2022-10-21 10:13       ` Gavin Shan
2022-10-21 23:20       ` Sean Christopherson
2022-10-21 23:20         ` Sean Christopherson
2022-10-22  0:33         ` Gavin Shan
2022-10-22  0:33           ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 4/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 5/8] KVM: selftests: Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP if possible Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 6/8] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 7/8] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 8/8] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-10-11  6:14   ` Gavin Shan
2022-10-11  6:23 ` [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:23   ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a65kgyfb.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=andrew.jones@linux.dev \
    --cc=bgardon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=kvmarm@lists.linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shan.gavin@gmail.com \
    --cc=shuah@kernel.org \
    --cc=will@kernel.org \
    --cc=zhenyzha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.