Re: [RFC PATCH] KVM: Introduce KVM VIRTIO device

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Kevin Tian <kevin.tian@intel.com>
Cc: "kraxel@redhat.com" <kraxel@redhat.com>,
	Yan Y Zhao <yan.y.zhao@intel.com>,
	Zhenyu Z Wang <zhenyu.z.wang@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Yiwei Zhang <zzyiwei@google.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"gurchetansingh@chromium.org" <gurchetansingh@chromium.org>,
	"wanpengli@tencent.com" <wanpengli@tencent.com>,
	Yongwei Ma <yongwei.ma@intel.com>,
	Zhiyuan Lv <zhiyuan.lv@intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"jmattson@google.com" <jmattson@google.com>
Subject: Re: [RFC PATCH] KVM: Introduce KVM VIRTIO device
Date: Mon, 18 Dec 2023 07:08:51 -0800	[thread overview]
Message-ID: <ZXzx1zXfZ6GV9TgI@google.com> (raw)
In-Reply-To: <BN9PR11MB5276BE04CBB6D07039086D658C93A@BN9PR11MB5276.namprd11.prod.outlook.com>

+Yiwei

On Fri, Dec 15, 2023, Kevin Tian wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Thursday, December 14, 2023 6:35 PM
> > 
> > - For host non-MMIO pages,
> >   * virtio guest frontend and host backend driver should be synced to use
> >     the same memory type to map a buffer. Otherwise, there will be
> >     potential problem for incorrect memory data. But this will only impact
> >     the buggy guest alone.
> >   * for live migration,
> >     as QEMU will read all guest memory during live migration, page aliasing
> >     could happen.
> >     Current thinking is to disable live migration if a virtio device has
> >     indicated its noncoherent state.
> >     As a follow-up, we can discuss other solutions. e.g.
> >     (a) switching back to coherent path before starting live migration.
> 
> both guest/host switching to coherent or host-only?
> 
> host-only certainly is problematic if guest is still using non-coherent.
> 
> on the other hand I'm not sure whether the host/guest gfx stack is
> capable of switching between coherent and non-coherent path in-fly
> when the buffer is right being rendered.
> 
> >     (b) read/write of guest memory with clflush during live migration.
> 
> write is irrelevant as it's only done in the resume path where the
> guest is not running.
> 
> > 
> > Implementation Consideration
> > ===
> > There is a previous series [1] from google to serve the same purpose to
> > let KVM be aware of virtio GPU's noncoherent DMA status. That series
> > requires a new memslot flag, and special memslots in user space.
> > 
> > We don't choose to use memslot flag to request honoring guest memory
> > type.
> 
> memslot flag has the potential to restrict the impact e.g. when using
> clflush-before-read in migration?

Yep, exactly.  E.g. if KVM needs to ensure coherency when freeing memory back to
the host kernel, then the memslot flag will allow for a much more targeted
operation.

> Of course the implication is to honor guest type only for the selected slot
> in KVM instead of applying to the entire guest memory as in previous series
> (which selects this way because vmx_get_mt_mask() is in perf-critical path
> hence not good to check memslot flag?)

Checking a memslot flag won't impact performance.  KVM already has the memslot
when creating SPTEs, e.g. the sole caller of vmx_get_mt_mask(), make_spte(), has
access to the memslot.

That isn't coincidental, KVM _must_ have the memslot to construct the SPTE, e.g.
to retrieve the associated PFN, update write-tracking for shadow pages, etc.

I added Yiwei, who I think is planning on posting another RFC for the memslot
idea (I actually completely forgot that the memslot idea had been thought of and
posted a few years back).

> > Instead we hope to make the honoring request to be explicit (not tied to a
> > memslot flag). This is because once guest memory type is honored, not only
> > memory used by guest virtio device, but all guest memory is facing page
> > aliasing issue potentially. KVM needs a generic solution to take care of
> > page aliasing issue rather than counting on memory type of a special
> > memslot being aligned in host and guest.
> > (we can discuss what a generic solution to handle page aliasing issue will
> > look like in later follow-up series).
> > 
> > On the other hand, we choose to introduce a KVM virtio device rather than
> > just provide an ioctl to wrap kvm_arch_[un]register_noncoherent_dma()
> > directly, which is based on considerations that
> 
> I wonder it's over-engineered for the purpose.
> 
> why not just introducing a KVM_CAP and allowing the VMM to enable?
> KVM doesn't need to know the exact source of requiring it...

Agreed.  If we end up needing to grant the whole VM access for some reason, just
give userspace a direct toggle.

WARNING: multiple messages have this Message-ID (diff)

From: Sean Christopherson <seanjc@google.com>
To: Kevin Tian <kevin.tian@intel.com>
Cc: Yan Y Zhao <yan.y.zhao@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	 "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"olvaffe@gmail.com" <olvaffe@gmail.com>,
	 Zhiyuan Lv <zhiyuan.lv@intel.com>,
	Zhenyu Z Wang <zhenyu.z.wang@intel.com>,
	 Yongwei Ma <yongwei.ma@intel.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	 "wanpengli@tencent.com" <wanpengli@tencent.com>,
	"jmattson@google.com" <jmattson@google.com>,
	 "joro@8bytes.org" <joro@8bytes.org>,
	 "gurchetansingh@chromium.org" <gurchetansingh@chromium.org>,
	"kraxel@redhat.com" <kraxel@redhat.com>,
	 Yiwei Zhang <zzyiwei@google.com>
Subject: Re: [RFC PATCH] KVM: Introduce KVM VIRTIO device
Date: Mon, 18 Dec 2023 07:08:51 -0800	[thread overview]
Message-ID: <ZXzx1zXfZ6GV9TgI@google.com> (raw)
In-Reply-To: <BN9PR11MB5276BE04CBB6D07039086D658C93A@BN9PR11MB5276.namprd11.prod.outlook.com>

+Yiwei

On Fri, Dec 15, 2023, Kevin Tian wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Thursday, December 14, 2023 6:35 PM
> > 
> > - For host non-MMIO pages,
> >   * virtio guest frontend and host backend driver should be synced to use
> >     the same memory type to map a buffer. Otherwise, there will be
> >     potential problem for incorrect memory data. But this will only impact
> >     the buggy guest alone.
> >   * for live migration,
> >     as QEMU will read all guest memory during live migration, page aliasing
> >     could happen.
> >     Current thinking is to disable live migration if a virtio device has
> >     indicated its noncoherent state.
> >     As a follow-up, we can discuss other solutions. e.g.
> >     (a) switching back to coherent path before starting live migration.
> 
> both guest/host switching to coherent or host-only?
> 
> host-only certainly is problematic if guest is still using non-coherent.
> 
> on the other hand I'm not sure whether the host/guest gfx stack is
> capable of switching between coherent and non-coherent path in-fly
> when the buffer is right being rendered.
> 
> >     (b) read/write of guest memory with clflush during live migration.
> 
> write is irrelevant as it's only done in the resume path where the
> guest is not running.
> 
> > 
> > Implementation Consideration
> > ===
> > There is a previous series [1] from google to serve the same purpose to
> > let KVM be aware of virtio GPU's noncoherent DMA status. That series
> > requires a new memslot flag, and special memslots in user space.
> > 
> > We don't choose to use memslot flag to request honoring guest memory
> > type.
> 
> memslot flag has the potential to restrict the impact e.g. when using
> clflush-before-read in migration?

Yep, exactly.  E.g. if KVM needs to ensure coherency when freeing memory back to
the host kernel, then the memslot flag will allow for a much more targeted
operation.

> Of course the implication is to honor guest type only for the selected slot
> in KVM instead of applying to the entire guest memory as in previous series
> (which selects this way because vmx_get_mt_mask() is in perf-critical path
> hence not good to check memslot flag?)

Checking a memslot flag won't impact performance.  KVM already has the memslot
when creating SPTEs, e.g. the sole caller of vmx_get_mt_mask(), make_spte(), has
access to the memslot.

That isn't coincidental, KVM _must_ have the memslot to construct the SPTE, e.g.
to retrieve the associated PFN, update write-tracking for shadow pages, etc.

I added Yiwei, who I think is planning on posting another RFC for the memslot
idea (I actually completely forgot that the memslot idea had been thought of and
posted a few years back).

> > Instead we hope to make the honoring request to be explicit (not tied to a
> > memslot flag). This is because once guest memory type is honored, not only
> > memory used by guest virtio device, but all guest memory is facing page
> > aliasing issue potentially. KVM needs a generic solution to take care of
> > page aliasing issue rather than counting on memory type of a special
> > memslot being aligned in host and guest.
> > (we can discuss what a generic solution to handle page aliasing issue will
> > look like in later follow-up series).
> > 
> > On the other hand, we choose to introduce a KVM virtio device rather than
> > just provide an ioctl to wrap kvm_arch_[un]register_noncoherent_dma()
> > directly, which is based on considerations that
> 
> I wonder it's over-engineered for the purpose.
> 
> why not just introducing a KVM_CAP and allowing the VMM to enable?
> KVM doesn't need to know the exact source of requiring it...

Agreed.  If we end up needing to grant the whole VM access for some reason, just
give userspace a direct toggle.

next prev parent reply	other threads:[~2023-12-18 19:27 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-14 10:35 [RFC PATCH] KVM: Introduce KVM VIRTIO device Yan Zhao
2023-12-14 10:35 ` Yan Zhao
2023-12-15  6:23 ` Tian, Kevin
2023-12-15  6:23   ` Tian, Kevin
2023-12-18  2:58   ` Yan Zhao
2023-12-18  2:58     ` Yan Zhao
2023-12-18 15:08   ` Sean Christopherson [this message]
2023-12-18 15:08     ` Sean Christopherson
2023-12-18 18:14     ` Yiwei Zhang
2023-12-18 18:14       ` Yiwei Zhang
2023-12-19  4:26     ` Yan Zhao
2023-12-19  4:26       ` Yan Zhao
2023-12-20  2:15       ` Yan Zhao
2023-12-21  2:12         ` Sean Christopherson
2023-12-21  2:12           ` Sean Christopherson
2023-12-21  2:44           ` Yan Zhao
2023-12-21  2:44             ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZXzx1zXfZ6GV9TgI@google.com \
    --to=seanjc@google.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gurchetansingh@chromium.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yongwei.ma@intel.com \
    --cc=zhenyu.z.wang@intel.com \
    --cc=zhiyuan.lv@intel.com \
    --cc=zzyiwei@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.