All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: kvm-riscv@lists.infradead.org
Subject: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
Date: Wed, 12 Jun 2024 08:22:49 -0700	[thread overview]
Message-ID: <Zmm9SdVfg18RECT5@google.com> (raw)
In-Reply-To: <20240419112432.GB2972@willie-the-truck>

On Fri, Apr 19, 2024, Will Deacon wrote:
> On Thu, Apr 18, 2024 at 12:53:26PM -0700, Sean Christopherson wrote:
> > On Thu, Apr 18, 2024, Will Deacon wrote:
> > > > I assume the idea would be to let arch code do single-page invalidations of
> > > > stage-2 entries for each gfn?
> > > 
> > > Right, as it's the only code which knows which ptes actually ended up
> > > being aged.
> > > 
> > > > Unless I'm having a brain fart, x86 can't make use of that functionality.  Intel
> > > > doesn't provide any way to do targeted invalidation of stage-2 mappings.  AMD
> > > > provides an instruction to do broadcast invalidations, but it takes a virtual
> > > > address, i.e. a stage-1 address.  I can't tell if it's a host virtual address or
> > > > a guest virtual address, but it's a moot point because KVM doen't have the guest
> > > > virtual address, and if it's a host virtual address, there would need to be valid
> > > > mappings in the host page tables for it to work, which KVM can't guarantee.
> > > 
> > > Ah, so it sounds like it would need to be an arch opt-in then.
> > 
> > Even if x86 (or some other arch code) could use the precise tracking, I think it
> > would make sense to have the behavior be arch specific.  Adding infrastructure
> > to get information from arch code, only to turn around and give it back to arch
> > code would be odd.
> 
> Sorry, yes, that's what I had in mind. Basically, a way for the arch code
> to say "I've handled the TLBI, don't worry about it."
> 
> > Unless arm64 can't do the invalidation immediately after aging the stage-2 PTE,
> > the best/easiest solution would be to let arm64 opt out of the common TLB flush
> > when a SPTE is made young.
> > 
> > With the range-based flushing bundled in, this?
> > 
> > ---
> >  include/linux/kvm_host.h |  2 ++
> >  virt/kvm/kvm_main.c      | 40 +++++++++++++++++++++++++---------------
> >  2 files changed, 27 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index afbc99264ffa..8fe5f5e16919 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2010,6 +2010,8 @@ extern const struct kvm_stats_header kvm_vcpu_stats_header;
> >  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >  
> >  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
> > +int kvm_arch_flush_tlb_if_young(void);
> > +
> >  static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
> >  {
> >  	if (unlikely(kvm->mmu_invalidate_in_progress))
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 38b498669ef9..5ebef8ef239c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -595,6 +595,11 @@ static void kvm_null_fn(void)
> >  }
> >  #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
> >  
> > +int __weak kvm_arch_flush_tlb_if_young(void)
> > +{
> > +	return true;
> > +}
> 
> I tend to find __weak functions a little ugly, but I think the gist of the
> diff looks good to me. Thanks for putting it together!

Circling back to this, I don't think we should pursue this specific tweak, at
least not without hard data for a concrete use case.

The clear_flush_young() hook is the only callback that overloads the return value,
e.g. for invalidate_range_start(), arch code can simply return false if the flush
has already been performed.

And clear_flush_young() _always_ operates on a single page, i.e. the range will
only ever cover a single page in the primary MMU.  It's obviously possible that
KVM's MMU has mapped a transparent hugepage using multiple smaller pages, but
that should be relatively uncommon, and probably not worth optimizing for.


WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <seanjc@google.com>
To: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org,  kvm@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	 Tianrui Zhao <zhaotianrui@loongson.cn>,
	Bibo Mao <maobibo@loongson.cn>,
	 Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Nicholas Piggin <npiggin@gmail.com>,
	 Anup Patel <anup@brainfault.org>,
	Atish Patra <atishp@atishpatra.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	 loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
	 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
	 linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	 linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
Date: Wed, 12 Jun 2024 08:22:49 -0700	[thread overview]
Message-ID: <Zmm9SdVfg18RECT5@google.com> (raw)
In-Reply-To: <20240419112432.GB2972@willie-the-truck>

On Fri, Apr 19, 2024, Will Deacon wrote:
> On Thu, Apr 18, 2024 at 12:53:26PM -0700, Sean Christopherson wrote:
> > On Thu, Apr 18, 2024, Will Deacon wrote:
> > > > I assume the idea would be to let arch code do single-page invalidations of
> > > > stage-2 entries for each gfn?
> > > 
> > > Right, as it's the only code which knows which ptes actually ended up
> > > being aged.
> > > 
> > > > Unless I'm having a brain fart, x86 can't make use of that functionality.  Intel
> > > > doesn't provide any way to do targeted invalidation of stage-2 mappings.  AMD
> > > > provides an instruction to do broadcast invalidations, but it takes a virtual
> > > > address, i.e. a stage-1 address.  I can't tell if it's a host virtual address or
> > > > a guest virtual address, but it's a moot point because KVM doen't have the guest
> > > > virtual address, and if it's a host virtual address, there would need to be valid
> > > > mappings in the host page tables for it to work, which KVM can't guarantee.
> > > 
> > > Ah, so it sounds like it would need to be an arch opt-in then.
> > 
> > Even if x86 (or some other arch code) could use the precise tracking, I think it
> > would make sense to have the behavior be arch specific.  Adding infrastructure
> > to get information from arch code, only to turn around and give it back to arch
> > code would be odd.
> 
> Sorry, yes, that's what I had in mind. Basically, a way for the arch code
> to say "I've handled the TLBI, don't worry about it."
> 
> > Unless arm64 can't do the invalidation immediately after aging the stage-2 PTE,
> > the best/easiest solution would be to let arm64 opt out of the common TLB flush
> > when a SPTE is made young.
> > 
> > With the range-based flushing bundled in, this?
> > 
> > ---
> >  include/linux/kvm_host.h |  2 ++
> >  virt/kvm/kvm_main.c      | 40 +++++++++++++++++++++++++---------------
> >  2 files changed, 27 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index afbc99264ffa..8fe5f5e16919 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2010,6 +2010,8 @@ extern const struct kvm_stats_header kvm_vcpu_stats_header;
> >  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >  
> >  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
> > +int kvm_arch_flush_tlb_if_young(void);
> > +
> >  static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
> >  {
> >  	if (unlikely(kvm->mmu_invalidate_in_progress))
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 38b498669ef9..5ebef8ef239c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -595,6 +595,11 @@ static void kvm_null_fn(void)
> >  }
> >  #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
> >  
> > +int __weak kvm_arch_flush_tlb_if_young(void)
> > +{
> > +	return true;
> > +}
> 
> I tend to find __weak functions a little ugly, but I think the gist of the
> diff looks good to me. Thanks for putting it together!

Circling back to this, I don't think we should pursue this specific tweak, at
least not without hard data for a concrete use case.

The clear_flush_young() hook is the only callback that overloads the return value,
e.g. for invalidate_range_start(), arch code can simply return false if the flush
has already been performed.

And clear_flush_young() _always_ operates on a single page, i.e. the range will
only ever cover a single page in the primary MMU.  It's obviously possible that
KVM's MMU has mapped a transparent hugepage using multiple smaller pages, but
that should be relatively uncommon, and probably not worth optimizing for.

WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <seanjc@google.com>
To: Will Deacon <will@kernel.org>
Cc: kvm@vger.kernel.org, David Hildenbrand <david@redhat.com>,
	linux-mips@vger.kernel.org, linux-mm@kvack.org,
	Marc Zyngier <maz@kernel.org>,
	linux-trace-kernel@vger.kernel.org,
	Nicholas Piggin <npiggin@gmail.com>,
	Bibo Mao <maobibo@loongson.cn>,
	loongarch@lists.linux.dev, Atish Patra <atishp@atishpatra.org>,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	linux-kernel@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	linux-perf-users@vger.kernel.org, kvm-riscv@lists.infradead.org,
	Anup Patel <anup@brainfault.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tianrui Zhao <zhaotianrui@loongson.cn>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
Date: Wed, 12 Jun 2024 08:22:49 -0700	[thread overview]
Message-ID: <Zmm9SdVfg18RECT5@google.com> (raw)
In-Reply-To: <20240419112432.GB2972@willie-the-truck>

On Fri, Apr 19, 2024, Will Deacon wrote:
> On Thu, Apr 18, 2024 at 12:53:26PM -0700, Sean Christopherson wrote:
> > On Thu, Apr 18, 2024, Will Deacon wrote:
> > > > I assume the idea would be to let arch code do single-page invalidations of
> > > > stage-2 entries for each gfn?
> > > 
> > > Right, as it's the only code which knows which ptes actually ended up
> > > being aged.
> > > 
> > > > Unless I'm having a brain fart, x86 can't make use of that functionality.  Intel
> > > > doesn't provide any way to do targeted invalidation of stage-2 mappings.  AMD
> > > > provides an instruction to do broadcast invalidations, but it takes a virtual
> > > > address, i.e. a stage-1 address.  I can't tell if it's a host virtual address or
> > > > a guest virtual address, but it's a moot point because KVM doen't have the guest
> > > > virtual address, and if it's a host virtual address, there would need to be valid
> > > > mappings in the host page tables for it to work, which KVM can't guarantee.
> > > 
> > > Ah, so it sounds like it would need to be an arch opt-in then.
> > 
> > Even if x86 (or some other arch code) could use the precise tracking, I think it
> > would make sense to have the behavior be arch specific.  Adding infrastructure
> > to get information from arch code, only to turn around and give it back to arch
> > code would be odd.
> 
> Sorry, yes, that's what I had in mind. Basically, a way for the arch code
> to say "I've handled the TLBI, don't worry about it."
> 
> > Unless arm64 can't do the invalidation immediately after aging the stage-2 PTE,
> > the best/easiest solution would be to let arm64 opt out of the common TLB flush
> > when a SPTE is made young.
> > 
> > With the range-based flushing bundled in, this?
> > 
> > ---
> >  include/linux/kvm_host.h |  2 ++
> >  virt/kvm/kvm_main.c      | 40 +++++++++++++++++++++++++---------------
> >  2 files changed, 27 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index afbc99264ffa..8fe5f5e16919 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2010,6 +2010,8 @@ extern const struct kvm_stats_header kvm_vcpu_stats_header;
> >  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >  
> >  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
> > +int kvm_arch_flush_tlb_if_young(void);
> > +
> >  static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
> >  {
> >  	if (unlikely(kvm->mmu_invalidate_in_progress))
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 38b498669ef9..5ebef8ef239c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -595,6 +595,11 @@ static void kvm_null_fn(void)
> >  }
> >  #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
> >  
> > +int __weak kvm_arch_flush_tlb_if_young(void)
> > +{
> > +	return true;
> > +}
> 
> I tend to find __weak functions a little ugly, but I think the gist of the
> diff looks good to me. Thanks for putting it together!

Circling back to this, I don't think we should pursue this specific tweak, at
least not without hard data for a concrete use case.

The clear_flush_young() hook is the only callback that overloads the return value,
e.g. for invalidate_range_start(), arch code can simply return false if the flush
has already been performed.

And clear_flush_young() _always_ operates on a single page, i.e. the range will
only ever cover a single page in the primary MMU.  It's obviously possible that
KVM's MMU has mapped a transparent hugepage using multiple smaller pages, but
that should be relatively uncommon, and probably not worth optimizing for.

  parent reply	other threads:[~2024-06-12 15:22 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-05 11:58 [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() Paolo Bonzini
2024-04-05 11:58 ` Paolo Bonzini
2024-04-05 11:58 ` Paolo Bonzini
2024-04-05 11:58 ` Paolo Bonzini
2024-04-05 11:58 ` [PATCH 1/4] KVM: delete .change_pte MMU notifier callback Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-07  4:50   ` Anup Patel
2024-04-07  4:50     ` Anup Patel
2024-04-07  4:50     ` Anup Patel
2024-04-07  4:50     ` Anup Patel
2024-04-08  7:23   ` maobibo
2024-04-08  7:23     ` maobibo
2024-04-08  7:23     ` maobibo
2024-04-08  7:23     ` maobibo
2024-04-08 11:45   ` Michael Ellerman
2024-04-08 11:45     ` Michael Ellerman
2024-04-08 11:45     ` Michael Ellerman
2024-04-08 11:45     ` Michael Ellerman
2024-04-08 13:56   ` Peter Xu
2024-04-08 13:56     ` Peter Xu
2024-04-08 13:56     ` Peter Xu
2024-04-08 13:56     ` Peter Xu
2024-04-11 16:55     ` Paolo Bonzini
2024-04-11 16:55       ` Paolo Bonzini
2024-04-11 16:55       ` Paolo Bonzini
2024-04-11 16:55       ` Paolo Bonzini
2024-04-11 18:47       ` Peter Xu
2024-04-11 18:47         ` Peter Xu
2024-04-11 18:47         ` Peter Xu
2024-04-11 18:47         ` Peter Xu
2024-04-12 20:01       ` David Hildenbrand
2024-04-12 20:01         ` David Hildenbrand
2024-04-12 20:01         ` David Hildenbrand
2024-04-12 20:01         ` David Hildenbrand
2024-04-12 10:44   ` Will Deacon
2024-04-12 10:44     ` Will Deacon
2024-04-12 10:44     ` Will Deacon
2024-04-12 10:44     ` Will Deacon
2024-04-12 13:15     ` Marc Zyngier
2024-04-12 13:15       ` Marc Zyngier
2024-04-12 13:15       ` Marc Zyngier
2024-04-12 13:15       ` Marc Zyngier
2024-04-12 14:54       ` Sean Christopherson
2024-04-12 14:54         ` Sean Christopherson
2024-04-12 14:54         ` Sean Christopherson
2024-04-12 14:54         ` Sean Christopherson
2024-04-13  9:56         ` Marc Zyngier
2024-04-13  9:56           ` Marc Zyngier
2024-04-13  9:56           ` Marc Zyngier
2024-04-13  9:56           ` Marc Zyngier
2024-04-15 17:03           ` Sean Christopherson
2024-04-15 17:03             ` Sean Christopherson
2024-04-15 17:03             ` Sean Christopherson
2024-04-15 17:03             ` Sean Christopherson
2024-04-18 14:19             ` Will Deacon
2024-04-18 14:19               ` Will Deacon
2024-04-18 14:19               ` Will Deacon
2024-04-18 14:19               ` Will Deacon
2024-04-18 19:53               ` Sean Christopherson
2024-04-18 19:53                 ` Sean Christopherson
2024-04-18 19:53                 ` Sean Christopherson
2024-04-18 19:53                 ` Sean Christopherson
2024-04-19 11:24                 ` Will Deacon
2024-04-19 11:24                   ` Will Deacon
2024-04-19 11:24                   ` Will Deacon
2024-04-19 11:24                   ` Will Deacon
2024-04-19 13:58                   ` Sean Christopherson
2024-04-19 13:58                     ` Sean Christopherson
2024-04-19 13:58                     ` Sean Christopherson
2024-04-19 13:58                     ` Sean Christopherson
2024-06-12 15:22                   ` Sean Christopherson [this message]
2024-06-12 15:22                     ` Sean Christopherson
2024-06-12 15:22                     ` Sean Christopherson
2024-06-12 17:31                     ` Sean Christopherson
2024-06-12 17:31                       ` Sean Christopherson
2024-06-12 17:31                       ` Sean Christopherson
2024-04-05 11:58 ` [PATCH 2/4] KVM: remove unused argument of kvm_handle_hva_range() Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-08  6:31   ` Philippe Mathieu-Daudé
2024-04-08  6:31     ` Philippe Mathieu-Daudé
2024-04-08  6:31     ` Philippe Mathieu-Daudé
2024-04-08  6:31     ` Philippe Mathieu-Daudé
2024-04-05 11:58 ` [PATCH 3/4] mmu_notifier: remove the .change_pte() callback Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-08  7:35   ` David Hildenbrand
2024-04-08  7:35     ` David Hildenbrand
2024-04-08  7:35     ` David Hildenbrand
2024-04-08  7:35     ` David Hildenbrand
2024-04-05 11:58 ` [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at() Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-05 11:58   ` Paolo Bonzini
2024-04-08  6:28   ` Philippe Mathieu-Daudé
2024-04-08  6:28     ` Philippe Mathieu-Daudé
2024-04-08  6:28     ` Philippe Mathieu-Daudé
2024-04-08  6:28     ` Philippe Mathieu-Daudé
2024-04-08  7:36   ` David Hildenbrand
2024-04-08  7:36     ` David Hildenbrand
2024-04-08  7:36     ` David Hildenbrand
2024-04-08  7:36     ` David Hildenbrand
2024-04-10 21:30 ` [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() Andrew Morton
2024-04-10 21:30   ` Andrew Morton
2024-04-10 21:30   ` Andrew Morton
2024-04-10 21:30   ` Andrew Morton
2024-04-11 16:57   ` Paolo Bonzini
2024-04-11 16:57     ` Paolo Bonzini
2024-04-11 16:57     ` Paolo Bonzini
2024-04-11 16:57     ` Paolo Bonzini
2024-04-12 13:07 ` Marc Zyngier
2024-04-12 13:07   ` Marc Zyngier
2024-04-12 13:07   ` Marc Zyngier
2024-04-12 13:07   ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zmm9SdVfg18RECT5@google.com \
    --to=seanjc@google.com \
    --cc=kvm-riscv@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.