linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org,  kvm@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	 Tianrui Zhao <zhaotianrui@loongson.cn>,
	Bibo Mao <maobibo@loongson.cn>,
	 Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Nicholas Piggin <npiggin@gmail.com>,
	 Anup Patel <anup@brainfault.org>,
	Atish Patra <atishp@atishpatra.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	 loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
	 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
	 linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	 linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
Date: Wed, 12 Jun 2024 08:22:49 -0700	[thread overview]
Message-ID: <Zmm9SdVfg18RECT5@google.com> (raw)
In-Reply-To: <20240419112432.GB2972@willie-the-truck>

On Fri, Apr 19, 2024, Will Deacon wrote:
> On Thu, Apr 18, 2024 at 12:53:26PM -0700, Sean Christopherson wrote:
> > On Thu, Apr 18, 2024, Will Deacon wrote:
> > > > I assume the idea would be to let arch code do single-page invalidations of
> > > > stage-2 entries for each gfn?
> > > 
> > > Right, as it's the only code which knows which ptes actually ended up
> > > being aged.
> > > 
> > > > Unless I'm having a brain fart, x86 can't make use of that functionality.  Intel
> > > > doesn't provide any way to do targeted invalidation of stage-2 mappings.  AMD
> > > > provides an instruction to do broadcast invalidations, but it takes a virtual
> > > > address, i.e. a stage-1 address.  I can't tell if it's a host virtual address or
> > > > a guest virtual address, but it's a moot point because KVM doen't have the guest
> > > > virtual address, and if it's a host virtual address, there would need to be valid
> > > > mappings in the host page tables for it to work, which KVM can't guarantee.
> > > 
> > > Ah, so it sounds like it would need to be an arch opt-in then.
> > 
> > Even if x86 (or some other arch code) could use the precise tracking, I think it
> > would make sense to have the behavior be arch specific.  Adding infrastructure
> > to get information from arch code, only to turn around and give it back to arch
> > code would be odd.
> 
> Sorry, yes, that's what I had in mind. Basically, a way for the arch code
> to say "I've handled the TLBI, don't worry about it."
> 
> > Unless arm64 can't do the invalidation immediately after aging the stage-2 PTE,
> > the best/easiest solution would be to let arm64 opt out of the common TLB flush
> > when a SPTE is made young.
> > 
> > With the range-based flushing bundled in, this?
> > 
> > ---
> >  include/linux/kvm_host.h |  2 ++
> >  virt/kvm/kvm_main.c      | 40 +++++++++++++++++++++++++---------------
> >  2 files changed, 27 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index afbc99264ffa..8fe5f5e16919 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2010,6 +2010,8 @@ extern const struct kvm_stats_header kvm_vcpu_stats_header;
> >  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >  
> >  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
> > +int kvm_arch_flush_tlb_if_young(void);
> > +
> >  static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
> >  {
> >  	if (unlikely(kvm->mmu_invalidate_in_progress))
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 38b498669ef9..5ebef8ef239c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -595,6 +595,11 @@ static void kvm_null_fn(void)
> >  }
> >  #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
> >  
> > +int __weak kvm_arch_flush_tlb_if_young(void)
> > +{
> > +	return true;
> > +}
> 
> I tend to find __weak functions a little ugly, but I think the gist of the
> diff looks good to me. Thanks for putting it together!

Circling back to this, I don't think we should pursue this specific tweak, at
least not without hard data for a concrete use case.

The clear_flush_young() hook is the only callback that overloads the return value,
e.g. for invalidate_range_start(), arch code can simply return false if the flush
has already been performed.

And clear_flush_young() _always_ operates on a single page, i.e. the range will
only ever cover a single page in the primary MMU.  It's obviously possible that
KVM's MMU has mapped a transparent hugepage using multiple smaller pages, but
that should be relatively uncommon, and probably not worth optimizing for.

  parent reply	other threads:[~2024-06-12 15:22 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-05 11:58 [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() Paolo Bonzini
2024-04-05 11:58 ` [PATCH 1/4] KVM: delete .change_pte MMU notifier callback Paolo Bonzini
2024-04-07  4:50   ` Anup Patel
2024-04-08  7:23   ` maobibo
2024-04-08 11:45   ` Michael Ellerman
2024-04-08 13:56   ` Peter Xu
2024-04-11 16:55     ` Paolo Bonzini
2024-04-11 18:47       ` Peter Xu
2024-04-12 20:01       ` David Hildenbrand
2024-04-12 10:44   ` Will Deacon
2024-04-12 13:15     ` Marc Zyngier
2024-04-12 14:54       ` Sean Christopherson
2024-04-13  9:56         ` Marc Zyngier
2024-04-15 17:03           ` Sean Christopherson
2024-04-18 14:19             ` Will Deacon
2024-04-18 19:53               ` Sean Christopherson
2024-04-19 11:24                 ` Will Deacon
2024-04-19 13:58                   ` Sean Christopherson
2024-06-12 15:22                   ` Sean Christopherson [this message]
2024-06-12 17:31                     ` Sean Christopherson
2024-04-05 11:58 ` [PATCH 2/4] KVM: remove unused argument of kvm_handle_hva_range() Paolo Bonzini
2024-04-08  6:31   ` Philippe Mathieu-Daudé
2024-04-05 11:58 ` [PATCH 3/4] mmu_notifier: remove the .change_pte() callback Paolo Bonzini
2024-04-08  7:35   ` David Hildenbrand
2024-04-05 11:58 ` [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at() Paolo Bonzini
2024-04-08  6:28   ` Philippe Mathieu-Daudé
2024-04-08  7:36   ` David Hildenbrand
2024-04-10 21:30 ` [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() Andrew Morton
2024-04-11 16:57   ` Paolo Bonzini
2024-04-12 13:07 ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zmm9SdVfg18RECT5@google.com \
    --to=seanjc@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=anup@brainfault.org \
    --cc=atishp@atishpatra.org \
    --cc=david@redhat.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=loongarch@lists.linux.dev \
    --cc=maobibo@loongson.cn \
    --cc=maz@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=will@kernel.org \
    --cc=zhaotianrui@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).