All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Matlack <dmatlack@google.com>
To: James Houghton <jthoughton@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>, Yu Zhao <yuzhao@google.com>,
	Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	Sean Christopherson <seanjc@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Shaoqin Huang <shahuang@redhat.com>,
	Gavin Shan <gshan@redhat.com>,
	Ricardo Koller <ricarkol@google.com>,
	Raghavendra Rao Ananta <rananta@google.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/7] KVM: Add basic bitmap support into kvm_mmu_notifier_test/clear_young
Date: Fri, 12 Apr 2024 13:28:40 -0700	[thread overview]
Message-ID: <ZhmZeFPRwOJVue5y@google.com> (raw)
In-Reply-To: <20240401232946.1837665-4-jthoughton@google.com>

On 2024-04-01 11:29 PM, James Houghton wrote:
> Add kvm_arch_prepare_bitmap_age() for architectures to indiciate that
> they support bitmap-based aging in kvm_mmu_notifier_test_clear_young()
> and that they do not need KVM to grab the MMU lock for writing. This
> function allows architectures to do other locking or other preparatory
> work that it needs.

There's a lot going on here. I know it's extra work but I think the
series would be easier to understand and simplify if you introduced the
KVM support for lockless test/clear_young() first, and then introduce
support for the bitmap-based look-around.

Specifically:

 1. Make all test/clear_young() notifiers lockless. i.e. Move the
    mmu_lock into the architecture-specific code (kvm_age_gfn() and
    kvm_test_age_gfn()).

 2. Convert KVM/x86's kvm_{test,}_age_gfn() to be lockless for the TDP
    MMU.

 4. Convert KVM/arm64's kvm_{test,}_age_gfn() to hold the mmu_lock in
    read-mode.

 5. Add bitmap-based look-around support to KVM/x86 and KVM/arm64
    (probably 2-3 patches).

> 
> If an architecture does not implement kvm_arch_prepare_bitmap_age() or
> is unable to do bitmap-based aging at runtime (and marks the bitmap as
> unreliable):
>  1. If a bitmap was provided, we inform the caller that the bitmap is
>     unreliable (MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE).
>  2. If a bitmap was not provided, fall back to the old logic.
> 
> Also add logic for architectures to easily use the provided bitmap if
> they are able. The expectation is that the architecture's implementation
> of kvm_gfn_test_age() will use kvm_gfn_record_young(), and
> kvm_gfn_age() will use kvm_gfn_should_age().
> 
> Suggested-by: Yu Zhao <yuzhao@google.com>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/kvm_host.h | 60 ++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      | 92 +++++++++++++++++++++++++++++-----------
>  2 files changed, 127 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1800d03a06a9..5862fd7b5f9b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1992,6 +1992,26 @@ extern const struct _kvm_stats_desc kvm_vm_stats_desc[];
>  extern const struct kvm_stats_header kvm_vcpu_stats_header;
>  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
>  
> +/*
> + * Architectures that support using bitmaps for kvm_age_gfn() and
> + * kvm_test_age_gfn should return true for kvm_arch_prepare_bitmap_age()
> + * and do any work they need to prepare. The subsequent walk will not
> + * automatically grab the KVM MMU lock, so some architectures may opt
> + * to grab it.
> + *
> + * If true is returned, a subsequent call to kvm_arch_finish_bitmap_age() is
> + * guaranteed.
> + */
> +#ifndef kvm_arch_prepare_bitmap_age
> +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)

I find the name of these architecture callbacks misleading/confusing.
The lockless path is used even when a bitmap is not provided. i.e.
bitmap can be NULL in between kvm_arch_prepare/finish_bitmap_age().

> +{
> +	return false;
> +}
> +#endif
> +#ifndef kvm_arch_finish_bitmap_age
> +static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) {}
> +#endif

kvm_arch_finish_bitmap_age() seems unnecessary. I think the KVM/arm64
code could acquire/release the mmu_lock in read-mode in
kvm_test_age_gfn() and kvm_age_gfn() right?

> +
>  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
>  static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
>  {
> @@ -2076,9 +2096,16 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>  	return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
>  }
>  
> +struct test_clear_young_metadata {
> +	unsigned long *bitmap;
> +	unsigned long bitmap_offset_end;

bitmap_offset_end is unused.

> +	unsigned long end;
> +	bool unreliable;
> +};
>  union kvm_mmu_notifier_arg {
>  	pte_t pte;
>  	unsigned long attributes;
> +	struct test_clear_young_metadata *metadata;

nit: Maybe s/metadata/test_clear_young/ ?

>  };
>  
>  struct kvm_gfn_range {
> @@ -2087,11 +2114,44 @@ struct kvm_gfn_range {
>  	gfn_t end;
>  	union kvm_mmu_notifier_arg arg;
>  	bool may_block;
> +	bool lockless;

Please document this as it's somewhat subtle. A reader might think this
implies the entire operation runs without taking the mmu_lock.

>  };
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> +
> +static inline void kvm_age_set_unreliable(struct kvm_gfn_range *range)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	args->unreliable = true;
> +}
> +static inline unsigned long kvm_young_bitmap_offset(struct kvm_gfn_range *range,
> +						    gfn_t gfn)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	return hva_to_gfn_memslot(args->end - 1, range->slot) - gfn;
> +}
> +static inline void kvm_gfn_record_young(struct kvm_gfn_range *range, gfn_t gfn)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
> +	if (args->bitmap)
> +		__set_bit(kvm_young_bitmap_offset(range, gfn), args->bitmap);
> +}
> +static inline bool kvm_gfn_should_age(struct kvm_gfn_range *range, gfn_t gfn)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
> +	if (args->bitmap)
> +		return test_bit(kvm_young_bitmap_offset(range, gfn),
> +				args->bitmap);
> +	return true;
> +}
>  #endif
>  
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d0545d88c802..7d80321e2ece 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -550,6 +550,7 @@ struct kvm_mmu_notifier_range {
>  	on_lock_fn_t on_lock;
>  	bool flush_on_ret;
>  	bool may_block;
> +	bool lockless;
>  };
>  
>  /*
> @@ -598,6 +599,8 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  	struct kvm_memslots *slots;
>  	int i, idx;
>  
> +	BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(gfn_range.arg.pte));
> +
>  	if (WARN_ON_ONCE(range->end <= range->start))
>  		return r;
>  
> @@ -637,15 +640,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
>  			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
>  			gfn_range.slot = slot;
> +			gfn_range.lockless = range->lockless;
>  
>  			if (!r.found_memslot) {
>  				r.found_memslot = true;
> -				KVM_MMU_LOCK(kvm);
> -				if (!IS_KVM_NULL_FN(range->on_lock))
> -					range->on_lock(kvm);
> -
> -				if (IS_KVM_NULL_FN(range->handler))
> -					break;
> +				if (!range->lockless) {
> +					KVM_MMU_LOCK(kvm);
> +					if (!IS_KVM_NULL_FN(range->on_lock))
> +						range->on_lock(kvm);
> +
> +					if (IS_KVM_NULL_FN(range->handler))
> +						break;
> +				}
>  			}
>  			r.ret |= range->handler(kvm, &gfn_range);
>  		}
> @@ -654,7 +660,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  	if (range->flush_on_ret && r.ret)
>  		kvm_flush_remote_tlbs(kvm);
>  
> -	if (r.found_memslot)
> +	if (r.found_memslot && !range->lockless)
>  		KVM_MMU_UNLOCK(kvm);
>  
>  	srcu_read_unlock(&kvm->srcu, idx);
> @@ -682,19 +688,24 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
>  	return __kvm_handle_hva_range(kvm, &range).ret;
>  }
>  
> -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
> -							 unsigned long start,
> -							 unsigned long end,
> -							 gfn_handler_t handler)
> +static __always_inline int kvm_handle_hva_range_no_flush(
> +		struct mmu_notifier *mn,
> +		unsigned long start,
> +		unsigned long end,
> +		gfn_handler_t handler,
> +		union kvm_mmu_notifier_arg arg,
> +		bool lockless)
>  {
>  	struct kvm *kvm = mmu_notifier_to_kvm(mn);
>  	const struct kvm_mmu_notifier_range range = {
>  		.start		= start,
>  		.end		= end,
>  		.handler	= handler,
> +		.arg		= arg,
>  		.on_lock	= (void *)kvm_null_fn,
>  		.flush_on_ret	= false,
>  		.may_block	= false,
> +		.lockless	= lockless,
>  	};
>  
>  	return __kvm_handle_hva_range(kvm, &range).ret;
> @@ -909,15 +920,36 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
>  				    kvm_age_gfn);
>  }
>  
> -static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
> -					struct mm_struct *mm,
> -					unsigned long start,
> -					unsigned long end,
> -					unsigned long *bitmap)
> +static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn,
> +					     struct mm_struct *mm,
> +					     unsigned long start,
> +					     unsigned long end,
> +					     unsigned long *bitmap,
> +					     bool clear)

Perhaps pass in the callback (kvm_test_age_gfn/kvm_age_gfn) instead of
true/false to avoid the naked booleans at the callsites?

>  {
> -	trace_kvm_age_hva(start, end);
> +	if (kvm_arch_prepare_bitmap_age(mn)) {
> +		struct test_clear_young_metadata args = {
> +			.bitmap		= bitmap,
> +			.end		= end,
> +			.unreliable	= false,
> +		};
> +		union kvm_mmu_notifier_arg arg = {
> +			.metadata = &args
> +		};
> +		bool young;
> +
> +		young = kvm_handle_hva_range_no_flush(
> +					mn, start, end,
> +					clear ? kvm_age_gfn : kvm_test_age_gfn,
> +					arg, true);

I suspect the end result will be cleaner we make all architectures
lockless. i.e. Move the mmu_lock acquire/release into the
architecture-specific code.

This could result in more acquire/release calls (one per memslot that
overlaps the provided range) but that should be a single memslot in the
majority of cases I think?

Then unconditionally pass in the metadata structure.

Then you don't need any special casing for the fast path / bitmap path.
The only thing needed is to figure out whether to return
MMU_NOTIFIER_YOUNG vs MMU_NOTIFIER_YOUNG_LOOK_AROUND and that can be
plumbed via test_clear_young_metadata or by changing gfn_handler_t to
return an int instead of a bool.

> +
> +		kvm_arch_finish_bitmap_age(mn);
>  
> -	/* We don't support bitmaps. Don't test or clear anything. */
> +		if (!args.unreliable)
> +			return young ? MMU_NOTIFIER_YOUNG_FAST : 0;
> +	}
> +
> +	/* A bitmap was passed but the architecture doesn't support bitmaps */
>  	if (bitmap)
>  		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
>  
> @@ -934,7 +966,21 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
>  	 * cadence. If we find this inaccurate, we might come up with a
>  	 * more sophisticated heuristic later.
>  	 */
> -	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
> +	return kvm_handle_hva_range_no_flush(
> +			mn, start, end, clear ? kvm_age_gfn : kvm_test_age_gfn,
> +			KVM_MMU_NOTIFIER_NO_ARG, false);

Should this return MMU_NOTIFIER_YOUNG explicitly? This code is assuming
MMU_NOTIFIER_YOUNG == (int)true.

> +}
> +
> +static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
> +					struct mm_struct *mm,
> +					unsigned long start,
> +					unsigned long end,
> +					unsigned long *bitmap)
> +{
> +	trace_kvm_age_hva(start, end);
> +
> +	return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap,
> +						 true);
>  }
>  
>  static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
> @@ -945,12 +991,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
>  {
>  	trace_kvm_test_age_hva(start, end);
>  
> -	/* We don't support bitmaps. Don't test or clear anything. */
> -	if (bitmap)
> -		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
> -
> -	return kvm_handle_hva_range_no_flush(mn, start, end,
> -					     kvm_test_age_gfn);
> +	return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap,
> +						 false);
>  }
>  
>  static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
> -- 
> 2.44.0.478.gd926399ef9-goog
> 

WARNING: multiple messages have this Message-ID (diff)
From: David Matlack <dmatlack@google.com>
To: James Houghton <jthoughton@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>, Yu Zhao <yuzhao@google.com>,
	Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	Sean Christopherson <seanjc@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Shaoqin Huang <shahuang@redhat.com>,
	Gavin Shan <gshan@redhat.com>,
	Ricardo Koller <ricarkol@google.com>,
	Raghavendra Rao Ananta <rananta@google.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/7] KVM: Add basic bitmap support into kvm_mmu_notifier_test/clear_young
Date: Fri, 12 Apr 2024 13:28:40 -0700	[thread overview]
Message-ID: <ZhmZeFPRwOJVue5y@google.com> (raw)
In-Reply-To: <20240401232946.1837665-4-jthoughton@google.com>

On 2024-04-01 11:29 PM, James Houghton wrote:
> Add kvm_arch_prepare_bitmap_age() for architectures to indiciate that
> they support bitmap-based aging in kvm_mmu_notifier_test_clear_young()
> and that they do not need KVM to grab the MMU lock for writing. This
> function allows architectures to do other locking or other preparatory
> work that it needs.

There's a lot going on here. I know it's extra work but I think the
series would be easier to understand and simplify if you introduced the
KVM support for lockless test/clear_young() first, and then introduce
support for the bitmap-based look-around.

Specifically:

 1. Make all test/clear_young() notifiers lockless. i.e. Move the
    mmu_lock into the architecture-specific code (kvm_age_gfn() and
    kvm_test_age_gfn()).

 2. Convert KVM/x86's kvm_{test,}_age_gfn() to be lockless for the TDP
    MMU.

 4. Convert KVM/arm64's kvm_{test,}_age_gfn() to hold the mmu_lock in
    read-mode.

 5. Add bitmap-based look-around support to KVM/x86 and KVM/arm64
    (probably 2-3 patches).

> 
> If an architecture does not implement kvm_arch_prepare_bitmap_age() or
> is unable to do bitmap-based aging at runtime (and marks the bitmap as
> unreliable):
>  1. If a bitmap was provided, we inform the caller that the bitmap is
>     unreliable (MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE).
>  2. If a bitmap was not provided, fall back to the old logic.
> 
> Also add logic for architectures to easily use the provided bitmap if
> they are able. The expectation is that the architecture's implementation
> of kvm_gfn_test_age() will use kvm_gfn_record_young(), and
> kvm_gfn_age() will use kvm_gfn_should_age().
> 
> Suggested-by: Yu Zhao <yuzhao@google.com>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/kvm_host.h | 60 ++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      | 92 +++++++++++++++++++++++++++++-----------
>  2 files changed, 127 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1800d03a06a9..5862fd7b5f9b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1992,6 +1992,26 @@ extern const struct _kvm_stats_desc kvm_vm_stats_desc[];
>  extern const struct kvm_stats_header kvm_vcpu_stats_header;
>  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
>  
> +/*
> + * Architectures that support using bitmaps for kvm_age_gfn() and
> + * kvm_test_age_gfn should return true for kvm_arch_prepare_bitmap_age()
> + * and do any work they need to prepare. The subsequent walk will not
> + * automatically grab the KVM MMU lock, so some architectures may opt
> + * to grab it.
> + *
> + * If true is returned, a subsequent call to kvm_arch_finish_bitmap_age() is
> + * guaranteed.
> + */
> +#ifndef kvm_arch_prepare_bitmap_age
> +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)

I find the name of these architecture callbacks misleading/confusing.
The lockless path is used even when a bitmap is not provided. i.e.
bitmap can be NULL in between kvm_arch_prepare/finish_bitmap_age().

> +{
> +	return false;
> +}
> +#endif
> +#ifndef kvm_arch_finish_bitmap_age
> +static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) {}
> +#endif

kvm_arch_finish_bitmap_age() seems unnecessary. I think the KVM/arm64
code could acquire/release the mmu_lock in read-mode in
kvm_test_age_gfn() and kvm_age_gfn() right?

> +
>  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
>  static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
>  {
> @@ -2076,9 +2096,16 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>  	return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
>  }
>  
> +struct test_clear_young_metadata {
> +	unsigned long *bitmap;
> +	unsigned long bitmap_offset_end;

bitmap_offset_end is unused.

> +	unsigned long end;
> +	bool unreliable;
> +};
>  union kvm_mmu_notifier_arg {
>  	pte_t pte;
>  	unsigned long attributes;
> +	struct test_clear_young_metadata *metadata;

nit: Maybe s/metadata/test_clear_young/ ?

>  };
>  
>  struct kvm_gfn_range {
> @@ -2087,11 +2114,44 @@ struct kvm_gfn_range {
>  	gfn_t end;
>  	union kvm_mmu_notifier_arg arg;
>  	bool may_block;
> +	bool lockless;

Please document this as it's somewhat subtle. A reader might think this
implies the entire operation runs without taking the mmu_lock.

>  };
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> +
> +static inline void kvm_age_set_unreliable(struct kvm_gfn_range *range)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	args->unreliable = true;
> +}
> +static inline unsigned long kvm_young_bitmap_offset(struct kvm_gfn_range *range,
> +						    gfn_t gfn)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	return hva_to_gfn_memslot(args->end - 1, range->slot) - gfn;
> +}
> +static inline void kvm_gfn_record_young(struct kvm_gfn_range *range, gfn_t gfn)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
> +	if (args->bitmap)
> +		__set_bit(kvm_young_bitmap_offset(range, gfn), args->bitmap);
> +}
> +static inline bool kvm_gfn_should_age(struct kvm_gfn_range *range, gfn_t gfn)
> +{
> +	struct test_clear_young_metadata *args = range->arg.metadata;
> +
> +	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
> +	if (args->bitmap)
> +		return test_bit(kvm_young_bitmap_offset(range, gfn),
> +				args->bitmap);
> +	return true;
> +}
>  #endif
>  
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d0545d88c802..7d80321e2ece 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -550,6 +550,7 @@ struct kvm_mmu_notifier_range {
>  	on_lock_fn_t on_lock;
>  	bool flush_on_ret;
>  	bool may_block;
> +	bool lockless;
>  };
>  
>  /*
> @@ -598,6 +599,8 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  	struct kvm_memslots *slots;
>  	int i, idx;
>  
> +	BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(gfn_range.arg.pte));
> +
>  	if (WARN_ON_ONCE(range->end <= range->start))
>  		return r;
>  
> @@ -637,15 +640,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
>  			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
>  			gfn_range.slot = slot;
> +			gfn_range.lockless = range->lockless;
>  
>  			if (!r.found_memslot) {
>  				r.found_memslot = true;
> -				KVM_MMU_LOCK(kvm);
> -				if (!IS_KVM_NULL_FN(range->on_lock))
> -					range->on_lock(kvm);
> -
> -				if (IS_KVM_NULL_FN(range->handler))
> -					break;
> +				if (!range->lockless) {
> +					KVM_MMU_LOCK(kvm);
> +					if (!IS_KVM_NULL_FN(range->on_lock))
> +						range->on_lock(kvm);
> +
> +					if (IS_KVM_NULL_FN(range->handler))
> +						break;
> +				}
>  			}
>  			r.ret |= range->handler(kvm, &gfn_range);
>  		}
> @@ -654,7 +660,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  	if (range->flush_on_ret && r.ret)
>  		kvm_flush_remote_tlbs(kvm);
>  
> -	if (r.found_memslot)
> +	if (r.found_memslot && !range->lockless)
>  		KVM_MMU_UNLOCK(kvm);
>  
>  	srcu_read_unlock(&kvm->srcu, idx);
> @@ -682,19 +688,24 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
>  	return __kvm_handle_hva_range(kvm, &range).ret;
>  }
>  
> -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
> -							 unsigned long start,
> -							 unsigned long end,
> -							 gfn_handler_t handler)
> +static __always_inline int kvm_handle_hva_range_no_flush(
> +		struct mmu_notifier *mn,
> +		unsigned long start,
> +		unsigned long end,
> +		gfn_handler_t handler,
> +		union kvm_mmu_notifier_arg arg,
> +		bool lockless)
>  {
>  	struct kvm *kvm = mmu_notifier_to_kvm(mn);
>  	const struct kvm_mmu_notifier_range range = {
>  		.start		= start,
>  		.end		= end,
>  		.handler	= handler,
> +		.arg		= arg,
>  		.on_lock	= (void *)kvm_null_fn,
>  		.flush_on_ret	= false,
>  		.may_block	= false,
> +		.lockless	= lockless,
>  	};
>  
>  	return __kvm_handle_hva_range(kvm, &range).ret;
> @@ -909,15 +920,36 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
>  				    kvm_age_gfn);
>  }
>  
> -static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
> -					struct mm_struct *mm,
> -					unsigned long start,
> -					unsigned long end,
> -					unsigned long *bitmap)
> +static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn,
> +					     struct mm_struct *mm,
> +					     unsigned long start,
> +					     unsigned long end,
> +					     unsigned long *bitmap,
> +					     bool clear)

Perhaps pass in the callback (kvm_test_age_gfn/kvm_age_gfn) instead of
true/false to avoid the naked booleans at the callsites?

>  {
> -	trace_kvm_age_hva(start, end);
> +	if (kvm_arch_prepare_bitmap_age(mn)) {
> +		struct test_clear_young_metadata args = {
> +			.bitmap		= bitmap,
> +			.end		= end,
> +			.unreliable	= false,
> +		};
> +		union kvm_mmu_notifier_arg arg = {
> +			.metadata = &args
> +		};
> +		bool young;
> +
> +		young = kvm_handle_hva_range_no_flush(
> +					mn, start, end,
> +					clear ? kvm_age_gfn : kvm_test_age_gfn,
> +					arg, true);

I suspect the end result will be cleaner we make all architectures
lockless. i.e. Move the mmu_lock acquire/release into the
architecture-specific code.

This could result in more acquire/release calls (one per memslot that
overlaps the provided range) but that should be a single memslot in the
majority of cases I think?

Then unconditionally pass in the metadata structure.

Then you don't need any special casing for the fast path / bitmap path.
The only thing needed is to figure out whether to return
MMU_NOTIFIER_YOUNG vs MMU_NOTIFIER_YOUNG_LOOK_AROUND and that can be
plumbed via test_clear_young_metadata or by changing gfn_handler_t to
return an int instead of a bool.

> +
> +		kvm_arch_finish_bitmap_age(mn);
>  
> -	/* We don't support bitmaps. Don't test or clear anything. */
> +		if (!args.unreliable)
> +			return young ? MMU_NOTIFIER_YOUNG_FAST : 0;
> +	}
> +
> +	/* A bitmap was passed but the architecture doesn't support bitmaps */
>  	if (bitmap)
>  		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
>  
> @@ -934,7 +966,21 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
>  	 * cadence. If we find this inaccurate, we might come up with a
>  	 * more sophisticated heuristic later.
>  	 */
> -	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
> +	return kvm_handle_hva_range_no_flush(
> +			mn, start, end, clear ? kvm_age_gfn : kvm_test_age_gfn,
> +			KVM_MMU_NOTIFIER_NO_ARG, false);

Should this return MMU_NOTIFIER_YOUNG explicitly? This code is assuming
MMU_NOTIFIER_YOUNG == (int)true.

> +}
> +
> +static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
> +					struct mm_struct *mm,
> +					unsigned long start,
> +					unsigned long end,
> +					unsigned long *bitmap)
> +{
> +	trace_kvm_age_hva(start, end);
> +
> +	return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap,
> +						 true);
>  }
>  
>  static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
> @@ -945,12 +991,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
>  {
>  	trace_kvm_test_age_hva(start, end);
>  
> -	/* We don't support bitmaps. Don't test or clear anything. */
> -	if (bitmap)
> -		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
> -
> -	return kvm_handle_hva_range_no_flush(mn, start, end,
> -					     kvm_test_age_gfn);
> +	return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap,
> +						 false);
>  }
>  
>  static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
> -- 
> 2.44.0.478.gd926399ef9-goog
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-04-12 20:28 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-01 23:29 [PATCH v3 0/7] mm/kvm: Improve parallelism for access bit harvesting James Houghton
2024-04-01 23:29 ` James Houghton
2024-04-01 23:29 ` [PATCH v3 1/7] mm: Add a bitmap into mmu_notifier_{clear,test}_young James Houghton
2024-04-01 23:29   ` James Houghton
2024-04-04 18:52   ` David Hildenbrand
2024-04-04 18:52     ` David Hildenbrand
2024-04-09 18:31     ` James Houghton
2024-04-09 18:31       ` James Houghton
2024-04-09 19:35       ` David Hildenbrand
2024-04-09 19:35         ` David Hildenbrand
2024-04-11  0:35         ` James Houghton
2024-04-11  0:35           ` James Houghton
2024-04-12 18:45   ` David Matlack
2024-04-12 18:45     ` David Matlack
2024-04-19 20:34     ` James Houghton
2024-04-19 20:34       ` James Houghton
2024-04-01 23:29 ` [PATCH v3 2/7] KVM: Move MMU notifier function declarations James Houghton
2024-04-01 23:29   ` James Houghton
2024-04-01 23:29 ` [PATCH v3 3/7] KVM: Add basic bitmap support into kvm_mmu_notifier_test/clear_young James Houghton
2024-04-01 23:29   ` James Houghton
2024-04-12 20:28   ` David Matlack [this message]
2024-04-12 20:28     ` David Matlack
2024-04-19 20:41     ` James Houghton
2024-04-19 20:41       ` James Houghton
2024-04-01 23:29 ` [PATCH v3 4/7] KVM: x86: Move tdp_mmu_enabled and shadow_accessed_mask James Houghton
2024-04-01 23:29   ` James Houghton
2024-04-01 23:29 ` [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging James Houghton
2024-04-01 23:29   ` James Houghton
2024-04-11 17:08   ` David Matlack
2024-04-11 17:08     ` David Matlack
2024-04-11 17:28     ` David Matlack
2024-04-11 17:28       ` David Matlack
2024-04-11 18:00       ` David Matlack
2024-04-11 18:00         ` David Matlack
2024-04-11 18:07         ` David Matlack
2024-04-11 18:07           ` David Matlack
2024-04-19 20:47       ` James Houghton
2024-04-19 20:47         ` James Houghton
2024-04-19 21:06         ` David Matlack
2024-04-19 21:06           ` David Matlack
2024-04-19 21:48           ` James Houghton
2024-04-19 21:48             ` James Houghton
2024-04-21  0:19             ` Yu Zhao
2024-04-21  0:19               ` Yu Zhao
2024-04-12 20:44   ` David Matlack
2024-04-12 20:44     ` David Matlack
2024-04-19 20:54     ` James Houghton
2024-04-19 20:54       ` James Houghton
2024-04-01 23:29 ` [PATCH v3 6/7] KVM: arm64: " James Houghton
2024-04-01 23:29   ` James Houghton
2024-04-02  4:06   ` Yu Zhao
2024-04-02  4:06     ` Yu Zhao
2024-04-02  7:00     ` Oliver Upton
2024-04-02  7:00       ` Oliver Upton
2024-04-02  7:33     ` Marc Zyngier
2024-04-02  7:33       ` Marc Zyngier
2024-04-01 23:29 ` [PATCH v3 7/7] mm: multi-gen LRU: use mmu_notifier_test_clear_young() James Houghton
2024-04-01 23:29   ` James Houghton
2024-04-12 18:41 ` [PATCH v3 0/7] mm/kvm: Improve parallelism for access bit harvesting David Matlack
2024-04-12 18:41   ` David Matlack
2024-04-19 20:57   ` James Houghton
2024-04-19 20:57     ` James Houghton
2024-04-19 22:23     ` Oliver Upton
2024-04-19 22:23       ` Oliver Upton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhmZeFPRwOJVue5y@google.com \
    --to=dmatlack@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=gshan@redhat.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=jthoughton@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=maz@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=rananta@google.com \
    --cc=ricarkol@google.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=ryan.roberts@arm.com \
    --cc=seanjc@google.com \
    --cc=shahuang@redhat.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.