kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oliver Upton <oliver.upton@linux.dev>
To: Ricardo Koller <ricarkol@google.com>
Cc: pbonzini@redhat.com, maz@kernel.org, oupton@google.com,
	yuzenghui@huawei.com, dmatlack@google.com, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, qperret@google.com,
	catalin.marinas@arm.com, andrew.jones@linux.dev,
	seanjc@google.com, alexandru.elisei@arm.com,
	suzuki.poulose@arm.com, eric.auger@redhat.com, gshan@redhat.com,
	reijiw@google.com, rananta@google.com, bgardon@google.com,
	ricarkol@gmail.com
Subject: Re: [PATCH v3 04/12] KVM: arm64: Add kvm_pgtable_stage2_split()
Date: Thu, 16 Feb 2023 00:36:18 +0000	[thread overview]
Message-ID: <Y+16gsTbsZyUBMAt@linux.dev> (raw)
In-Reply-To: <20230215174046.2201432-5-ricarkol@google.com>

On Wed, Feb 15, 2023 at 05:40:38PM +0000, Ricardo Koller wrote:

[...]

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index fed314f2b320..e2fb78398b3d 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -1229,6 +1229,111 @@ int kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>  	return 0;
>  }
>  
> +struct stage2_split_data {
> +	struct kvm_s2_mmu		*mmu;
> +	void				*memcache;
> +	u64				mc_capacity;
> +};
> +
> +/*
> + * Get the number of page-tables needed to replace a bock with a fully
> + * populated tree, up to the PTE level, at particular level.
> + */
> +static inline u32 stage2_block_get_nr_page_tables(u32 level)
> +{
> +	switch (level) {
> +	/* There are no blocks at level 0 */
> +	case 1: return 1 + PTRS_PER_PTE;
> +	case 2: return 1;
> +	case 3: return 0;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return ~0;
> +	}
> +}

This doesn't take into account our varying degrees of hugepage support
across page sizes. Perhaps:

  static inline int stage2_block_get_nr_page_tables(u32 level)
  {
          if (WARN_ON_ONCE(level < KVM_PGTABLE_MIN_BLOCK_LEVEL ||
	                   level >= KVM_PGTABLE_MAX_LEVELS))
	          return -EINVAL;

          switch (level) {
	  case 1:
	  	return PTRS_PER_PTE + 1;
	  case 2:
	  	return 1;
	  case 3:
	  	return 0;
	  }
  }

paired with an explicit error check and early return on the caller side.

> +static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			       enum kvm_pgtable_walk_flags visit)
> +{
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> +	struct stage2_split_data *data = ctx->arg;
> +	kvm_pte_t pte = ctx->old, new, *childp;
> +	enum kvm_pgtable_prot prot;
> +	void *mc = data->memcache;
> +	u32 level = ctx->level;
> +	u64 phys, nr_pages;
> +	bool force_pte;
> +	int ret;
> +
> +	/* No huge-pages exist at the last level */
> +	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
> +		return 0;
> +
> +	/* We only split valid block mappings */
> +	if (!kvm_pte_valid(pte))
> +		return 0;
> +
> +	nr_pages = stage2_block_get_nr_page_tables(level);
> +	if (data->mc_capacity >= nr_pages) {
> +		/* Build a tree mapped down to the PTE granularity. */
> +		force_pte = true;
> +	} else {
> +		/*
> +		 * Don't force PTEs. This requires a single page of PMDs at the
> +		 * PUD level, or a single page of PTEs at the PMD level. If we
> +		 * are at the PUD level, the PTEs will be created recursively.
> +		 */
> +		force_pte = false;
> +		nr_pages = 1;
> +	}

Do we know if the 'else' branch here is even desirable? I.e. has
recursive shattering been tested with PUD hugepages (HugeTLB 1G) and
shown to improve guest performance while dirty tracking?

The observations we've made on existing systems were that the successive
break-before-make operations led to a measurable slowdown in guest
pre-copy performance. Recursively building the page tables should
actually result in *more* break-before-makes than if we just let the vCPU
fault path lazily shatter hugepages.

-- 
Thanks,
Oliver

  reply	other threads:[~2023-02-16  0:36 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-15 17:40 [PATCH v3 00/12] Implement Eager Page Splitting for ARM Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 01/12] KVM: arm64: Add KVM_PGTABLE_WALK ctx->flags for skipping BBM and CMO Ricardo Koller
2023-02-16  2:56   ` Shaoqin Huang
2023-02-16 18:41     ` Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 02/12] KVM: arm64: Rename free_unlinked to free_removed Ricardo Koller
2023-02-15 23:51   ` Oliver Upton
2023-02-16  3:13   ` Shaoqin Huang
2023-02-16 18:44     ` Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 03/12] KVM: arm64: Add helper for creating unlinked stage2 subtrees Ricardo Koller
2023-02-16  0:13   ` Oliver Upton
2023-02-16 18:34     ` Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 04/12] KVM: arm64: Add kvm_pgtable_stage2_split() Ricardo Koller
2023-02-16  0:36   ` Oliver Upton [this message]
     [not found]     ` <CAOHnOrzoBp7zh=yjjtgto-m1p1P1sWQ5gJRTzChemGk93J+T4g@mail.gmail.com>
2023-02-16 18:14       ` Ricardo Koller
2023-02-16 12:22   ` Shaoqin Huang
2023-02-16 18:30     ` Ricardo Koller
2023-02-17  4:07   ` Shaoqin Huang
2023-02-15 17:40 ` [PATCH v3 05/12] KVM: arm64: Refactor kvm_arch_commit_memory_region() Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 06/12] KVM: arm64: Add kvm_uninit_stage2_mmu() Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 07/12] KVM: arm64: Export kvm_are_all_memslots_empty() Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 08/12] KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 09/12] KVM: arm64: Split huge pages when dirty logging is enabled Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 10/12] KVM: arm64: Open-code kvm_mmu_write_protect_pt_masked() Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 11/12] KVM: arm64: Split huge pages during KVM_CLEAR_DIRTY_LOG Ricardo Koller
2023-02-15 17:40 ` [PATCH v3 12/12] KVM: arm64: Use local TLBI on permission relaxation Ricardo Koller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+16gsTbsZyUBMAt@linux.dev \
    --to=oliver.upton@linux.dev \
    --cc=alexandru.elisei@arm.com \
    --cc=andrew.jones@linux.dev \
    --cc=bgardon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=eric.auger@redhat.com \
    --cc=gshan@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=maz@kernel.org \
    --cc=oupton@google.com \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=rananta@google.com \
    --cc=reijiw@google.com \
    --cc=ricarkol@gmail.com \
    --cc=ricarkol@google.com \
    --cc=seanjc@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).