From: Marc Zyngier <maz@kernel.org>
To: Raghavendra Rao Ananta <rananta@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Ricardo Koller <ricarkol@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jing Zhang <jingzhangos@google.com>,
	Colton Lewis <coltonlewis@google.com>,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based intructions for unmap
Date: Mon, 29 May 2023 15:18:25 +0100
Message-ID: <87ttvvjk5q.wl-maz@kernel.org>
In-Reply-To: <20230519005231.3027912-7-rananta@google.com>

On Fri, 19 May 2023 01:52:31 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> The current implementation of the stage-2 unmap walker traverses
> the given range and, as a part of break-before-make, performs
> TLB invalidations with a DSB for every PTE. A multitude of this
> combination could cause a performance bottleneck.
> 
> Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> invalidations until the entire walk is finished, and then
> use range-based instructions to invalidate the TLBs in one go.
> Condition this upon S2FWB in order to avoid walking the page-table
> again to perform the CMOs after issuing the TLBI.

But that's the real bottleneck. TLBIs are cheap compared to CMOs, even
on remarkably bad implementations. What is your plan to fix this?

> 
> Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> now serves the stage-2 unmap walker specifically, rather than
> acting generic.
> 
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b8f0dbd12f773..5832ee3418fb0 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
>  	smp_store_release(ctx->ptep, new);
>  }
>  
> -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> -			   struct kvm_pgtable_mm_ops *mm_ops)
> +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
>  {
> +	/*
> +	 * If FEAT_TLBIRANGE is implemented, defer the individual PTE
> +	 * TLB invalidations until the entire walk is finished, and
> +	 * then use the range-based TLBI instructions to do the
> +	 * invalidations. Condition this upon S2FWB in order to avoid
> +	 * a page-table walk again to perform the CMOs after TLBI.
> +	 */
> +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
> +}
> +
> +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> +				struct kvm_s2_mmu *mmu,
> +				struct kvm_pgtable_mm_ops *mm_ops)
> +{
> +	struct kvm_pgtable *pgt = ctx->arg;
> +
>  	/*
>  	 * Clear the existing PTE, and perform break-before-make with
>  	 * TLB maintenance if it was valid.
>  	 */
>  	if (kvm_pte_valid(ctx->old)) {
>  		kvm_clear_pte(ctx->ptep);
> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> +
> +		if (!stage2_unmap_defer_tlb_flush(pgt))
> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> +					ctx->addr, ctx->level);

This no longer matches the comment above: the TLB maintenance is now
conditional.

Overall, I'm very concerned that we lose the consistency property that
the current code has: once called, the TLBs and the page tables are
synchronised.

Yes, this patch looks correct. But it is also really fragile.
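
To make the consistency concern concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: struct toy_s2 and every
toy_*() helper are hypothetical names invented for illustration. It
only shows that the deferred scheme opens a window in which a cleared
PTE still has a live TLB entry, and that the window closes once the
range invalidation has run.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 4

/* Toy model (hypothetical): one PTE valid bit and one TLB bit per page. */
struct toy_s2 {
	bool pte_valid[NR_PAGES];
	bool tlb_cached[NR_PAGES];
};

/* The property at stake: no TLB entry outlives its cleared PTE. */
static bool toy_consistent(const struct toy_s2 *s)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		if (s->tlb_cached[i] && !s->pte_valid[i])
			return false;
	return true;
}

/* Per-PTE scheme: clear the PTE and invalidate its TLB entry together. */
static void toy_unmap_eager(struct toy_s2 *s, size_t i)
{
	s->pte_valid[i] = false;
	s->tlb_cached[i] = false;
}

/* Deferred scheme, step 1: clear the PTE only. */
static void toy_unmap_deferred(struct toy_s2 *s, size_t i)
{
	s->pte_valid[i] = false;
}

/* Deferred scheme, step 2: one range invalidation over [first, last). */
static void toy_flush_range(struct toy_s2 *s, size_t first, size_t last)
{
	for (size_t i = first; i < last && i < NR_PAGES; i++)
		s->tlb_cached[i] = false;
}
```

In this model toy_consistent() holds after every toy_unmap_eager()
call, but fails between toy_unmap_deferred() and toy_flush_range().
Anything that runs in that window has to know about the deferral.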

>  	}
>  
>  	mm_ops->put_page(ctx->ptep);
> @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	 * block entry and rely on the remaining portions being faulted
>  	 * back lazily.
>  	 */
> -	stage2_put_pte(ctx, mmu, mm_ops);
> +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
>  
>  	if (need_flush && mm_ops->dcache_clean_inval_poc)
>  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  
>  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
> +	int ret;
>  	struct kvm_pgtable_walker walker = {
>  		.cb	= stage2_unmap_walker,
>  		.arg	= pgt,
>  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>  	};
>  
> -	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	if (stage2_unmap_defer_tlb_flush(pgt))
> +		/* Perform the deferred TLB invalidations */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> +				addr, addr + size);

This "kvm_call_hyp(__kvm_tlb_flush_vmid_range,...)" could do with a
wrapper from the point where you introduce it.
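
A rough sketch of what such a wrapper could look like, written as a
userspace mock so it can be compiled on its own. The names
s2_tlb_flush_range() and mock_hyp_tlb_flush_vmid_range() are
hypothetical, and struct kvm_s2_mmu here is only a stand-in for the
kernel type. In the kernel the mock would be replaced by the
kvm_call_hyp(__kvm_tlb_flush_vmid_range, ...) call quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel type; only its identity matters here. */
struct kvm_s2_mmu {
	int vmid;
};

/* Records the last mock hypercall so the wrapper can be checked. */
static struct {
	const struct kvm_s2_mmu *mmu;
	uint64_t start;
	uint64_t end;
	unsigned int calls;
} last_flush;

/* Mock of the hyp-side range flush (hypothetical, userspace only). */
static void mock_hyp_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
					  uint64_t start, uint64_t end)
{
	last_flush.mmu = mmu;
	last_flush.start = start;
	last_flush.end = end;
	last_flush.calls++;
}

/*
 * Hypothetical wrapper: callers pass (addr, size) and never spell out
 * the hypercall or compute the end address themselves.
 */
static void s2_tlb_flush_range(struct kvm_s2_mmu *mmu, uint64_t addr,
			       uint64_t size)
{
	if (!size)
		return;	/* nothing to invalidate */

	mock_hyp_tlb_flush_vmid_range(mmu, addr, addr + size);
}
```

With something of this shape, the tail of kvm_pgtable_stage2_unmap()
would reduce to a single call taking pgt->mmu, addr and size.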

> +
> +	return ret;
>  }
>  

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
