All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ricardo Koller <ricarkol@google.com>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: ricarkol@gmail.com, kvm@vger.kernel.org, catalin.marinas@arm.com,
	kvmarm@lists.linux.dev, andrew.jones@linux.dev,
	bgardon@google.com, maz@kernel.org, dmatlack@google.com,
	pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu
Subject: Re: [RFC PATCH 04/12] KVM: arm64: Add kvm_pgtable_stage2_split()
Date: Tue, 15 Nov 2022 15:03:42 -0800	[thread overview]
Message-ID: <Y3QazjAUVE+T6rHh@google.com> (raw)
In-Reply-To: <Y3KrHG4WMXMUquUy@google.com>

On Mon, Nov 14, 2022 at 08:54:52PM +0000, Oliver Upton wrote:
> Hi Ricardo,
> 
> On Sat, Nov 12, 2022 at 08:17:06AM +0000, Ricardo Koller wrote:
> 
> [...]
> 
> > +/**
> > + * kvm_pgtable_stage2_split() - Split a range of huge pages into leaf PTEs pointing
> > + *				to PAGE_SIZE guest pages.
> > + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
> > + * @addr:	Intermediate physical address from which to split.
> > + * @size:	Size of the range.
> > + * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
> > + *		page-table pages.
> > + *
> > + * @addr and the end (@addr + @size) are effectively aligned down and up to
> > + * the top level huge-page block size. This is an exampe using 1GB
> > + * huge-pages and 4KB granules.
> > + *
> > + *                          [---input range---]
> > + *                          :                 :
> > + * [--1G block pte--][--1G block pte--][--1G block pte--][--1G block pte--]
> > + *                          :                 :
> > + *                   [--2MB--][--2MB--][--2MB--][--2MB--]
> > + *                          :                 :
> > + *                   [ ][ ][:][ ][ ][ ][ ][ ][:][ ][ ][ ]
> > + *                          :                 :
> > + *
> > + * Return: 0 on success, negative error code on failure. Note that
> > + * kvm_pgtable_stage2_split() is best effort: it tries to break as many
> > + * blocks in the input range as allowed by the size of the memcache. It
> > + * will fail it wasn't able to break any block.
> > + */
> > +int kvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size, void *mc);
> > +
> >  /**
> >   * kvm_pgtable_walk() - Walk a page-table.
> >   * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index d1f309128118..9c42eff6d42e 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -1267,6 +1267,80 @@ static int stage2_create_removed(kvm_pte_t *ptep, u64 phys, u32 level,
> >  	return __kvm_pgtable_visit(&data, mm_ops, ptep, level);
> >  }
> >  
> > +struct stage2_split_data {
> > +	struct kvm_s2_mmu		*mmu;
> > +	void				*memcache;
> > +	struct kvm_pgtable_mm_ops	*mm_ops;
> 
> You can also get at mm_ops through kvm_pgtable_visit_ctx
> 
> > +};
> > +
> > +static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > +			       enum kvm_pgtable_walk_flags visit)
> > +{
> > +	struct stage2_split_data *data = ctx->arg;
> > +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> > +	kvm_pte_t pte = ctx->old, attr, new;
> > +	enum kvm_pgtable_prot prot;
> > +	void *mc = data->memcache;
> > +	u32 level = ctx->level;
> > +	u64 phys;
> > +
> > +	if (WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx)))
> > +		return -EINVAL;
> > +
> > +	/* Nothing to split at the last level */
> > +	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
> > +		return 0;
> > +
> > +	/* We only split valid block mappings */
> > +	if (!kvm_pte_valid(pte) || kvm_pte_table(pte, ctx->level))
> > +		return 0;
> > +
> > +	phys = kvm_pte_to_phys(pte);
> > +	prot = kvm_pgtable_stage2_pte_prot(pte);
> > +	stage2_set_prot_attr(data->mmu->pgt, prot, &attr);
> > +
> > +	/*
> > +	 * Eager page splitting is best-effort, so we can ignore the error.
> > +	 * The returned PTE (new) will be valid even if this call returns
> > +	 * error: new will be a single (big) block PTE.  The only issue is
> > +	 * that it will affect dirty logging performance, as the huge-pages
> > +	 * will have to be split on fault, and so we WARN.
> > +	 */
> > +	WARN_ON(stage2_create_removed(&new, phys, level, attr, mc, mm_ops));
> 
> I don't believe we should warn in this case, at least not
> unconditionally. ENOMEM is an expected outcome, for example.

Given that "eager page splitting" is best-effort, the error must be
ignored somewhere: either here or by the caller (in mmu.c). It seems
that ignoring the error here is not a very good idea.

> 
> Additionally, I believe you'll want to bail out at this point to avoid
> installing a potentially garbage PTE as well.

It should be fine as stage2_create_removed() is also best-effort. The
returned PTE is valid even when it fails; it just returns a big block
PTE.

> 
> > +	stage2_put_pte(ctx, data->mmu, mm_ops);
> 
> Ah, I see why you've relaxed the WARN in patch 1 now.
> 
> I would recommend you follow the break-before-make pattern and use the
> helpers here as well. stage2_try_break_pte() will demote the store to
> WRITE_ONCE() if called from a non-shared context.
> 

ACK, I can do that. The only reason why I didnt' is because I would have
to handle the potential error from stage2_try_break_pte(). It would feel
wrong not to, even if it's !shared. On the other hand, I would like to
easily experiment with both the !shared and the shared approaches
easily.

> Then the WARN will behave as expected in stage2_make_pte().
> 
> > +	/*
> > +	 * Note, the contents of the page table are guaranteed to be made
> > +	 * visible before the new PTE is assigned because
> > +	 * stage2_make__pte() writes the PTE using smp_store_release().
> 
> typo: stage2_make_pte()
> 
> --
> Thanks,
> Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Ricardo Koller <ricarkol@google.com>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: pbonzini@redhat.com, maz@kernel.org, dmatlack@google.com,
	qperret@google.com, catalin.marinas@arm.com,
	andrew.jones@linux.dev, seanjc@google.com,
	alexandru.elisei@arm.com, suzuki.poulose@arm.com,
	eric.auger@redhat.com, gshan@redhat.com, reijiw@google.com,
	rananta@google.com, bgardon@google.com, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu,
	ricarkol@gmail.com
Subject: Re: [RFC PATCH 04/12] KVM: arm64: Add kvm_pgtable_stage2_split()
Date: Tue, 15 Nov 2022 15:03:42 -0800	[thread overview]
Message-ID: <Y3QazjAUVE+T6rHh@google.com> (raw)
Message-ID: <20221115230342.GUfI6XYZZMUh-ypv47DayaiBP0BLZfYbyFSE8xbfDfg@z> (raw)
In-Reply-To: <Y3KrHG4WMXMUquUy@google.com>

On Mon, Nov 14, 2022 at 08:54:52PM +0000, Oliver Upton wrote:
> Hi Ricardo,
> 
> On Sat, Nov 12, 2022 at 08:17:06AM +0000, Ricardo Koller wrote:
> 
> [...]
> 
> > +/**
> > + * kvm_pgtable_stage2_split() - Split a range of huge pages into leaf PTEs pointing
> > + *				to PAGE_SIZE guest pages.
> > + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
> > + * @addr:	Intermediate physical address from which to split.
> > + * @size:	Size of the range.
> > + * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
> > + *		page-table pages.
> > + *
> > + * @addr and the end (@addr + @size) are effectively aligned down and up to
> > + * the top level huge-page block size. This is an exampe using 1GB
> > + * huge-pages and 4KB granules.
> > + *
> > + *                          [---input range---]
> > + *                          :                 :
> > + * [--1G block pte--][--1G block pte--][--1G block pte--][--1G block pte--]
> > + *                          :                 :
> > + *                   [--2MB--][--2MB--][--2MB--][--2MB--]
> > + *                          :                 :
> > + *                   [ ][ ][:][ ][ ][ ][ ][ ][:][ ][ ][ ]
> > + *                          :                 :
> > + *
> > + * Return: 0 on success, negative error code on failure. Note that
> > + * kvm_pgtable_stage2_split() is best effort: it tries to break as many
> > + * blocks in the input range as allowed by the size of the memcache. It
> > + * will fail it wasn't able to break any block.
> > + */
> > +int kvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size, void *mc);
> > +
> >  /**
> >   * kvm_pgtable_walk() - Walk a page-table.
> >   * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index d1f309128118..9c42eff6d42e 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -1267,6 +1267,80 @@ static int stage2_create_removed(kvm_pte_t *ptep, u64 phys, u32 level,
> >  	return __kvm_pgtable_visit(&data, mm_ops, ptep, level);
> >  }
> >  
> > +struct stage2_split_data {
> > +	struct kvm_s2_mmu		*mmu;
> > +	void				*memcache;
> > +	struct kvm_pgtable_mm_ops	*mm_ops;
> 
> You can also get at mm_ops through kvm_pgtable_visit_ctx
> 
> > +};
> > +
> > +static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > +			       enum kvm_pgtable_walk_flags visit)
> > +{
> > +	struct stage2_split_data *data = ctx->arg;
> > +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> > +	kvm_pte_t pte = ctx->old, attr, new;
> > +	enum kvm_pgtable_prot prot;
> > +	void *mc = data->memcache;
> > +	u32 level = ctx->level;
> > +	u64 phys;
> > +
> > +	if (WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx)))
> > +		return -EINVAL;
> > +
> > +	/* Nothing to split at the last level */
> > +	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
> > +		return 0;
> > +
> > +	/* We only split valid block mappings */
> > +	if (!kvm_pte_valid(pte) || kvm_pte_table(pte, ctx->level))
> > +		return 0;
> > +
> > +	phys = kvm_pte_to_phys(pte);
> > +	prot = kvm_pgtable_stage2_pte_prot(pte);
> > +	stage2_set_prot_attr(data->mmu->pgt, prot, &attr);
> > +
> > +	/*
> > +	 * Eager page splitting is best-effort, so we can ignore the error.
> > +	 * The returned PTE (new) will be valid even if this call returns
> > +	 * error: new will be a single (big) block PTE.  The only issue is
> > +	 * that it will affect dirty logging performance, as the huge-pages
> > +	 * will have to be split on fault, and so we WARN.
> > +	 */
> > +	WARN_ON(stage2_create_removed(&new, phys, level, attr, mc, mm_ops));
> 
> I don't believe we should warn in this case, at least not
> unconditionally. ENOMEM is an expected outcome, for example.

Given that "eager page splitting" is best-effort, the error must be
ignored somewhere: either here or by the caller (in mmu.c). It seems
that ignoring the error here is not a very good idea.

> 
> Additionally, I believe you'll want to bail out at this point to avoid
> installing a potentially garbage PTE as well.

It should be fine as stage2_create_removed() is also best-effort. The
returned PTE is valid even when it fails; it just returns a big block
PTE.

> 
> > +	stage2_put_pte(ctx, data->mmu, mm_ops);
> 
> Ah, I see why you've relaxed the WARN in patch 1 now.
> 
> I would recommend you follow the break-before-make pattern and use the
> helpers here as well. stage2_try_break_pte() will demote the store to
> WRITE_ONCE() if called from a non-shared context.
> 

ACK, I can do that. The only reason why I didnt' is because I would have
to handle the potential error from stage2_try_break_pte(). It would feel
wrong not to, even if it's !shared. On the other hand, I would like to
easily experiment with both the !shared and the shared approaches
easily.

> Then the WARN will behave as expected in stage2_make_pte().
> 
> > +	/*
> > +	 * Note, the contents of the page table are guaranteed to be made
> > +	 * visible before the new PTE is assigned because
> > +	 * stage2_make__pte() writes the PTE using smp_store_release().
> 
> typo: stage2_make_pte()
> 
> --
> Thanks,
> Oliver

  reply	other threads:[~2022-11-15 23:03 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-12  8:17 [RFC PATCH 00/12] KVM: arm64: Eager huge-page splitting for dirty-logging Ricardo Koller
2022-11-12  8:17 ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 01/12] KVM: arm64: Relax WARN check in stage2_make_pte() Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-14 20:59   ` Oliver Upton
2022-11-14 20:59     ` Oliver Upton
2022-11-12  8:17 ` [RFC PATCH 02/12] KVM: arm64: Allow visiting block PTEs in post-order Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-14 18:48   ` Oliver Upton
2022-11-14 18:48     ` Oliver Upton
2023-01-13  3:44     ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 03/12] KVM: arm64: Add stage2_create_removed() Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 04/12] KVM: arm64: Add kvm_pgtable_stage2_split() Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-14 20:54   ` Oliver Upton
2022-11-14 20:54     ` Oliver Upton
2022-11-15 23:03     ` Ricardo Koller [this message]
2022-11-15 23:03       ` Ricardo Koller
2022-11-15 23:27       ` Ricardo Koller
2022-11-15 23:27         ` Ricardo Koller
2022-11-15 23:54         ` Oliver Upton
2022-11-15 23:54           ` Oliver Upton
2022-11-17 21:50           ` Ricardo Koller
2022-11-17 21:50             ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 05/12] arm64: Add a capability for FEAT_BBM level 2 Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 06/12] KVM: arm64: Split block PTEs without using break-before-make Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-14 18:56   ` Oliver Upton
2022-11-14 18:56     ` Oliver Upton
2022-11-12  8:17 ` [RFC PATCH 07/12] KVM: arm64: Refactor kvm_arch_commit_memory_region() Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 08/12] KVM: arm64: Add kvm_uninit_stage2_mmu() Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 09/12] KVM: arm64: Split huge pages when dirty logging is enabled Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 10/12] KVM: arm64: Open-code kvm_mmu_write_protect_pt_masked() Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 11/12] KVM: arm64: Split huge pages during KVM_CLEAR_DIRTY_LOG Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-12  8:17 ` [RFC PATCH 12/12] KVM: arm64: Use local TLBI on permission relaxation Ricardo Koller
2022-11-12  8:17   ` Ricardo Koller
2022-11-14 18:42 ` [RFC PATCH 00/12] KVM: arm64: Eager huge-page splitting for dirty-logging Oliver Upton
2022-11-14 18:42   ` Oliver Upton
2023-01-13  3:42   ` Ricardo Koller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3QazjAUVE+T6rHh@google.com \
    --to=ricarkol@google.com \
    --cc=andrew.jones@linux.dev \
    --cc=bgardon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=kvmarm@lists.linux.dev \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=ricarkol@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.