Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.

From: Oliver Upton <oliver.upton@linux.dev>
To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, maz@kernel.org,
	darren@os.amperecomputing.com,
	d.scott.phillips@amperecomputing.com,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.
Date: Tue, 5 Mar 2024 08:46:44 +0000	[thread overview]
Message-ID: <Zebb9CyihqC4JqnK@linux.dev> (raw)
In-Reply-To: <20240305054606.13261-1-gankulkarni@os.amperecomputing.com>

-cc old kvmarm list
+cc new kvmarm list, reviewers

Please run scripts/get_maintainer.pl next time around so we get the
right people looking at a patch.

On Mon, Mar 04, 2024 at 09:46:06PM -0800, Ganapatrao Kulkarni wrote:
> @@ -216,6 +223,13 @@ struct kvm_s2_mmu {
>  	 * >0: Somebody is actively using this.
>  	 */
>  	atomic_t refcnt;
> +
> +	/*
> +	 * For a Canonical IPA to Shadow IPA mapping.
> +	 */
> +	struct rb_root nested_mapipa_root;

There isn't any benefit to tracking the canonical IPA -> shadow IPA(s)
mapping on a per-S2 basis, as there already exists a one-to-many problem
(more below). Maintaining a per-VM data structure (since this is keyed
by canonical IPA) makes a bit more sense.

> +	rwlock_t mmu_lock;
> +

Err, is there any reason the existing mmu_lock is insufficient here?
Surely taking a new reference on a canonical IPA for a shadow S2 must be
done behind the MMU lock for it to be safe against MMU notifiers...

Also, Reusing the exact same name for it is sure to produce some lock
imbalance funnies.

>  };
>  
>  static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index da7ebd2f6e24..c31a59a1fdc6 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,6 +65,9 @@ extern void kvm_init_nested(struct kvm *kvm);
>  extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
>  extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
>  extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
> +extern void add_shadow_ipa_map_node(
> +		struct kvm_s2_mmu *mmu,
> +		phys_addr_t ipa, phys_addr_t shadow_ipa, long size);

style nitpick: no newline between the open bracket and first parameter.
Wrap as needed at 80 (or a bit more) columns.

> +/*
> + * Create a node and add to lookup table, when a page is mapped to
> + * Canonical IPA and also mapped to Shadow IPA.
> + */
> +void add_shadow_ipa_map_node(struct kvm_s2_mmu *mmu,
> +			phys_addr_t ipa,
> +			phys_addr_t shadow_ipa, long size)
> +{
> +	struct rb_root *ipa_root = &(mmu->nested_mapipa_root);
> +	struct rb_node **node = &(ipa_root->rb_node), *parent = NULL;
> +	struct mapipa_node *new;
> +
> +	new = kzalloc(sizeof(struct mapipa_node), GFP_KERNEL);
> +	if (!new)
> +		return;

Should be GFP_KERNEL_ACCOUNT, you want to charge this to the user.

> +
> +	new->shadow_ipa = shadow_ipa;
> +	new->ipa = ipa;
> +	new->size = size;

What about aliasing? You could have multiple shadow IPAs that point to
the same canonical IPA, even within a single MMU.

> +	write_lock(&mmu->mmu_lock);
> +
> +	while (*node) {
> +		struct mapipa_node *tmp;
> +
> +		tmp = container_of(*node, struct mapipa_node, node);
> +		parent = *node;
> +		if (new->ipa < tmp->ipa) {
> +			node = &(*node)->rb_left;
> +		} else if (new->ipa > tmp->ipa) {
> +			node = &(*node)->rb_right;
> +		} else {
> +			write_unlock(&mmu->mmu_lock);
> +			kfree(new);
> +			return;
> +		}
> +	}
> +
> +	rb_link_node(&new->node, parent, node);
> +	rb_insert_color(&new->node, ipa_root);
> +	write_unlock(&mmu->mmu_lock);

Meh, one of the annoying things with rbtree is you have to build your
own search functions...

It would appear that the rbtree intends to express intervals (i.e. GPA +
size), but the search implementation treats GPA as an index. So I don't
think this works as intended.

Have you considered other abstract data types (e.g. xarray, maple tree)
and how they might apply here?

> +bool get_shadow_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa, phys_addr_t *shadow_ipa, long *size)
> +{
> +	struct rb_node *node;
> +	struct mapipa_node *tmp = NULL;
> +
> +	read_lock(&mmu->mmu_lock);
> +	node = mmu->nested_mapipa_root.rb_node;
> +
> +	while (node) {
> +		tmp = container_of(node, struct mapipa_node, node);
> +
> +		if (tmp->ipa == ipa)
> +			break;
> +		else if (ipa > tmp->ipa)
> +			node = node->rb_right;
> +		else
> +			node = node->rb_left;
> +	}
> +
> +	read_unlock(&mmu->mmu_lock);
> +
> +	if (tmp && tmp->ipa == ipa) {
> +		*shadow_ipa = tmp->shadow_ipa;
> +		*size = tmp->size;
> +		write_lock(&mmu->mmu_lock);
> +		rb_erase(&tmp->node, &mmu->nested_mapipa_root);
> +		write_unlock(&mmu->mmu_lock);
> +		kfree(tmp);
> +		return true;
> +	}

Implicitly evicting the entry isn't going to work if we want to use it
for updates to a stage-2 that do not evict the mapping, like write
protection or access flag updates.

-- 
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel