Message-ID: <34815656-b73f-4f07-857d-103abf654acf@arm.com>
Date: Fri, 20 Mar 2026 16:14:58 +0000
Subject: Re: [PATCH v13 15/48] arm64: RMI: RTT tear down
To: Suzuki K Poulose, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse, Oliver Upton,
 Zenghui Yu, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, Joey Gouly, Alexandru Elisei,
 Christoffer Dall, Fuad Tabba, linux-coco@lists.linux.dev,
 Ganapatrao Kulkarni, Gavin Shan, Shanker Donthineni, Alper Gun,
 "Aneesh Kumar K . V", Emi Kisanuki, Vishal Annapurve
References: <20260318155413.793430-1-steven.price@arm.com>
 <20260318155413.793430-16-steven.price@arm.com>
From: Steven Price

On 20/03/2026 10:37, Suzuki K Poulose wrote:
> On 18/03/2026 15:53, Steven Price wrote:
>> The RMM owns the stage 2 page tables for a realm, and KVM must request
>> that the RMM creates/destroys entries as necessary. The physical pages
>> to store the page tables are delegated to the realm as required, and can
>> be undelegated when no longer used.
>>
>> Creating new RTTs is the easy part, tearing down is a little more
>> tricky. The result of realm_rtt_destroy() can be used to effectively
>> walk the tree and destroy the entries (undelegating pages that were
>> given to the realm).
>>
>> Signed-off-by: Steven Price
>> ---
>> Changes since v12:
>>  * Simplify some functions now we know RMM page size is the same as the
>>    host's.
>> Changes since v11:
>>  * Moved some code from earlier in the series to this one so that it's
>>    added when it's first used.
>> Changes since v10:
>>  * RME->RMI rename.
>>  * Some code to handle freeing stage 2 PGD moved into this patch where
>>    it belongs.
>> Changes since v9:
>>  * Add a comment clarifying that root level RTTs are not destroyed until
>>    after the RD is destroyed.
>> Changes since v8:
>>  * Introduce free_rtt() wrapper which calls free_delegated_granule()
>>    followed by kvm_account_pgtable_pages().
>>    This makes it clear where an RTT is being freed rather than just a
>>    delegated granule.
>> Changes since v6:
>>  * Move rme_rtt_level_mapsize() and supporting defines from kvm_rme.h
>>    into rme.c as they are only used in that file.
>> Changes since v5:
>>  * Rename some RME_xxx defines to do with page sizes as RMM_xxx - they
>>    are a property of the RMM specification not the RME architecture.
>> Changes since v2:
>>  * Moved {alloc,free}_delegated_page() and ensure_spare_page() to a
>>    later patch when they are actually used.
>>  * Some simplifications now rmi_xxx() functions allow NULL as an output
>>    parameter.
>>  * Improved comments and code layout.
>> ---
>>  arch/arm64/include/asm/kvm_rmi.h |   7 ++
>>  arch/arm64/kvm/mmu.c             |  15 +++-
>>  arch/arm64/kvm/rmi.c             | 145 +++++++++++++++++++++++++++++++
>>  3 files changed, 166 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
>> index 0ada525af18f..16a297f3091a 100644
>> --- a/arch/arm64/include/asm/kvm_rmi.h
>> +++ b/arch/arm64/include/asm/kvm_rmi.h
>> @@ -68,5 +68,12 @@ u32 kvm_realm_ipa_limit(void);
>>
>>   int kvm_init_realm_vm(struct kvm *kvm);
>>   void kvm_destroy_realm(struct kvm *kvm);
>> +void kvm_realm_destroy_rtts(struct kvm *kvm);
>> +
>> +static inline bool kvm_realm_is_private_address(struct realm *realm,
>> +                        unsigned long addr)
>> +{
>> +    return !(addr & BIT(realm->ia_bits - 1));
>> +}
>>
>>   #endif /* __ASM_KVM_RMI_H */
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 9dc242c3b9c8..41152abf55b2 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1098,10 +1098,23 @@ void stage2_unmap_vm(struct kvm *kvm)
>>   void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>   {
>>       struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>> -    struct kvm_pgtable *pgt = NULL;
>> +    struct kvm_pgtable *pgt;
>>
>>       write_lock(&kvm->mmu_lock);
>>       pgt = mmu->pgt;
>> +    if (kvm_is_realm(kvm) &&
>> +        (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
>> +         kvm_realm_state(kvm) != REALM_STATE_NONE)) {
>> +        write_unlock(&kvm->mmu_lock);
>> +        kvm_realm_destroy_rtts(kvm);
>> +
>> +        /*
>> +         * The PGD pages can be reclaimed only after the realm (RD) is
>> +         * destroyed. We call this again from kvm_destroy_realm() after
>> +         * the RD is destroyed.
>> +         */
>> +        return;
>> +    }
>>       if (pgt) {
>>           mmu->pgd_phys = 0;
>>           mmu->pgt = NULL;
>> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
>> index 700b8c935d29..1fd2c18f7381 100644
>> --- a/arch/arm64/kvm/rmi.c
>> +++ b/arch/arm64/kvm/rmi.c
>> @@ -15,6 +15,19 @@
>>   static unsigned long rmm_feat_reg0;
>>   static unsigned long rmm_feat_reg1;
>>
>
> -->
>
>> +#define RMM_RTT_BLOCK_LEVEL    2
> ...
>> +
>> +#define RMM_L2_BLOCK_SIZE    PMD_SIZE
>
> <--
>
> Unused ? Even better we could use PMD_SIZE directly if at all we need
> it, as we are using PAGE_SIZE

They are used in realm_map_protected() to calculate 'map_level'. But
actually I should be able to drop that use. With the range-based APIs
there's no longer a need to create a temporary RTT when populating a
huge-page.

> minor nit: Also, maybe we can have a generic name for the
> RMM_RTT_MAX_LEVEL ? This applies to all page tables ?
>
> I see we have KVM_PGTABLE_LAST_LEVEL, maybe we could use that ?

Yes that would work - thanks for the suggestion.
>
>> +
>> +static inline unsigned long rmi_rtt_level_mapsize(int level)
>> +{
>> +    if (WARN_ON(level > RMM_RTT_MAX_LEVEL))
>> +        return PAGE_SIZE;
>> +
>> +    return (1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(level));
>> +}
>> +
>>   static bool rmi_has_feature(unsigned long feature)
>>   {
>>       return !!u64_get_bits(rmm_feat_reg0, feature);
>> @@ -189,6 +202,11 @@ u32 kvm_realm_ipa_limit(void)
>>       return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
>>   }
>>
>> +static int get_start_level(struct realm *realm)
>> +{
>> +    return 4 - stage2_pgtable_levels(realm->ia_bits);
>> +}
>> +
>>   static int undelegate_range(phys_addr_t phys, unsigned long size)
>>   {
>>       unsigned long ret;
>> @@ -223,6 +241,131 @@ static int free_delegated_page(phys_addr_t phys)
>>       return 0;
>>   }
>>
>> +static void free_rtt(phys_addr_t phys)
>> +{
>> +    if (free_delegated_page(phys))
>> +        return;
>> +
>> +    kvm_account_pgtable_pages(phys_to_virt(phys), -1);
>> +}
>
> How about a comment here for the function below ?
>
> Something like :
>
> /*
>  * realm_rtt_destroy: Destroy an RTT at @level for @addr.
>  *
>  * Returns - Result of the RMI_RTT_DESTROY call, additionally :
>  *  @out_rtt  : RTT granule, if the RTT was destroyed.
>  *  @next_addr: IPA corresponding to the next possible valid Table entry
>  *        we can target.
>  */

Sure

>> +
>> +static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
>> +                 int level, phys_addr_t *rtt_granule,
>> +                 unsigned long *next_addr)
>> +{
>> +    unsigned long out_rtt;
>> +    int ret;
>> +
>> +    ret = rmi_rtt_destroy(virt_to_phys(realm->rd), addr, level,
>> +                  &out_rtt, next_addr);
>> +
>> +    *rtt_granule = out_rtt;
>> +
>> +    return ret;
>> +}
>> +
>> +static int realm_tear_down_rtt_level(struct realm *realm, int level,
>> +                     unsigned long start, unsigned long end)
>> +{
>> +    ssize_t map_size;
>> +    unsigned long addr, next_addr;
>> +
>> +    if (WARN_ON(level > RMM_RTT_MAX_LEVEL))
>> +        return -EINVAL;
>> +
>> +    map_size = rmi_rtt_level_mapsize(level - 1);
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        phys_addr_t rtt_granule;
>> +        int ret;
>> +        unsigned long align_addr = ALIGN(addr, map_size);
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        if (next_addr > end || align_addr != addr) {
>> +            /*
>> +             * The target range is smaller than what this level
>> +             * covers, recurse deeper.
>> +             */
>> +            ret = realm_tear_down_rtt_level(realm,
>> +                            level + 1,
>> +                            addr,
>> +                            min(next_addr, end));
>> +            if (ret)
>> +                return ret;
>> +            continue;
>> +        }
>> +
>> +        ret = realm_rtt_destroy(realm, addr, level,
>> +                    &rtt_granule, &next_addr);
>> +
>> +        switch (RMI_RETURN_STATUS(ret)) {
>> +        case RMI_SUCCESS:
>> +            free_rtt(rtt_granule);
>> +            break;
>> +        case RMI_ERROR_RTT:
>> +            if (next_addr > addr) {
>> +                /* Missing RTT, skip */
>> +                break;
>> +            }
>> +            /*
>> +             * We tear down the RTT range for the full IPA
>> +             * space, after everything is unmapped. Also we
>> +             * descend down only if we cannot tear down a
>> +             * top level RTT. Thus RMM must be able to walk
>> +             * to the requested level. e.g., a block mapping
>> +             * exists at L1 or L2.
>> +             */
>> +            if (WARN_ON(RMI_RETURN_INDEX(ret) != level))
>> +                return -EBUSY;
>> +            if (WARN_ON(level == RMM_RTT_MAX_LEVEL))
>> +                return -EBUSY;
>> +
>> +            /*
>> +             * The table has active entries in it, recurse deeper
>> +             * and tear down the RTTs.
>> +             */
>> +            next_addr = ALIGN(addr + 1, map_size);
>> +            ret = realm_tear_down_rtt_level(realm,
>> +                            level + 1,
>> +                            addr,
>> +                            next_addr);
>> +            if (ret)
>> +                return ret;
>> +            /*
>> +             * Now that the child RTTs are destroyed,
>> +             * retry at this level.
>> +             */
>> +            next_addr = addr;
>> +            break;
>> +        default:
>> +            WARN_ON(1);
>> +            return -ENXIO;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int realm_tear_down_rtt_range(struct realm *realm,
>> +                     unsigned long start, unsigned long end)
>> +{
>> +    /*
>> +     * Root level RTTs can only be destroyed after the RD is destroyed.
>> +     * So tear down everything below the root level.
>> +     */
>> +    return realm_tear_down_rtt_level(realm, get_start_level(realm) + 1,
>> +                     start, end);
>> +}
>> +
>> +void kvm_realm_destroy_rtts(struct kvm *kvm)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    unsigned int ia_bits = realm->ia_bits;
>> +
>> +    WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
>
> AFAICS, we already WARN_ON() in all the cases where the
> realm_tear_down_rtt_range() fails, so maybe we can skip this
> WARN_ON here ?

So we do - yes this WARN_ON is unnecessary.

Thanks,
Steve

> Suzuki
>
>> +}
>> +
>>   void kvm_destroy_realm(struct kvm *kvm)
>>   {
>>       struct realm *realm = &kvm->arch.realm;
>> @@ -246,6 +389,8 @@ void kvm_destroy_realm(struct kvm *kvm)
>>       if (realm->rd) {
>>           phys_addr_t rd_phys = virt_to_phys(realm->rd);
>>
>> +        kvm_realm_destroy_rtts(kvm);
>> +
>>           if (WARN_ON(rmi_realm_destroy(rd_phys)))
>>               return;
>>           free_delegated_page(rd_phys);
>