From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 809AFC433EF for ; Fri, 18 Mar 2022 05:05:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:Content-Type: Content-Transfer-Encoding:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:From:References:To:Subject:MIME-Version: Date:Message-ID:Reply-To:Cc:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=7s6eH5vpH9hme5hXqouPoG5w+ufTCdTEmRIEvIZ/dZI=; b=kkWo7/yi5MHq91lXAspa2SVwQm DKXuBBLjhjj4FRufy4VMMOCjqMfXNmXqghptS4U1DsyEa3nkf36LcD3YsXQmNT5AaLR2bgl/Djn94 MjvzB8KxC9JsIvUFdqjPafSXGvGVjql4HESKU7YyRQoH8ZQoXw02a97hrdfFlVnRB5/dd5kiCSGaf yUVR8nZZixs+pNp1u4bFvTwh8PLvPLRvkcr0Wshj3OJ/jdtELw7xyvckhaULUC0Stc7X8PTGNezKn pQvGNlbG3fXIzsaJyQ7PXBjBaFJplxQlHHcqqrsGET3wRdcgAuYcQVEiJ+oPcG+qMGp1kAyW+K+I6 xXPEAiCg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1nV4mS-000MbM-Ee; Fri, 18 Mar 2022 05:04:04 +0000 Received: from mail-pl1-x630.google.com ([2607:f8b0:4864:20::630]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1nV4mN-000MaX-SF for linux-arm-kernel@lists.infradead.org; Fri, 18 Mar 2022 05:04:01 +0000 Received: by mail-pl1-x630.google.com with SMTP id n2so6176021plf.4 for ; Thu, 17 Mar 2022 22:03:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language:to :references:from:in-reply-to:content-transfer-encoding; bh=4ih7n004nQWlVsoSJhLNCfTTq+AS0JbqXkaNnwGbseg=; b=tJfkV22f2OYy9qGTj/RiMQ3geg4QfKu0LQRUFwGdVS/9G/LBIdg3l3AK2W96SrQ9aK eYCChMwFErYhPrKvJpdqRwj1vtia2kKUrWyVkBYKGTkyS3P1ZJDmQO3nwpTYauxm+rKW 0xLKKCXxCGCg+ncNf+g6SXcnvlSpGKRifCBiM2M44L1l4/sog0oRxA7sFa2Invq69Xdn n7Qyj/j/GvDHFE3T+J/7uGeXfSTAYDvMyuaHeEULF+8HUSPmAzAseXcRzB6VfoyBKyg0 3OYU7a7HkzFhiFD/jpT6cgxJU4EEDQlyaDM4TO9cLj17yclbv1Aus7i+5OY8Ll2CNstX xsBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=4ih7n004nQWlVsoSJhLNCfTTq+AS0JbqXkaNnwGbseg=; b=hethl7Zu8JS6XLXlk8umVNGyohZVzZSRMxuNgE7vgxxZQhrZBSz1RhD2elr1G3NEBt a3HRlwOR49yAUEdYwMSwbigPmgG0XsHZv2bUPAB+ILxU8ivTgCimt5lIA+5j8mim8UJ6 s1zRI0xnyRlm7RNIs/jE4ywcmTyRiX6Rb9Tw4v5Sa9rJg+vHw2vBJEOhv2FGXXIf+pVO y7aJXZGVo2ToyM/q+uVKk+nMX2Idi3hF8f+FE8qnUDNS103bixDgQtLhaKqG4jSQhI72 i3iQlW2gSCfXiRT6uOWlT9eXd9nzTttFqyC1kcH1LLKx1lDfVaXKCe7yxbOWpZq45S3L WjaA== X-Gm-Message-State: AOAM532b/BiiM/LEj+0ksa4t0bW3XkU5FrxvywOOuemGsR4w2J8HVqe5 bhaQi3faJU2sZlZ/VSL9BkKx0Q== X-Google-Smtp-Source: ABdhPJwsmnxgcDNSHRkP6oqX5wjpiHpuqfKbk7ZWm+7Fy71fwAPlmXtZzOMXJomPFl8TWzuuPe64fw== X-Received: by 2002:a17:902:7797:b0:154:b6d:31f2 with SMTP id o23-20020a170902779700b001540b6d31f2mr1775261pll.39.1647579829949; Thu, 17 Mar 2022 22:03:49 -0700 (PDT) Received: from [192.168.86.237] (107-203-254-183.lightspeed.sntcca.sbcglobal.net. [107.203.254.183]) by smtp.gmail.com with ESMTPSA id bh3-20020a056a02020300b00378b62df320sm6544803pgb.73.2022.03.17.22.03.48 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 17 Mar 2022 22:03:49 -0700 (PDT) Message-ID: <8edcae54-b7e2-1159-5cfe-74e395ab535b@google.com> Date: Thu, 17 Mar 2022 22:03:47 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [RFC PATCH v5 06/38] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs Content-Language: en-US To: Alexandru Elisei , maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, will@kernel.org, mark.rutland@arm.com References: <20211117153842.302159-1-alexandru.elisei@arm.com> <20211117153842.302159-7-alexandru.elisei@arm.com> From: Reiji Watanabe In-Reply-To: <20211117153842.302159-7-alexandru.elisei@arm.com> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220317_220359_997133_7ABEEF88 X-CRM114-Status: GOOD ( 41.39 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi Alex, On 11/17/21 7:38 AM, Alexandru Elisei wrote: > When an MTE-enabled guest first accesses a physical page, that page must be > scrubbed for tags. This is normally done by KVM on a translation fault, but > with locked memslots we will not get translation faults. So far, this has > been handled by forbidding userspace to enable the MTE capability after > locking a memslot. > > Remove this constraint by deferring tag cleaning until the first VCPU is > run, similar to how KVM handles cache maintenance operations. > > When userspace resets a VCPU, KVM again performs cache maintenance > operations on locked memslots because userspace might have modified the > guest memory. Clean the tags the next time a VCPU is run for the same > reason. > > Signed-off-by: Alexandru Elisei > --- > arch/arm64/include/asm/kvm_host.h | 7 ++- > arch/arm64/include/asm/kvm_mmu.h | 2 +- > arch/arm64/kvm/arm.c | 29 ++-------- > arch/arm64/kvm/mmu.c | 95 ++++++++++++++++++++++++++----- > 4 files changed, 91 insertions(+), 42 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index 5f49a27ce289..0ebdef158020 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -114,9 +114,10 @@ struct kvm_arch_memory_slot { > }; > > /* kvm->arch.mmu_pending_ops flags */ > -#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE 0 > -#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE 1 > -#define KVM_MAX_MMU_PENDING_OPS 2 > +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE 0 > +#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE 1 > +#define KVM_LOCKED_MEMSLOT_SANITISE_TAGS 2 > +#define KVM_MAX_MMU_PENDING_OPS 3 > > struct kvm_arch { > struct kvm_s2_mmu mmu; > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index cbf57c474fea..2d2f902000b3 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -222,7 +222,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags); > #define kvm_mmu_has_pending_ops(kvm) \ > (!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS)) > > -void kvm_mmu_perform_pending_ops(struct kvm *kvm); > +int kvm_mmu_perform_pending_ops(struct kvm *kvm); > > static inline unsigned int kvm_get_vmid_bits(void) > { > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index 96ed48455cdd..13f3af1f2e78 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -106,25 +106,6 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm, > } > } > > -static bool kvm_arm_has_locked_memslots(struct kvm *kvm) > -{ > - struct kvm_memslots *slots = kvm_memslots(kvm); > - struct kvm_memory_slot *memslot; > - bool has_locked_memslots = false; > - int idx; > - > - idx = srcu_read_lock(&kvm->srcu); > - kvm_for_each_memslot(memslot, slots) { > - if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) { > - has_locked_memslots = true; > - break; > - } > - } > - srcu_read_unlock(&kvm->srcu, idx); > - > - return has_locked_memslots; > -} > - > int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > struct kvm_enable_cap *cap) > { > @@ -139,8 +120,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > break; > case KVM_CAP_ARM_MTE: > mutex_lock(&kvm->lock); > - if (!system_supports_mte() || kvm->created_vcpus || > - (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) { > + if (!system_supports_mte() || kvm->created_vcpus) { > r = -EINVAL; > } else { > r = 0; > @@ -870,8 +850,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) > if (unlikely(!kvm_vcpu_initialized(vcpu))) > return -ENOEXEC; > > - if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) > - kvm_mmu_perform_pending_ops(vcpu->kvm); > + if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) { > + ret = kvm_mmu_perform_pending_ops(vcpu->kvm); > + if (ret) > + return ret; > + } > > ret = kvm_vcpu_first_run_init(vcpu); > if (ret) > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index 188064c5839c..2491e73e3d31 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -613,6 +613,15 @@ void stage2_unmap_vm(struct kvm *kvm) > &kvm->arch.mmu_pending_ops); > set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE, > &kvm->arch.mmu_pending_ops); > + /* > + * stage2_unmap_vm() is called after a VCPU has run, at > + * which point the state of the MTE cap (either enabled > + * or disabled) is final. > + */ > + if (kvm_has_mte(kvm)) { > + set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, > + &kvm->arch.mmu_pending_ops); > + } > continue; > } > stage2_unmap_memslot(kvm, memslot); > @@ -956,6 +965,55 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn, > return 0; > } > > +static int sanitise_mte_tags_memslot(struct kvm *kvm, > + struct kvm_memory_slot *memslot) > +{ > + unsigned long hva, slot_size, slot_end; > + struct kvm_memory_slot_page *entry; > + struct page *page; > + int ret = 0; > + > + hva = memslot->userspace_addr; > + slot_size = memslot->npages << PAGE_SHIFT; > + slot_end = hva + slot_size; > + > + /* First check that the VMAs spanning the memslot are not shared... */ > + do { > + struct vm_area_struct *vma; > + > + vma = find_vma_intersection(current->mm, hva, slot_end); > + /* The VMAs spanning the memslot must be contiguous. */ > + if (!vma) { > + ret = -EFAULT; > + goto out; > + } > + /* > + * VM_SHARED mappings are not allowed with MTE to avoid races > + * when updating the PG_mte_tagged page flag, see > + * sanitise_mte_tags for more details. > + */ > + if (vma->vm_flags & VM_SHARED) { > + ret = -EFAULT; > + goto out; > + } > + hva = min(slot_end, vma->vm_end); > + } while (hva < slot_end); > + > + /* ... then clear the tags. */ > + list_for_each_entry(entry, &memslot->arch.pages.list, list) { > + page = entry->page; > + if (!test_bit(PG_mte_tagged, &page->flags)) { > + mte_clear_page_tags(page_address(page)); > + set_bit(PG_mte_tagged, &page->flags); > + } > + } > + > +out: > + mmap_read_unlock(current->mm); This appears unnecessary (taken care by the caller). > + > + return ret; > +} > + > static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > struct kvm_memory_slot *memslot, unsigned long hva, > unsigned long fault_status) > @@ -1325,14 +1383,29 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu) > * - Stage 2 tables cannot be freed from under us as long as at least one VCPU > * is live, which means that the VM will be live. > */ > -void kvm_mmu_perform_pending_ops(struct kvm *kvm) > +int kvm_mmu_perform_pending_ops(struct kvm *kvm) > { > struct kvm_memory_slot *memslot; > + int ret = 0; > > mutex_lock(&kvm->slots_lock); > if (!kvm_mmu_has_pending_ops(kvm)) > goto out_unlock; > > + if (kvm_has_mte(kvm) && > + (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops))) { > + kvm_for_each_memslot(memslot, kvm_memslots(kvm)) { > + if (!memslot_is_locked(memslot)) > + continue; > + mmap_read_lock(current->mm); > + ret = sanitise_mte_tags_memslot(kvm, memslot); > + mmap_read_unlock(current->mm); > + if (ret) > + goto out_unlock; > + } > + clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops); > + } > + > if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) { > kvm_for_each_memslot(memslot, kvm_memslots(kvm)) { > if (!memslot_is_locked(memslot)) > @@ -1349,7 +1422,7 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm) > > out_unlock: > mutex_unlock(&kvm->slots_lock); > - return; > + return ret; > } > > static int try_rlimit_memlock(unsigned long npages) > @@ -1443,19 +1516,6 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot, > ret = -ENOMEM; > goto out_err; > } > - if (kvm_has_mte(kvm)) { > - if (vma->vm_flags & VM_SHARED) { > - ret = -EFAULT; > - } else { > - ret = sanitise_mte_tags(kvm, > - page_to_pfn(page_entry->page), > - PAGE_SIZE); > - } > - if (ret) { > - mmap_read_unlock(current->mm); > - goto out_err; > - } > - } > mmap_read_unlock(current->mm); > > ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm)); > @@ -1508,6 +1568,11 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot, > memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE; > > set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops); > + /* > + * MTE might be enabled after we lock the memslot, set it here > + * unconditionally. > + */ > + set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops); Since this won't be needed when the system doesn't support MTE, shouldn't the code check if MTE is supported on the system ? What is the reason to set this here rather than when the mte is enabled ? When MTE is not used, once KVM_LOCKED_MEMSLOT_SANITISE_TAGS is set, it appears that KVM_LOCKED_MEMSLOT_SANITISE_TAGS won't be cleared until all memslots are unlocked (Correct ?). I would think it shouldn't be set when unnecessary or should be cleared once it turns out to be unnecessary. Thanks, Reiji > > kvm_mmu_free_memory_cache(&cache); > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel