From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Reiji Watanabe <reijiw@google.com>
Cc: maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, will@kernel.org,
mark.rutland@arm.com
Subject: Re: [RFC PATCH v5 06/38] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
Date: Mon, 21 Mar 2022 17:17:40 +0000 [thread overview]
Message-ID: <YjizIOvkcn8SNqPx@monolith.localdoman> (raw)
In-Reply-To: <8edcae54-b7e2-1159-5cfe-74e395ab535b@google.com>
Hi,
On Thu, Mar 17, 2022 at 10:03:47PM -0700, Reiji Watanabe wrote:
> Hi Alex,
>
> On 11/17/21 7:38 AM, Alexandru Elisei wrote:
> > When an MTE-enabled guest first accesses a physical page, that page must be
> > scrubbed for tags. This is normally done by KVM on a translation fault, but
> > with locked memslots we will not get translation faults. So far, this has
> > been handled by forbidding userspace to enable the MTE capability after
> > locking a memslot.
> >
> > Remove this constraint by deferring tag cleaning until the first VCPU is
> > run, similar to how KVM handles cache maintenance operations.
> >
> > When userspace resets a VCPU, KVM again performs cache maintenance
> > operations on locked memslots because userspace might have modified the
> > guest memory. Clean the tags the next time a VCPU is run for the same
> > reason.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 7 ++-
> > arch/arm64/include/asm/kvm_mmu.h | 2 +-
> > arch/arm64/kvm/arm.c | 29 ++--------
> > arch/arm64/kvm/mmu.c | 95 ++++++++++++++++++++++++++-----
> > 4 files changed, 91 insertions(+), 42 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 5f49a27ce289..0ebdef158020 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -114,9 +114,10 @@ struct kvm_arch_memory_slot {
> > };
> > /* kvm->arch.mmu_pending_ops flags */
> > -#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE 0
> > -#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE 1
> > -#define KVM_MAX_MMU_PENDING_OPS 2
> > +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE 0
> > +#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE 1
> > +#define KVM_LOCKED_MEMSLOT_SANITISE_TAGS 2
> > +#define KVM_MAX_MMU_PENDING_OPS 3
> > struct kvm_arch {
> > struct kvm_s2_mmu mmu;
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index cbf57c474fea..2d2f902000b3 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -222,7 +222,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > #define kvm_mmu_has_pending_ops(kvm) \
> > (!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> > -void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> > +int kvm_mmu_perform_pending_ops(struct kvm *kvm);
> > static inline unsigned int kvm_get_vmid_bits(void)
> > {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 96ed48455cdd..13f3af1f2e78 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -106,25 +106,6 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > }
> > }
> > -static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
> > -{
> > - struct kvm_memslots *slots = kvm_memslots(kvm);
> > - struct kvm_memory_slot *memslot;
> > - bool has_locked_memslots = false;
> > - int idx;
> > -
> > - idx = srcu_read_lock(&kvm->srcu);
> > - kvm_for_each_memslot(memslot, slots) {
> > - if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
> > - has_locked_memslots = true;
> > - break;
> > - }
> > - }
> > - srcu_read_unlock(&kvm->srcu, idx);
> > -
> > - return has_locked_memslots;
> > -}
> > -
> > int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > struct kvm_enable_cap *cap)
> > {
> > @@ -139,8 +120,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > break;
> > case KVM_CAP_ARM_MTE:
> > mutex_lock(&kvm->lock);
> > - if (!system_supports_mte() || kvm->created_vcpus ||
> > - (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
> > + if (!system_supports_mte() || kvm->created_vcpus) {
> > r = -EINVAL;
> > } else {
> > r = 0;
> > @@ -870,8 +850,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > if (unlikely(!kvm_vcpu_initialized(vcpu)))
> > return -ENOEXEC;
> > - if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> > - kvm_mmu_perform_pending_ops(vcpu->kvm);
> > + if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) {
> > + ret = kvm_mmu_perform_pending_ops(vcpu->kvm);
> > + if (ret)
> > + return ret;
> > + }
> > ret = kvm_vcpu_first_run_init(vcpu);
> > if (ret)
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 188064c5839c..2491e73e3d31 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -613,6 +613,15 @@ void stage2_unmap_vm(struct kvm *kvm)
> > &kvm->arch.mmu_pending_ops);
> > set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
> > &kvm->arch.mmu_pending_ops);
> > + /*
> > + * stage2_unmap_vm() is called after a VCPU has run, at
> > + * which point the state of the MTE cap (either enabled
> > + * or disabled) is final.
> > + */
> > + if (kvm_has_mte(kvm)) {
> > + set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS,
> > + &kvm->arch.mmu_pending_ops);
> > + }
> > continue;
> > }
> > stage2_unmap_memslot(kvm, memslot);
> > @@ -956,6 +965,55 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> > return 0;
> > }
> > +static int sanitise_mte_tags_memslot(struct kvm *kvm,
> > + struct kvm_memory_slot *memslot)
> > +{
> > + unsigned long hva, slot_size, slot_end;
> > + struct kvm_memory_slot_page *entry;
> > + struct page *page;
> > + int ret = 0;
> > +
> > + hva = memslot->userspace_addr;
> > + slot_size = memslot->npages << PAGE_SHIFT;
> > + slot_end = hva + slot_size;
> > +
> > + /* First check that the VMAs spanning the memslot are not shared... */
> > + do {
> > + struct vm_area_struct *vma;
> > +
> > + vma = find_vma_intersection(current->mm, hva, slot_end);
> > + /* The VMAs spanning the memslot must be contiguous. */
> > + if (!vma) {
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > + /*
> > + * VM_SHARED mappings are not allowed with MTE to avoid races
> > + * when updating the PG_mte_tagged page flag, see
> > + * sanitise_mte_tags for more details.
> > + */
> > + if (vma->vm_flags & VM_SHARED) {
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > + hva = min(slot_end, vma->vm_end);
> > + } while (hva < slot_end);
> > +
> > + /* ... then clear the tags. */
> > + list_for_each_entry(entry, &memslot->arch.pages.list, list) {
> > + page = entry->page;
> > + if (!test_bit(PG_mte_tagged, &page->flags)) {
> > + mte_clear_page_tags(page_address(page));
> > + set_bit(PG_mte_tagged, &page->flags);
> > + }
> > + }
> > +
> > +out:
> > + mmap_read_unlock(current->mm);
>
> This appears unnecessary (taken care of by the caller).
Indeed, this was a refactoring artefact.
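The caller already brackets the call with mmap_read_lock()/
mmap_read_unlock(), so the function itself shouldn't drop the lock.
I'll remove it in the next version, i.e. something like:

 out:
-	mmap_read_unlock(current->mm);
-
 	return ret;
 }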
>
>
>
> > +
> > + return ret;
> > +}
> > +
> > static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > struct kvm_memory_slot *memslot, unsigned long hva,
> > unsigned long fault_status)
> > @@ -1325,14 +1383,29 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> > * is live, which means that the VM will be live.
> > */
> > -void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> > +int kvm_mmu_perform_pending_ops(struct kvm *kvm)
> > {
> > struct kvm_memory_slot *memslot;
> > + int ret = 0;
> > mutex_lock(&kvm->slots_lock);
> > if (!kvm_mmu_has_pending_ops(kvm))
> > goto out_unlock;
> > + if (kvm_has_mte(kvm) &&
> > + (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops))) {
> > + kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > + if (!memslot_is_locked(memslot))
> > + continue;
> > + mmap_read_lock(current->mm);
> > + ret = sanitise_mte_tags_memslot(kvm, memslot);
> > + mmap_read_unlock(current->mm);
> > + if (ret)
> > + goto out_unlock;
> > + }
> > + clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
> > + }
> > +
> > if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> > kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > if (!memslot_is_locked(memslot))
> > @@ -1349,7 +1422,7 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> > out_unlock:
> > mutex_unlock(&kvm->slots_lock);
> > - return;
> > + return ret;
> > }
> > static int try_rlimit_memlock(unsigned long npages)
> > @@ -1443,19 +1516,6 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> > ret = -ENOMEM;
> > goto out_err;
> > }
> > - if (kvm_has_mte(kvm)) {
> > - if (vma->vm_flags & VM_SHARED) {
> > - ret = -EFAULT;
> > - } else {
> > - ret = sanitise_mte_tags(kvm,
> > - page_to_pfn(page_entry->page),
> > - PAGE_SIZE);
> > - }
> > - if (ret) {
> > - mmap_read_unlock(current->mm);
> > - goto out_err;
> > - }
> > - }
> > mmap_read_unlock(current->mm);
> > ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
> > @@ -1508,6 +1568,11 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> > memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> > set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> > + /*
> > + * MTE might be enabled after we lock the memslot, set it here
> > + * unconditionally.
> > + */
> > + set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
>
>
> Since this won't be needed when the system doesn't support MTE,
> shouldn't the code check whether MTE is supported on the system?
>
> What is the reason to set this here rather than when MTE
> is enabled?
> When MTE is not used, once KVM_LOCKED_MEMSLOT_SANITISE_TAGS is set,
> it appears that it won't be cleared until all memslots are unlocked
> (correct?). I would think it shouldn't be set when unnecessary, or
> should be cleared once it turns out to be unnecessary.
Indeed, if the user never enables the MTE capability then the bit will
remain set.
The bit must always be set here because KVM has no way of knowing in
advance whether the user will enable the MTE capability, as there is no
ordering enforced between creating a memslot and creating a VCPU.
What I can do is clear the bit regardless of the value of kvm_has_mte()
in kvm_mmu_perform_pending_ops(), because at that point the user cannot
enable MTE anymore (at least one VCPU has been created).
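Something along these lines (untested, just to show what I mean; same
names as in this patch):

	if (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops)) {
		if (kvm_has_mte(kvm)) {
			kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
				if (!memslot_is_locked(memslot))
					continue;
				mmap_read_lock(current->mm);
				ret = sanitise_mte_tags_memslot(kvm, memslot);
				mmap_read_unlock(current->mm);
				if (ret)
					goto out_unlock;
			}
		}
		/*
		 * At least one VCPU has been created by now, so the MTE
		 * cap can no longer be enabled; if it is off, the bit is
		 * stale and can be dropped.
		 */
		clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
	}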
Thanks,
Alex