All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	 kvm@vger.kernel.org, James Morse <james.morse@arm.com>,
	 Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	 Zenghui Yu <yuzenghui@huawei.com>,
	Ard Biesheuvel <ardb@kernel.org>, Will Deacon <will@kernel.org>,
	 Quentin Perret <qperret@google.com>,
	stable@vger.kernel.org,  David Matlack <dmatlack@google.com>
Subject: Re: [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs
Date: Mon, 13 Mar 2023 08:53:37 -0700	[thread overview]
Message-ID: <ZA9HAQtkCDwFXcsm@google.com> (raw)
In-Reply-To: <20230313091425.1962708-2-maz@kernel.org>

+David

On Mon, Mar 13, 2023, Marc Zyngier wrote:
> We walk the userspace PTs to discover what mapping size was
> used there. However, this can race against the userspace tables
> being freed, and we end-up in the weeds.
> 
> Thankfully, the mm code is being generous and will IPI us when
> doing so. So let's implement our part of the bargain and disable
> interrupts around the walk. This ensures that nothing terrible
> happens during that time.
> 
> We still need to handle the removal of the page tables before
> the walk. For that, allow get_user_mapping_size() to return an
> error, and make sure this error can be propagated all the way
> to the the exit handler.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++-------
>  1 file changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7113587222ff..d7b8b25942df 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>  				   CONFIG_PGTABLE_LEVELS),
>  		.mm_ops		= &kvm_user_mm_ops,
>  	};
> +	unsigned long flags;
>  	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
>  	u32 level = ~0;
>  	int ret;
>  
> +	/*
> +	 * Disable IRQs so that we hazard against a concurrent
> +	 * teardown of the userspace page tables (which relies on
> +	 * IPI-ing threads).
> +	 */
> +	local_irq_save(flags);
>  	ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level);
> -	VM_BUG_ON(ret);
> -	VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS);
> -	VM_BUG_ON(!(pte & PTE_VALID));
> +	local_irq_restore(flags);
> +
> +	/* Oops, the userspace PTs are gone... */
> +	if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID))
> +		return -EFAULT;

I don't think this should return -EFAULT all the way out to userspace.  Unless
arm64 differs from x86 in terms of how the userspace page tables are managed, not
having a valid translation _right now_ doesn't mean that one can't be created in
the future, e.g. by way of a subsequent hva_to_pfn().

FWIW, the approach x86 takes is to install a 4KiB (smallest granuale) translation,
which is safe since there _was_ a valid translation when mmu_lock was acquired and
mmu_invalidate_retry() was checked.  It's the primary MMU's responsibility to ensure
all secondary MMUs are purged before freeing memory, i.e. worst case should be that
KVMs stage-2 translation will be immediately zapped via mmu_notifier.

KVM ARM also has a bug that might be related: the mmu_seq snapshot needs to be
taken _before_ mmap_read_unlock(), otherwise vma_shift may be stale by the time
it's consumed.  I believe David is going to submit a patch (I found and "reported"
the bug when doing an internal review of "common MMU" stuff).

WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	 kvm@vger.kernel.org, James Morse <james.morse@arm.com>,
	 Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	 Zenghui Yu <yuzenghui@huawei.com>,
	Ard Biesheuvel <ardb@kernel.org>, Will Deacon <will@kernel.org>,
	 Quentin Perret <qperret@google.com>,
	stable@vger.kernel.org,  David Matlack <dmatlack@google.com>
Subject: Re: [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs
Date: Mon, 13 Mar 2023 08:53:37 -0700	[thread overview]
Message-ID: <ZA9HAQtkCDwFXcsm@google.com> (raw)
In-Reply-To: <20230313091425.1962708-2-maz@kernel.org>

+David

On Mon, Mar 13, 2023, Marc Zyngier wrote:
> We walk the userspace PTs to discover what mapping size was
> used there. However, this can race against the userspace tables
> being freed, and we end-up in the weeds.
> 
> Thankfully, the mm code is being generous and will IPI us when
> doing so. So let's implement our part of the bargain and disable
> interrupts around the walk. This ensures that nothing terrible
> happens during that time.
> 
> We still need to handle the removal of the page tables before
> the walk. For that, allow get_user_mapping_size() to return an
> error, and make sure this error can be propagated all the way
> to the the exit handler.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++-------
>  1 file changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7113587222ff..d7b8b25942df 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>  				   CONFIG_PGTABLE_LEVELS),
>  		.mm_ops		= &kvm_user_mm_ops,
>  	};
> +	unsigned long flags;
>  	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
>  	u32 level = ~0;
>  	int ret;
>  
> +	/*
> +	 * Disable IRQs so that we hazard against a concurrent
> +	 * teardown of the userspace page tables (which relies on
> +	 * IPI-ing threads).
> +	 */
> +	local_irq_save(flags);
>  	ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level);
> -	VM_BUG_ON(ret);
> -	VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS);
> -	VM_BUG_ON(!(pte & PTE_VALID));
> +	local_irq_restore(flags);
> +
> +	/* Oops, the userspace PTs are gone... */
> +	if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID))
> +		return -EFAULT;

I don't think this should return -EFAULT all the way out to userspace.  Unless
arm64 differs from x86 in terms of how the userspace page tables are managed, not
having a valid translation _right now_ doesn't mean that one can't be created in
the future, e.g. by way of a subsequent hva_to_pfn().

FWIW, the approach x86 takes is to install a 4KiB (smallest granuale) translation,
which is safe since there _was_ a valid translation when mmu_lock was acquired and
mmu_invalidate_retry() was checked.  It's the primary MMU's responsibility to ensure
all secondary MMUs are purged before freeing memory, i.e. worst case should be that
KVMs stage-2 translation will be immediately zapped via mmu_notifier.

KVM ARM also has a bug that might be related: the mmu_seq snapshot needs to be
taken _before_ mmap_read_unlock(), otherwise vma_shift may be stale by the time
it's consumed.  I believe David is going to submit a patch (I found and "reported"
the bug when doing an internal review of "common MMU" stuff).

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2023-03-13 15:53 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-13  9:14 [PATCH 0/2] KVM: arm64: Plug a couple of MM races Marc Zyngier
2023-03-13  9:14 ` Marc Zyngier
2023-03-13  9:14 ` [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier
2023-03-13  9:14   ` Marc Zyngier
2023-03-13 15:53   ` Sean Christopherson [this message]
2023-03-13 15:53     ` Sean Christopherson
2023-03-13 17:16     ` David Matlack
2023-03-13 17:16       ` David Matlack
2023-03-13 17:21       ` Sean Christopherson
2023-03-13 17:21         ` Sean Christopherson
2023-03-13 17:26         ` David Matlack
2023-03-13 17:26           ` David Matlack
2023-03-13 17:40     ` Marc Zyngier
2023-03-13 17:40       ` Marc Zyngier
2023-03-13  9:14 ` [PATCH 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section Marc Zyngier
2023-03-13  9:14   ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZA9HAQtkCDwFXcsm@google.com \
    --to=seanjc@google.com \
    --cc=ardb@kernel.org \
    --cc=dmatlack@google.com \
    --cc=james.morse@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=qperret@google.com \
    --cc=stable@vger.kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.