From mboxrd@z Thu Jan  1 00:00:00 1970
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Subject: Re: [PATCH v2] kvm: x86: fix stale mmio cache bug
Date: Tue, 05 Aug 2014 11:36:15 +0800
Message-ID: <53E0512F.2020309@linux.vnet.ibm.com>
References: <1407186620-1999-1-git-send-email-dmatlack@google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Eric Northup <digitaleric@google.com>
To: David Matlack <dmatlack@google.com>,
	Gleb Natapov <gleb@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org,
	x86@kernel.org
Return-path: <kvm-owner@vger.kernel.org>
Received: from e28smtp05.in.ibm.com ([122.248.162.5]:57085 "EHLO
	e28smtp05.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756805AbaHEDgU (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 4 Aug 2014 23:36:20 -0400
Received: from /spool/local
	by e28smtp05.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
	for <kvm@vger.kernel.org> from <xiaoguangrong@linux.vnet.ibm.com>;
	Tue, 5 Aug 2014 09:06:17 +0530
Received: from d28relay01.in.ibm.com (d28relay01.in.ibm.com [9.184.220.58])
	by d28dlp03.in.ibm.com (Postfix) with ESMTP id 8890F1258017
	for <kvm@vger.kernel.org>; Tue,  5 Aug 2014 09:06:17 +0530 (IST)
Received: from d28av03.in.ibm.com (d28av03.in.ibm.com [9.184.220.65])
	by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s753aBGv44040272
	for <kvm@vger.kernel.org>; Tue, 5 Aug 2014 09:06:11 +0530
Received: from d28av03.in.ibm.com (localhost [127.0.0.1])
	by d28av03.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s753aEtm008914
	for <kvm@vger.kernel.org>; Tue, 5 Aug 2014 09:06:14 +0530
In-Reply-To: <1407186620-1999-1-git-send-email-dmatlack@google.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 08/05/2014 05:10 AM, David Matlack wrote:
> The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
> up to userspace:
> 
> (1) Guest accesses gpa X without a memory slot. The gfn is cached in
> struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
> the SPTE write-execute-noread so that future accesses cause
> EPT_MISCONFIGs.
> 
> (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
> covering the page just accessed.
> 
> (3) Guest attempts to read or write to gpa X again. On Intel, this
> generates an EPT_MISCONFIG. The memory slot generation number that
> was incremented in (2) would normally take care of this but we fast
> path mmio faults through quickly_check_mmio_pf(), which only checks
> the per-vcpu mmio cache. Since we hit the cache, KVM passes a
> KVM_EXIT_MMIO up to userspace.
> 
> This patch fixes the issue by doing the following:
>   - Tag the mmio cache with the memslot generation and use it to
>     validate mmio cache lookups.
>   - Extend vcpu_clear_mmio_info to clear mmio_gfn in addition to
>     mmio_gva, since both can be used to fast path mmio faults.
>   - In mmu_sync_roots, unconditionally clear the mmio cache since
>     even direct_map (e.g. tdp) hosts use it.

This is not needed.

With direct map, only the gpa is cached (never a gva), and
vcpu_clear_mmio_info only clears the gva.
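
To make that concrete, here is a minimal user-space model, not kernel
code. The struct, the model_* helpers and MODEL_PAGE_SHIFT are
hypothetical names made up for this sketch, and it assumes that a
direct-map caller caches with gva == 0. It only shows that a clear
which resets the gva half leaves a gpa lookup untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for the mmio fields in struct kvm_vcpu_arch. */
struct mmio_cache {
	uint64_t mmio_gva;	/* page-aligned gva; stays 0 under direct map */
	uint64_t mmio_gfn;	/* gfn of the cached mmio page; 0 means empty */
};

/* Fill the cache; in this model a direct-map caller passes gva == 0. */
static void model_cache_mmio_info(struct mmio_cache *c, uint64_t gva,
				  uint64_t gfn)
{
	c->mmio_gva = gva & ~((1ULL << MODEL_PAGE_SHIFT) - 1);
	c->mmio_gfn = gfn;
}

/* Models the clear as described above: only the gva half is reset. */
static void model_clear_mmio_info(struct mmio_cache *c)
{
	c->mmio_gva = 0;
}

/* gpa lookup: hits when the cache is non-empty and the gfn matches. */
static bool model_match_mmio_gpa(const struct mmio_cache *c, uint64_t gpa)
{
	return c->mmio_gfn != 0 && c->mmio_gfn == (gpa >> MODEL_PAGE_SHIFT);
}

/* Returns true if a gpa lookup still hits after the gva-only clear. */
static bool gpa_hit_survives_clear(uint64_t gpa)
{
	struct mmio_cache c = { 0 };

	model_cache_mmio_info(&c, 0, gpa >> MODEL_PAGE_SHIFT);
	assert(model_match_mmio_gpa(&c, gpa));	/* cached entry hits */
	model_clear_mmio_info(&c);
	return model_match_mmio_gpa(&c, gpa);
}
```

In this model, calling gpa_hit_survives_clear() with any gpa whose gfn
is non-zero returns true: the gva-only clear makes no difference to a
gpa lookup.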

> +static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
> +					gva_t gva, gfn_t gfn, unsigned access)
> +{
> +	vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm);
> +
> +	/*
> +	 * Ensure that the mmio_gen is set before the rest of the cache entry.
> +	 * Otherwise we might see a new generation number attached to an old
> +	 * cache entry if creating/deleting a memslot races with mmio caching.
> +	 * The inverse case is possible (old generation number with new cache
> +	 * info), but that is safe. The next access will just miss the cache
> +	 * when it should have hit.
> +	 */
> +	smp_wmb();

The memory barrier does not help here. Consider this scenario:

CPU 0                               CPU 1
page-fault
see gpa is not mapped in memslot

                                    create new memslot containing gpa from Qemu
                                    update the slots' generation number
cache mmio info

!!! later, when the vcpu accesses gpa again,
it will cause an mmio-exit.

An easy way to fix this is to update the slots' generation number
after synchronize_srcu_expedited when a memslot is being updated. That
ensures other CPUs can see the new generation number only after the
update has finished.
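
The interleaving and the proposed ordering can be replayed in a small
single-threaded model. This is only a sketch, not kernel code: struct
model, GFN_X and the vcpu_* helpers are hypothetical names, and the
"bump after readers finish" branch merely stands in for updating the
generation number after synchronize_srcu_expedited, on the assumption
that the vcpu's fault handling runs inside an SRCU read-side critical
section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { GFN_X = 0x1234 };	/* arbitrary gfn used by the model */

/* Hypothetical model state; the names are illustrative, not KVM's. */
struct model {
	uint64_t slots_generation;	/* generation visible to vcpus */
	bool     gpa_in_memslot;	/* has the new memslot been installed? */
	uint64_t cached_gfn;		/* vcpu mmio cache: gfn, 0 = empty */
	uint64_t cached_gen;		/* vcpu mmio cache: generation tag */
};

/* CPU 0, step 1: the fault handler looks the gfn up in the memslots. */
static bool vcpu_lookup_is_mmio(const struct model *m)
{
	return !m->gpa_in_memslot;
}

/* CPU 0, step 2: cache the result, tagged with the generation read now. */
static void vcpu_cache_mmio(struct model *m)
{
	m->cached_gfn = GFN_X;
	m->cached_gen = m->slots_generation;
}

/* Later access: fast path that only consults the per-vcpu cache. */
static bool vcpu_fast_path_hits(const struct model *m)
{
	return m->cached_gfn == GFN_X &&
	       m->cached_gen == m->slots_generation;
}

/* Returns true if the later access wrongly takes the mmio fast path. */
static bool stale_hit(bool bump_after_readers_finish)
{
	struct model m = { .slots_generation = 1 };

	bool is_mmio = vcpu_lookup_is_mmio(&m);	/* CPU 0: gpa not in a slot */
	assert(is_mmio);

	m.gpa_in_memslot = true;		/* CPU 1: create new memslot */
	if (!bump_after_readers_finish)
		m.slots_generation++;		/* CPU 1: bump right away */

	vcpu_cache_mmio(&m);			/* CPU 0: cache mmio info */

	if (bump_after_readers_finish)
		m.slots_generation++;		/* CPU 1: bump after CPU 0 is done */

	return vcpu_fast_path_hits(&m);
}
```

In this model stale_hit(false) returns true, because the stale entry
carries the new generation number and passes validation, while
stale_hit(true) returns false, because the entry carries the old
generation number and the later access misses the cache.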