From mboxrd@z Thu Jan 1 00:00:00 1970 From: Xiao Guangrong Subject: Re: [PATCH v2] kvm: x86: fix stale mmio cache bug Date: Wed, 06 Aug 2014 11:26:39 +0800 Message-ID: <53E1A06F.90005@linux.vnet.ibm.com> References: <1407186620-1999-1-git-send-email-dmatlack@google.com> <53E0512F.2020309@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Gleb Natapov , Paolo Bonzini , kvm@vger.kernel.org, x86@kernel.org, Eric Northup To: David Matlack Return-path: Received: from e28smtp02.in.ibm.com ([122.248.162.2]:43664 "EHLO e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751026AbaHFD0n (ORCPT ); Tue, 5 Aug 2014 23:26:43 -0400 Received: from /spool/local by e28smtp02.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 6 Aug 2014 08:56:41 +0530 Received: from d28relay05.in.ibm.com (d28relay05.in.ibm.com [9.184.220.62]) by d28dlp02.in.ibm.com (Postfix) with ESMTP id 3C69D394001A for ; Wed, 6 Aug 2014 08:56:38 +0530 (IST) Received: from d28av05.in.ibm.com (d28av05.in.ibm.com [9.184.220.67]) by d28relay05.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s763R0M164553132 for ; Wed, 6 Aug 2014 08:57:00 +0530 Received: from d28av05.in.ibm.com (localhost [127.0.0.1]) by d28av05.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s763QbWI025616 for ; Wed, 6 Aug 2014 08:56:37 +0530 In-Reply-To: Sender: kvm-owner@vger.kernel.org List-ID: On 08/06/2014 06:39 AM, David Matlack wrote: > On Mon, Aug 4, 2014 at 8:36 PM, Xiao Guangrong > wrote: >> On 08/05/2014 05:10 AM, David Matlack wrote: >>> >>> This patch fixes the issue by doing the following: >>> - Tag the mmio cache with the memslot generation and use it to >>> validate mmio cache lookups. >>> - Extend vcpu_clear_mmio_info to clear mmio_gfn in addition to >>> mmio_gva, since both can be used to fast path mmio faults. >>> - In mmu_sync_roots, unconditionally clear the mmio cache since >>> even direct_map (e.g. tdp) hosts use it. >> >> It's not needed. >> >> direct map only uses gpa (and never cache gva) and >> vcpu_clear_mmio_info only clears gva. > > Ok thanks for the clarification. > >>> +static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu, >>> + gva_t gva, gfn_t gfn, unsigned access) >>> +{ >>> + vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm); >>> + >>> + /* >>> + * Ensure that the mmio_gen is set before the rest of the cache entry. >>> + * Otherwise we might see a new generation number attached to an old >>> + * cache entry if creating/deleting a memslot races with mmio caching. >>> + * The inverse case is possible (old generation number with new cache >>> + * info), but that is safe. The next access will just miss the cache >>> + * when it should have hit. >>> + */ >>> + smp_wmb(); >> >> The memory barrier can't help us, consider this scenario: >> >> CPU 0 CPU 1 >> page-fault >> see gpa is not mapped in memslot >> >> create new memslot containing gpa from Qemu >> update the slots's generation number >> cache mmio info >> >> !!! later when vcpu accesses gpa again >> it will cause mmio-exit. > > Ah! Thanks for catching my mistake. > >> The easy way to fix this is that we update slots's generation-number >> after synchronize_srcu_expedited when memslot is being updated that >> ensures other sides can see the new generation-number only after >> finishing update. > > It would be possible for a vcpu to see an inconsistent kvm_memslots struct > (new set of slots with an old generation number). Is that not an issue? In this case, checking generation-number will fail, we will goto the slow path to handle mmio access - that's very rare, so i think it's ok. > > We could just use the generation number that is stored in the > spte. The only downside (that I can see) is that handle_abnormal_pfn() > calls vcpu_cache_mmio_info() but does not have access to the spte. > So presumably we'd have to do a page table walk. The issue is not only in vcpu_cache_mmio_info but also in mark_mmio_spte() where we may cache new generation-number and old memslots info.