From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christoffer Dall Subject: Re: [PATCH 2/2] kvm: arm/arm64: Fix use after free of stage2 page table Date: Mon, 15 May 2017 12:00:17 +0200 Message-ID: <20170515100017.GJ9309@cbox> References: <1493821072-29713-1-git-send-email-suzuki.poulose@arm.com> <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 8F2F140296 for ; Mon, 15 May 2017 05:57:03 -0400 (EDT) Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id i+1fsystKGGD for ; Mon, 15 May 2017 05:57:02 -0400 (EDT) Received: from mail-wm0-f50.google.com (mail-wm0-f50.google.com [74.125.82.50]) by mm01.cs.columbia.edu (Postfix) with ESMTPS id E4E8840187 for ; Mon, 15 May 2017 05:57:01 -0400 (EDT) Received: by mail-wm0-f50.google.com with SMTP id d127so69674006wmf.0 for ; Mon, 15 May 2017 03:00:27 -0700 (PDT) Content-Disposition: inline In-Reply-To: <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kvmarm-bounces@lists.cs.columbia.edu Sender: kvmarm-bounces@lists.cs.columbia.edu To: Suzuki K Poulose Cc: kvm@vger.kernel.org, marc.zyngier@arm.com, andreyknvl@google.com, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu List-Id: kvmarm@lists.cs.columbia.edu SGkgU3V6dWtpLAoKT24gV2VkLCBNYXkgMDMsIDIwMTcgYXQgMDM6MTc6NTJQTSArMDEwMCwgU3V6 dWtpIEsgUG91bG9zZSB3cm90ZToKPiBXZSB5aWVsZCB0aGUga3ZtLT5tbXVfbG9jayBvY2Nhc3Np b25hbHkgd2hpbGUgcGVyZm9ybWluZyBhbiBvcGVyYXRpb24KPiAoZS5nLCB1bm1hcCBvciBwZXJt aXNzaW9uIGNoYW5nZXMpIG9uIGEgbGFyZ2UgYXJlYSBvZiBzdGFnZTIgbWFwcGluZ3MuCj4gSG93 ZXZlciB0aGlzIGNvdWxkIHBvc3NpYmx5IGNhdXNlIGFub3RoZXIgdGhyZWFkIHRvIGNsZWFyIGFu ZCBmcmVlIHVwCj4gdGhlIHN0YWdlMiBwYWdlIHRhYmxlcyB3aGlsZSB3ZSB3ZXJlIHdhaXRpbmcg Zm9yIHJlZ2FpbmluZyB0aGUgbG9jayBhbmQKPiB0aHVzIHRoZSBvcmlnaW5hbCB0aHJlYWQgY291 bGQgZW5kIHVwIGluIGFjY2Vzc2luZyBtZW1vcnkgdGhhdCB3YXMKPiBmcmVlZC4gVGhpcyBwYXRj aCBmaXhlcyB0aGUgcHJvYmxlbSBieSBtYWtpbmcgc3VyZSB0aGF0IHRoZSBzdGFnZTIKPiBwYWdl dGFibGUgaXMgc3RpbGwgdmFsaWQgYWZ0ZXIgd2UgcmVnYWluIHRoZSBsb2NrLiBUaGUgZmFjdCB0 aGF0Cj4gbW11X25vdGlmZXItPnJlbGVhc2UoKSBjb3VsZCBiZSBjYWxsZWQgdHdpY2UgKHZpYSBf X21tdV9ub3RpZmllcl9yZWxlYXNlCj4gYW5kIG1tdV9ub3RpZmllcl91bnJlZ3Npc3RlcikgZW5o YW5jZXMgdGhlIHBvc3NpYmlsaXR5IG9mIGhpdHRpbmcKPiB0aGlzIHJhY2Ugd2hlcmUgdGhlcmUg YXJlIHR3byB0aHJlYWRzIHRyeWluZyB0byB1bm1hcCB0aGUgZW50aXJlIGd1ZXN0Cj4gc2hhZG93 IHBhZ2VzLgo+IAo+IFdoaWxlIGF0IGl0LCBjbGVhbnVwIHRoZSByZWR1ZGFudCBjaGVja3MgYXJv dW5kIGNvbmRfcmVzY2hlZF9sb2NrIGluCj4gc3RhZ2UyX3dwX3JhbmdlKCksIGFzIGNvbmRfcmVz Y2hlZF9sb2NrIGFscmVhZHkgZG9lcyB0aGUgc2FtZSBjaGVja3MuCj4gCj4gQ2M6IE1hcmsgUnV0 bGFuZCA8bWFyay5ydXRsYW5kQGFybS5jb20+Cj4gQ2M6IFJhZGltIEtyxI1tw6HFmSA8cmtyY21h ckByZWRoYXQuY29tPgo+IENjOiBhbmRyZXlrbnZsQGdvb2dsZS5jb20KPiBDYzogQ2hyaXN0b2Zm ZXIgRGFsbCA8Y2hyaXN0b2ZmZXIuZGFsbEBsaW5hcm8ub3JnPgo+IENjOiBNYXJjIFp5bmdpZXIg PG1hcmMuenluZ2llckBhcm0uY29tPgo+IENjOiBQYW9sbyBCb256aW5pIDxwYm9uemluaUByZWRo YXQuY29tPgo+IFNpZ25lZC1vZmYtYnk6IFN1enVraSBLIFBvdWxvc2UgPHN1enVraS5wb3Vsb3Nl QGFybS5jb20+Cj4gLS0tCj4gIGFyY2gvYXJtL2t2bS9tbXUuYyB8IDE3ICsrKysrKysrKysrKy0t LS0tCj4gIDEgZmlsZSBjaGFuZ2VkLCAxMiBpbnNlcnRpb25zKCspLCA1IGRlbGV0aW9ucygtKQo+ IAo+IGRpZmYgLS1naXQgYS9hcmNoL2FybS9rdm0vbW11LmMgYi9hcmNoL2FybS9rdm0vbW11LmMK PiBpbmRleCA5MDlhMWE3Li41YjNlMGRiIDEwMDY0NAo+IC0tLSBhL2FyY2gvYXJtL2t2bS9tbXUu Ywo+ICsrKyBiL2FyY2gvYXJtL2t2bS9tbXUuYwo+IEBAIC0zMDEsOSArMzAxLDE0IEBAIHN0YXRp YyB2b2lkIHVubWFwX3N0YWdlMl9yYW5nZShzdHJ1Y3Qga3ZtICprdm0sIHBoeXNfYWRkcl90IHN0 YXJ0LCB1NjQgc2l6ZSkKPiAgCQkvKgo+ICAJCSAqIElmIHRoZSByYW5nZSBpcyB0b28gbGFyZ2Us IHJlbGVhc2UgdGhlIGt2bS0+bW11X2xvY2sKPiAgCQkgKiB0byBwcmV2ZW50IHN0YXJ2YXRpb24g YW5kIGxvY2t1cCBkZXRlY3RvciB3YXJuaW5ncy4KPiArCQkgKiBNYWtlIHN1cmUgdGhlIHBhZ2Ug dGFibGUgaXMgc3RpbGwgYWN0aXZlIHdoZW4gd2UgcmVnYWluCj4gKwkJICogdGhlIGxvY2suCj4g IAkJICovCj4gLQkJaWYgKG5leHQgIT0gZW5kKQo+ICsJCWlmIChuZXh0ICE9IGVuZCkgewo+ICAJ CQljb25kX3Jlc2NoZWRfbG9jaygma3ZtLT5tbXVfbG9jayk7Cj4gKwkJCWlmICghUkVBRF9PTkNF KGt2bS0+YXJjaC5wZ2QpKQo+ICsJCQkJYnJlYWs7Cj4gKwkJfQoKU28gSSBkb24ndCB0aGluayB0 aGlzIGNoYW5nZSBpcyB3cm9uZywgYnV0IEkgd29uZGVyIGlmIGl0J3Mgc3VmZmljaWVudC4KRm9y IGV4YW1wbGUsIEkgY2FuIHNlZSB0aGF0IHRoaXMgZnVuY3Rpb24gaXMgY2FsbGVkIGZyb20KCnN0 YWdlMl91bm1zcF92bQogLT4gc3RhZ2UyX3VubWFwX21lbXNsb3QKICAgLT4gdW5tYXBfc3RhZ2Uy X3JhbmdlCgphbmQKCmt2bV9hcmNoX2ZsdXNoX3NoYWRvd19tZW1zbG90CiAtPiB1bm1hcF9zdGFn ZTJfcmFuZ2UKCndoaWNoIG5ldmVyIGNoZWNrIGlmIHRoZSBwZ2QgcG9pbnRlciBpcyB2YWxpZCwg YW5kIGZpbmFsbHkKa3ZtX2ZyZWVfc3RhZ2UyX3BnZCBhbHNvIGNoZWNrcyB0aGUgcGdkIHBvaW50 ZXIgb3V0c2lkZSBvZiBob2xkaW5nIHRoZQprdm0tPm1tdV9sb2NrIHNvIHdoeSBpcyB0aGlzIG5v dCByYWN5PwoKPiAgCX0gd2hpbGUgKHBnZCsrLCBhZGRyID0gbmV4dCwgYWRkciAhPSBlbmQpOwo+ ICB9Cj4gIAo+IEBAIC0xMTcwLDExICsxMTc1LDEzIEBAIHN0YXRpYyB2b2lkIHN0YWdlMl93cF9y YW5nZShzdHJ1Y3Qga3ZtICprdm0sIHBoeXNfYWRkcl90IGFkZHIsIHBoeXNfYWRkcl90IGVuZCkK PiAgCQkgKiBsYXJnZS4gT3RoZXJ3aXNlLCB3ZSBtYXkgc2VlIGtlcm5lbCBwYW5pY3Mgd2l0aAo+ ICAJCSAqIENPTkZJR19ERVRFQ1RfSFVOR19UQVNLLCBDT05GSUdfTE9DS1VQX0RFVEVDVE9SLAo+ ICAJCSAqIENPTkZJR19MT0NLREVQLiBBZGRpdGlvbmFsbHksIGhvbGRpbmcgdGhlIGxvY2sgdG9v IGxvbmcKPiAtCQkgKiB3aWxsIGFsc28gc3RhcnZlIG90aGVyIHZDUFVzLgo+ICsJCSAqIHdpbGwg YWxzbyBzdGFydmUgb3RoZXIgdkNQVXMuIFdlIGhhdmUgdG8gYWxzbyBtYWtlIHN1cmUKPiArCQkg KiB0aGF0IHRoZSBwYWdlIHRhYmxlcyBhcmUgbm90IGZyZWVkIHdoaWxlIHdlIHJlbGVhc2VkCj4g KwkJICogdGhlIGxvY2suCj4gIAkJICovCj4gLQkJaWYgKG5lZWRfcmVzY2hlZCgpIHx8IHNwaW5f bmVlZGJyZWFrKCZrdm0tPm1tdV9sb2NrKSkKPiAtCQkJY29uZF9yZXNjaGVkX2xvY2soJmt2bS0+ bW11X2xvY2spOwo+IC0KPiArCQljb25kX3Jlc2NoZWRfbG9jaygma3ZtLT5tbXVfbG9jayk7Cj4g KwkJaWYgKCFSRUFEX09OQ0Uoa3ZtLT5hcmNoLnBnZCkpCj4gKwkJCWJyZWFrOwoKSGVyZSBJIHN1 cHBvc2UgeW91IGRvbid0IGhhdmUgdGhlIGlzc3VlIGJlY2FzZSB5b3UgY2hlY2sgdGhlIHBnZCBw b2ludGVyCmJlZm9yZSBkZXJlZmVuY2luZyBpdCBpbiBhbGwgY2FzZXMuCgpUaGFua3MsCi1DaHJp c3RvZmZlcgoKPiAgCQluZXh0ID0gc3RhZ2UyX3BnZF9hZGRyX2VuZChhZGRyLCBlbmQpOwo+ICAJ CWlmIChzdGFnZTJfcGdkX3ByZXNlbnQoKnBnZCkpCj4gIAkJCXN0YWdlMl93cF9wdWRzKHBnZCwg YWRkciwgbmV4dCk7Cj4gLS0gCj4gMi43LjQKPiAKX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX18Ka3ZtYXJtIG1haWxpbmcgbGlzdAprdm1hcm1AbGlzdHMuY3Mu Y29sdW1iaWEuZWR1Cmh0dHBzOi8vbGlzdHMuY3MuY29sdW1iaWEuZWR1L21haWxtYW4vbGlzdGlu Zm8va3ZtYXJtCg== From mboxrd@z Thu Jan 1 00:00:00 1970 From: cdall@linaro.org (Christoffer Dall) Date: Mon, 15 May 2017 12:00:17 +0200 Subject: [PATCH 2/2] kvm: arm/arm64: Fix use after free of stage2 page table In-Reply-To: <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> References: <1493821072-29713-1-git-send-email-suzuki.poulose@arm.com> <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> Message-ID: <20170515100017.GJ9309@cbox> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi Suzuki, On Wed, May 03, 2017 at 03:17:52PM +0100, Suzuki K Poulose wrote: > We yield the kvm->mmu_lock occassionaly while performing an operation > (e.g, unmap or permission changes) on a large area of stage2 mappings. > However this could possibly cause another thread to clear and free up > the stage2 page tables while we were waiting for regaining the lock and > thus the original thread could end up in accessing memory that was > freed. This patch fixes the problem by making sure that the stage2 > pagetable is still valid after we regain the lock. The fact that > mmu_notifer->release() could be called twice (via __mmu_notifier_release > and mmu_notifier_unregsister) enhances the possibility of hitting > this race where there are two threads trying to unmap the entire guest > shadow pages. > > While at it, cleanup the redudant checks around cond_resched_lock in > stage2_wp_range(), as cond_resched_lock already does the same checks. > > Cc: Mark Rutland > Cc: Radim Kr?m?? > Cc: andreyknvl at google.com > Cc: Christoffer Dall > Cc: Marc Zyngier > Cc: Paolo Bonzini > Signed-off-by: Suzuki K Poulose > --- > arch/arm/kvm/mmu.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 909a1a7..5b3e0db 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -301,9 +301,14 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > /* > * If the range is too large, release the kvm->mmu_lock > * to prevent starvation and lockup detector warnings. > + * Make sure the page table is still active when we regain > + * the lock. > */ > - if (next != end) > + if (next != end) { > cond_resched_lock(&kvm->mmu_lock); > + if (!READ_ONCE(kvm->arch.pgd)) > + break; > + } So I don't think this change is wrong, but I wonder if it's sufficient. For example, I can see that this function is called from stage2_unmsp_vm -> stage2_unmap_memslot -> unmap_stage2_range and kvm_arch_flush_shadow_memslot -> unmap_stage2_range which never check if the pgd pointer is valid, and finally kvm_free_stage2_pgd also checks the pgd pointer outside of holding the kvm->mmu_lock so why is this not racy? > } while (pgd++, addr = next, addr != end); > } > > @@ -1170,11 +1175,13 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > * large. Otherwise, we may see kernel panics with > * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR, > * CONFIG_LOCKDEP. Additionally, holding the lock too long > - * will also starve other vCPUs. > + * will also starve other vCPUs. We have to also make sure > + * that the page tables are not freed while we released > + * the lock. > */ > - if (need_resched() || spin_needbreak(&kvm->mmu_lock)) > - cond_resched_lock(&kvm->mmu_lock); > - > + cond_resched_lock(&kvm->mmu_lock); > + if (!READ_ONCE(kvm->arch.pgd)) > + break; Here I suppose you don't have the issue becase you check the pgd pointer before derefencing it in all cases. Thanks, -Christoffer > next = stage2_pgd_addr_end(addr, end); > if (stage2_pgd_present(*pgd)) > stage2_wp_puds(pgd, addr, next); > -- > 2.7.4 > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760899AbdEOKAa (ORCPT ); Mon, 15 May 2017 06:00:30 -0400 Received: from mail-wm0-f47.google.com ([74.125.82.47]:37704 "EHLO mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758578AbdEOKA2 (ORCPT ); Mon, 15 May 2017 06:00:28 -0400 Date: Mon, 15 May 2017 12:00:17 +0200 From: Christoffer Dall To: Suzuki K Poulose Cc: christoffer.dall@linaro.org, agraf@suse.de, andreyknvl@google.com, marc.zyngier@arm.com, mark.rutland@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org Subject: Re: [PATCH 2/2] kvm: arm/arm64: Fix use after free of stage2 page table Message-ID: <20170515100017.GJ9309@cbox> References: <1493821072-29713-1-git-send-email-suzuki.poulose@arm.com> <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Suzuki, On Wed, May 03, 2017 at 03:17:52PM +0100, Suzuki K Poulose wrote: > We yield the kvm->mmu_lock occassionaly while performing an operation > (e.g, unmap or permission changes) on a large area of stage2 mappings. > However this could possibly cause another thread to clear and free up > the stage2 page tables while we were waiting for regaining the lock and > thus the original thread could end up in accessing memory that was > freed. This patch fixes the problem by making sure that the stage2 > pagetable is still valid after we regain the lock. The fact that > mmu_notifer->release() could be called twice (via __mmu_notifier_release > and mmu_notifier_unregsister) enhances the possibility of hitting > this race where there are two threads trying to unmap the entire guest > shadow pages. > > While at it, cleanup the redudant checks around cond_resched_lock in > stage2_wp_range(), as cond_resched_lock already does the same checks. > > Cc: Mark Rutland > Cc: Radim Krčmář > Cc: andreyknvl@google.com > Cc: Christoffer Dall > Cc: Marc Zyngier > Cc: Paolo Bonzini > Signed-off-by: Suzuki K Poulose > --- > arch/arm/kvm/mmu.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 909a1a7..5b3e0db 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -301,9 +301,14 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > /* > * If the range is too large, release the kvm->mmu_lock > * to prevent starvation and lockup detector warnings. > + * Make sure the page table is still active when we regain > + * the lock. > */ > - if (next != end) > + if (next != end) { > cond_resched_lock(&kvm->mmu_lock); > + if (!READ_ONCE(kvm->arch.pgd)) > + break; > + } So I don't think this change is wrong, but I wonder if it's sufficient. For example, I can see that this function is called from stage2_unmsp_vm -> stage2_unmap_memslot -> unmap_stage2_range and kvm_arch_flush_shadow_memslot -> unmap_stage2_range which never check if the pgd pointer is valid, and finally kvm_free_stage2_pgd also checks the pgd pointer outside of holding the kvm->mmu_lock so why is this not racy? > } while (pgd++, addr = next, addr != end); > } > > @@ -1170,11 +1175,13 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > * large. Otherwise, we may see kernel panics with > * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR, > * CONFIG_LOCKDEP. Additionally, holding the lock too long > - * will also starve other vCPUs. > + * will also starve other vCPUs. We have to also make sure > + * that the page tables are not freed while we released > + * the lock. > */ > - if (need_resched() || spin_needbreak(&kvm->mmu_lock)) > - cond_resched_lock(&kvm->mmu_lock); > - > + cond_resched_lock(&kvm->mmu_lock); > + if (!READ_ONCE(kvm->arch.pgd)) > + break; Here I suppose you don't have the issue becase you check the pgd pointer before derefencing it in all cases. Thanks, -Christoffer > next = stage2_pgd_addr_end(addr, end); > if (stage2_pgd_present(*pgd)) > stage2_wp_puds(pgd, addr, next); > -- > 2.7.4 >