From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christoffer Dall Subject: Re: [PATCH 2/2] kvm: arm/arm64: Fix use after free of stage2 page table Date: Mon, 15 May 2017 19:43:37 +0200 Message-ID: <20170515174337.GC18755@cbox> References: <1493821072-29713-1-git-send-email-suzuki.poulose@arm.com> <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> <20170515100017.GJ9309@cbox> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 5F96140905 for ; Mon, 15 May 2017 13:40:24 -0400 (EDT) Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nQzCR92jAyHa for ; Mon, 15 May 2017 13:40:22 -0400 (EDT) Received: from mail-qk0-f170.google.com (mail-qk0-f170.google.com [209.85.220.170]) by mm01.cs.columbia.edu (Postfix) with ESMTPS id 0ED2440187 for ; Mon, 15 May 2017 13:40:21 -0400 (EDT) Received: by mail-qk0-f170.google.com with SMTP id y201so104377303qka.0 for ; Mon, 15 May 2017 10:43:48 -0700 (PDT) Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kvmarm-bounces@lists.cs.columbia.edu Sender: kvmarm-bounces@lists.cs.columbia.edu To: Suzuki K Poulose Cc: kvm@vger.kernel.org, marc.zyngier@arm.com, andreyknvl@google.com, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu List-Id: kvmarm@lists.cs.columbia.edu T24gTW9uLCBNYXkgMTUsIDIwMTcgYXQgMDI6MzY6NThQTSArMDEwMCwgU3V6dWtpIEsgUG91bG9z ZSB3cm90ZToKPiBPbiAxNS8wNS8xNyAxMTowMCwgQ2hyaXN0b2ZmZXIgRGFsbCB3cm90ZToKPiA+ SGkgU3V6dWtpLAo+ID4KPiA+T24gV2VkLCBNYXkgMDMsIDIwMTcgYXQgMDM6MTc6NTJQTSArMDEw MCwgU3V6dWtpIEsgUG91bG9zZSB3cm90ZToKPiA+PldlIHlpZWxkIHRoZSBrdm0tPm1tdV9sb2Nr IG9jY2Fzc2lvbmFseSB3aGlsZSBwZXJmb3JtaW5nIGFuIG9wZXJhdGlvbgo+ID4+KGUuZywgdW5t YXAgb3IgcGVybWlzc2lvbiBjaGFuZ2VzKSBvbiBhIGxhcmdlIGFyZWEgb2Ygc3RhZ2UyIG1hcHBp bmdzLgo+ID4+SG93ZXZlciB0aGlzIGNvdWxkIHBvc3NpYmx5IGNhdXNlIGFub3RoZXIgdGhyZWFk IHRvIGNsZWFyIGFuZCBmcmVlIHVwCj4gPj50aGUgc3RhZ2UyIHBhZ2UgdGFibGVzIHdoaWxlIHdl IHdlcmUgd2FpdGluZyBmb3IgcmVnYWluaW5nIHRoZSBsb2NrIGFuZAo+ID4+dGh1cyB0aGUgb3Jp Z2luYWwgdGhyZWFkIGNvdWxkIGVuZCB1cCBpbiBhY2Nlc3NpbmcgbWVtb3J5IHRoYXQgd2FzCj4g Pj5mcmVlZC4gVGhpcyBwYXRjaCBmaXhlcyB0aGUgcHJvYmxlbSBieSBtYWtpbmcgc3VyZSB0aGF0 IHRoZSBzdGFnZTIKPiA+PnBhZ2V0YWJsZSBpcyBzdGlsbCB2YWxpZCBhZnRlciB3ZSByZWdhaW4g dGhlIGxvY2suIFRoZSBmYWN0IHRoYXQKPiA+Pm1tdV9ub3RpZmVyLT5yZWxlYXNlKCkgY291bGQg YmUgY2FsbGVkIHR3aWNlICh2aWEgX19tbXVfbm90aWZpZXJfcmVsZWFzZQo+ID4+YW5kIG1tdV9u b3RpZmllcl91bnJlZ3Npc3RlcikgZW5oYW5jZXMgdGhlIHBvc3NpYmlsaXR5IG9mIGhpdHRpbmcK PiA+PnRoaXMgcmFjZSB3aGVyZSB0aGVyZSBhcmUgdHdvIHRocmVhZHMgdHJ5aW5nIHRvIHVubWFw IHRoZSBlbnRpcmUgZ3Vlc3QKPiA+PnNoYWRvdyBwYWdlcy4KPiA+Pgo+ID4+V2hpbGUgYXQgaXQs IGNsZWFudXAgdGhlIHJlZHVkYW50IGNoZWNrcyBhcm91bmQgY29uZF9yZXNjaGVkX2xvY2sgaW4K PiA+PnN0YWdlMl93cF9yYW5nZSgpLCBhcyBjb25kX3Jlc2NoZWRfbG9jayBhbHJlYWR5IGRvZXMg dGhlIHNhbWUgY2hlY2tzLgo+ID4+Cj4gPj5DYzogTWFyayBSdXRsYW5kIDxtYXJrLnJ1dGxhbmRA YXJtLmNvbT4KPiA+PkNjOiBSYWRpbSBLcsSNbcOhxZkgPHJrcmNtYXJAcmVkaGF0LmNvbT4KPiA+ PkNjOiBhbmRyZXlrbnZsQGdvb2dsZS5jb20KPiA+PkNjOiBDaHJpc3RvZmZlciBEYWxsIDxjaHJp c3RvZmZlci5kYWxsQGxpbmFyby5vcmc+Cj4gPj5DYzogTWFyYyBaeW5naWVyIDxtYXJjLnp5bmdp ZXJAYXJtLmNvbT4KPiA+PkNjOiBQYW9sbyBCb256aW5pIDxwYm9uemluaUByZWRoYXQuY29tPgo+ ID4+U2lnbmVkLW9mZi1ieTogU3V6dWtpIEsgUG91bG9zZSA8c3V6dWtpLnBvdWxvc2VAYXJtLmNv bT4KPiA+Pi0tLQo+ID4+IGFyY2gvYXJtL2t2bS9tbXUuYyB8IDE3ICsrKysrKysrKysrKy0tLS0t Cj4gPj4gMSBmaWxlIGNoYW5nZWQsIDEyIGluc2VydGlvbnMoKyksIDUgZGVsZXRpb25zKC0pCj4g Pj4KPiA+PmRpZmYgLS1naXQgYS9hcmNoL2FybS9rdm0vbW11LmMgYi9hcmNoL2FybS9rdm0vbW11 LmMKPiA+PmluZGV4IDkwOWExYTcuLjViM2UwZGIgMTAwNjQ0Cj4gPj4tLS0gYS9hcmNoL2FybS9r dm0vbW11LmMKPiA+PisrKyBiL2FyY2gvYXJtL2t2bS9tbXUuYwo+ID4+QEAgLTMwMSw5ICszMDEs MTQgQEAgc3RhdGljIHZvaWQgdW5tYXBfc3RhZ2UyX3JhbmdlKHN0cnVjdCBrdm0gKmt2bSwgcGh5 c19hZGRyX3Qgc3RhcnQsIHU2NCBzaXplKQo+ID4+IAkJLyoKPiA+PiAJCSAqIElmIHRoZSByYW5n ZSBpcyB0b28gbGFyZ2UsIHJlbGVhc2UgdGhlIGt2bS0+bW11X2xvY2sKPiA+PiAJCSAqIHRvIHBy ZXZlbnQgc3RhcnZhdGlvbiBhbmQgbG9ja3VwIGRldGVjdG9yIHdhcm5pbmdzLgo+ID4+KwkJICog TWFrZSBzdXJlIHRoZSBwYWdlIHRhYmxlIGlzIHN0aWxsIGFjdGl2ZSB3aGVuIHdlIHJlZ2Fpbgo+ ID4+KwkJICogdGhlIGxvY2suCj4gPj4gCQkgKi8KPiA+Pi0JCWlmIChuZXh0ICE9IGVuZCkKPiA+ PisJCWlmIChuZXh0ICE9IGVuZCkgewo+ID4+IAkJCWNvbmRfcmVzY2hlZF9sb2NrKCZrdm0tPm1t dV9sb2NrKTsKPiA+PisJCQlpZiAoIVJFQURfT05DRShrdm0tPmFyY2gucGdkKSkKPiA+PisJCQkJ YnJlYWs7Cj4gPj4rCQl9Cj4gPgo+ID5TbyBJIGRvbid0IHRoaW5rIHRoaXMgY2hhbmdlIGlzIHdy b25nLCBidXQgSSB3b25kZXIgaWYgaXQncyBzdWZmaWNpZW50Lgo+ID5Gb3IgZXhhbXBsZSwgSSBj YW4gc2VlIHRoYXQgdGhpcyBmdW5jdGlvbiBpcyBjYWxsZWQgZnJvbQo+ID4KPiA+c3RhZ2UyX3Vu bXNwX3ZtCj4gPiAtPiBzdGFnZTJfdW5tYXBfbWVtc2xvdAo+ID4gICAtPiB1bm1hcF9zdGFnZTJf cmFuZ2UKPiA+Cj4gPmFuZAo+ID4KPiA+a3ZtX2FyY2hfZmx1c2hfc2hhZG93X21lbXNsb3QKPiA+ IC0+IHVubWFwX3N0YWdlMl9yYW5nZQo+ID4KPiA+d2hpY2ggbmV2ZXIgY2hlY2sgaWYgdGhlIHBn ZCBwb2ludGVyIGlzIHZhbGlkLAo+IAo+IFlvdSBhcmUgcmlnaHQuIFRob3NlIHR3byBjYWxsZXJz IGRvIG5vdCBjaGVjayBpdC4gV2UgY291bGQgZml4IGFsbCBvZiB0aGlzIGJ5IHNpbXBseQo+IG1v dmluZyB0aGUgY2hlY2sgdG8gdGhlIGJlZ2lubmluZyBvZiB0aGUgbG9vcC4KPiBpLmUsIHNvbWV0 aGluZyBsaWtlIHRoaXMgOgo+IAo+IEBAIC0yOTUsNiArMjk1LDEyIEBAIHN0YXRpYyB2b2lkIHVu bWFwX3N0YWdlMl9yYW5nZShzdHJ1Y3Qga3ZtICprdm0sIHBoeXNfYWRkcl90IHN0YXJ0LCB1NjQg c2l6ZSkKPiAJYXNzZXJ0X3NwaW5fbG9ja2VkKCZrdm0tPm1tdV9sb2NrKTsKPiAJcGdkID0ga3Zt LT5hcmNoLnBnZCArIHN0YWdlMl9wZ2RfaW5kZXgoYWRkcik7Cj4gCWRvIHsKPiArCQkvKgo+ICsJ CSAqIE1ha2Ugc3VyZSB0aGUgcGFnZSB0YWJsZSBpcyBzdGlsbCBhY3RpdmUsIGFzIHdlIGNvdWxk Cj4gKwkJICogYW5vdGhlciB0aHJlYWQgY291bGQgaGF2ZSBwb3NzaWJseSBmcmVlZCB0aGUgcGFn ZSB0YWJsZS4KPiArCQkgKi8KPiArCQlpZiAoIVJFQURfT05DRShrdm0tPmFyY2gucGdkKSkKPiAr CQkJYnJlYWs7Cj4gCQluZXh0ID0gc3RhZ2UyX3BnZF9hZGRyX2VuZChhZGRyLCBlbmQpOwo+IAkJ aWYgKCFzdGFnZTJfcGdkX25vbmUoKnBnZCkpCj4gCQkJdW5tYXBfc3RhZ2UyX3B1ZHMoa3ZtLCBw Z2QsIGFkZHIsIG5leHQpOwo+IAo+IAo+IAo+IAo+ID5hbmQgZmluYWxseSwga3ZtX2ZyZWVfc3Rh Z2UyX3BnZCBhbHNvIGNoZWNrcyB0aGUgcGdkIHBvaW50ZXIgb3V0c2lkZSBvZiBob2xkaW5nIHRo ZQo+ID5rdm0tPm1tdV9sb2NrIHNvIHdoeSBpcyB0aGlzIG5vdCByYWN5Pwo+IAo+IFRoaXMgaGFz IGJlZW4gZml4ZWQgYnkgcGF0Y2ggMSBpbiB0aGUgc2VyaWVzLiBTbyBzaG91bGQgYmUgZmluZS4K PiAKPiAKPiBJIGNhbiByZXNwaW4gdGhlIHBhdGNoIHdpdGggdGhlIGNoYW5nZXMgaWYgeW91IGFy ZSBPSyB3aXRoIGl0Lgo+IApZZXMsIGFic29sdXRlbHkuICBJJ3ZlIGFscmVhZHkgYXBwbGllZCBw YXRjaCAxIHNvIG5vIG5lZWQgdG8gaW5jbHVkZQp0aGF0IGluIHlvdXIgcmVzcGluLgoKVGhhbmtz IQoKLUNocmlzdG9mZmVyCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fCmt2bWFybSBtYWlsaW5nIGxpc3QKa3ZtYXJtQGxpc3RzLmNzLmNvbHVtYmlhLmVkdQpo dHRwczovL2xpc3RzLmNzLmNvbHVtYmlhLmVkdS9tYWlsbWFuL2xpc3RpbmZvL2t2bWFybQo= From mboxrd@z Thu Jan 1 00:00:00 1970 From: cdall@linaro.org (Christoffer Dall) Date: Mon, 15 May 2017 19:43:37 +0200 Subject: [PATCH 2/2] kvm: arm/arm64: Fix use after free of stage2 page table In-Reply-To: References: <1493821072-29713-1-git-send-email-suzuki.poulose@arm.com> <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> <20170515100017.GJ9309@cbox> Message-ID: <20170515174337.GC18755@cbox> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Mon, May 15, 2017 at 02:36:58PM +0100, Suzuki K Poulose wrote: > On 15/05/17 11:00, Christoffer Dall wrote: > >Hi Suzuki, > > > >On Wed, May 03, 2017 at 03:17:52PM +0100, Suzuki K Poulose wrote: > >>We yield the kvm->mmu_lock occassionaly while performing an operation > >>(e.g, unmap or permission changes) on a large area of stage2 mappings. > >>However this could possibly cause another thread to clear and free up > >>the stage2 page tables while we were waiting for regaining the lock and > >>thus the original thread could end up in accessing memory that was > >>freed. This patch fixes the problem by making sure that the stage2 > >>pagetable is still valid after we regain the lock. The fact that > >>mmu_notifer->release() could be called twice (via __mmu_notifier_release > >>and mmu_notifier_unregsister) enhances the possibility of hitting > >>this race where there are two threads trying to unmap the entire guest > >>shadow pages. > >> > >>While at it, cleanup the redudant checks around cond_resched_lock in > >>stage2_wp_range(), as cond_resched_lock already does the same checks. > >> > >>Cc: Mark Rutland > >>Cc: Radim Kr?m?? > >>Cc: andreyknvl at google.com > >>Cc: Christoffer Dall > >>Cc: Marc Zyngier > >>Cc: Paolo Bonzini > >>Signed-off-by: Suzuki K Poulose > >>--- > >> arch/arm/kvm/mmu.c | 17 ++++++++++++----- > >> 1 file changed, 12 insertions(+), 5 deletions(-) > >> > >>diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > >>index 909a1a7..5b3e0db 100644 > >>--- a/arch/arm/kvm/mmu.c > >>+++ b/arch/arm/kvm/mmu.c > >>@@ -301,9 +301,14 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > >> /* > >> * If the range is too large, release the kvm->mmu_lock > >> * to prevent starvation and lockup detector warnings. > >>+ * Make sure the page table is still active when we regain > >>+ * the lock. > >> */ > >>- if (next != end) > >>+ if (next != end) { > >> cond_resched_lock(&kvm->mmu_lock); > >>+ if (!READ_ONCE(kvm->arch.pgd)) > >>+ break; > >>+ } > > > >So I don't think this change is wrong, but I wonder if it's sufficient. > >For example, I can see that this function is called from > > > >stage2_unmsp_vm > > -> stage2_unmap_memslot > > -> unmap_stage2_range > > > >and > > > >kvm_arch_flush_shadow_memslot > > -> unmap_stage2_range > > > >which never check if the pgd pointer is valid, > > You are right. Those two callers do not check it. We could fix all of this by simply > moving the check to the beginning of the loop. > i.e, something like this : > > @@ -295,6 +295,12 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > assert_spin_locked(&kvm->mmu_lock); > pgd = kvm->arch.pgd + stage2_pgd_index(addr); > do { > + /* > + * Make sure the page table is still active, as we could > + * another thread could have possibly freed the page table. > + */ > + if (!READ_ONCE(kvm->arch.pgd)) > + break; > next = stage2_pgd_addr_end(addr, end); > if (!stage2_pgd_none(*pgd)) > unmap_stage2_puds(kvm, pgd, addr, next); > > > > > >and finally, kvm_free_stage2_pgd also checks the pgd pointer outside of holding the > >kvm->mmu_lock so why is this not racy? > > This has been fixed by patch 1 in the series. So should be fine. > > > I can respin the patch with the changes if you are OK with it. > Yes, absolutely. I've already applied patch 1 so no need to include that in your respin. Thanks! -Christoffer From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757897AbdEORnu (ORCPT ); Mon, 15 May 2017 13:43:50 -0400 Received: from mail-qk0-f173.google.com ([209.85.220.173]:34177 "EHLO mail-qk0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753777AbdEORns (ORCPT ); Mon, 15 May 2017 13:43:48 -0400 Date: Mon, 15 May 2017 19:43:37 +0200 From: Christoffer Dall To: Suzuki K Poulose Cc: christoffer.dall@linaro.org, agraf@suse.de, andreyknvl@google.com, marc.zyngier@arm.com, mark.rutland@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org Subject: Re: [PATCH 2/2] kvm: arm/arm64: Fix use after free of stage2 page table Message-ID: <20170515174337.GC18755@cbox> References: <1493821072-29713-1-git-send-email-suzuki.poulose@arm.com> <1493821072-29713-3-git-send-email-suzuki.poulose@arm.com> <20170515100017.GJ9309@cbox> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 15, 2017 at 02:36:58PM +0100, Suzuki K Poulose wrote: > On 15/05/17 11:00, Christoffer Dall wrote: > >Hi Suzuki, > > > >On Wed, May 03, 2017 at 03:17:52PM +0100, Suzuki K Poulose wrote: > >>We yield the kvm->mmu_lock occassionaly while performing an operation > >>(e.g, unmap or permission changes) on a large area of stage2 mappings. > >>However this could possibly cause another thread to clear and free up > >>the stage2 page tables while we were waiting for regaining the lock and > >>thus the original thread could end up in accessing memory that was > >>freed. This patch fixes the problem by making sure that the stage2 > >>pagetable is still valid after we regain the lock. The fact that > >>mmu_notifer->release() could be called twice (via __mmu_notifier_release > >>and mmu_notifier_unregsister) enhances the possibility of hitting > >>this race where there are two threads trying to unmap the entire guest > >>shadow pages. > >> > >>While at it, cleanup the redudant checks around cond_resched_lock in > >>stage2_wp_range(), as cond_resched_lock already does the same checks. > >> > >>Cc: Mark Rutland > >>Cc: Radim Krčmář > >>Cc: andreyknvl@google.com > >>Cc: Christoffer Dall > >>Cc: Marc Zyngier > >>Cc: Paolo Bonzini > >>Signed-off-by: Suzuki K Poulose > >>--- > >> arch/arm/kvm/mmu.c | 17 ++++++++++++----- > >> 1 file changed, 12 insertions(+), 5 deletions(-) > >> > >>diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > >>index 909a1a7..5b3e0db 100644 > >>--- a/arch/arm/kvm/mmu.c > >>+++ b/arch/arm/kvm/mmu.c > >>@@ -301,9 +301,14 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > >> /* > >> * If the range is too large, release the kvm->mmu_lock > >> * to prevent starvation and lockup detector warnings. > >>+ * Make sure the page table is still active when we regain > >>+ * the lock. > >> */ > >>- if (next != end) > >>+ if (next != end) { > >> cond_resched_lock(&kvm->mmu_lock); > >>+ if (!READ_ONCE(kvm->arch.pgd)) > >>+ break; > >>+ } > > > >So I don't think this change is wrong, but I wonder if it's sufficient. > >For example, I can see that this function is called from > > > >stage2_unmsp_vm > > -> stage2_unmap_memslot > > -> unmap_stage2_range > > > >and > > > >kvm_arch_flush_shadow_memslot > > -> unmap_stage2_range > > > >which never check if the pgd pointer is valid, > > You are right. Those two callers do not check it. We could fix all of this by simply > moving the check to the beginning of the loop. > i.e, something like this : > > @@ -295,6 +295,12 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > assert_spin_locked(&kvm->mmu_lock); > pgd = kvm->arch.pgd + stage2_pgd_index(addr); > do { > + /* > + * Make sure the page table is still active, as we could > + * another thread could have possibly freed the page table. > + */ > + if (!READ_ONCE(kvm->arch.pgd)) > + break; > next = stage2_pgd_addr_end(addr, end); > if (!stage2_pgd_none(*pgd)) > unmap_stage2_puds(kvm, pgd, addr, next); > > > > > >and finally, kvm_free_stage2_pgd also checks the pgd pointer outside of holding the > >kvm->mmu_lock so why is this not racy? > > This has been fixed by patch 1 in the series. So should be fine. > > > I can respin the patch with the changes if you are OK with it. > Yes, absolutely. I've already applied patch 1 so no need to include that in your respin. Thanks! -Christoffer