From mboxrd@z Thu Jan 1 00:00:00 1970
From: Suzuki K Poulose
Subject: Re: [PATCH 5/8] KVM: arm/arm64: Enforce PTE mappings at stage2 when needed
Date: Tue, 2 Apr 2019 10:47:16 +0100
Message-ID: <20190402094716.GA1082@en101>
References: <20190328133608.110805-1-marc.zyngier@arm.com>
 <20190328133608.110805-6-marc.zyngier@arm.com>
 <496ad70d-eaa5-c46e-ddf0-d07607522eeb@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from localhost (localhost [127.0.0.1])
 by mm01.cs.columbia.edu (Postfix) with ESMTP id 7252A4A41A;
 Tue, 2 Apr 2019 05:47:29 -0400 (EDT)
Received: from mm01.cs.columbia.edu ([127.0.0.1])
 by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id U888bFrnBSEK; Tue, 2 Apr 2019 05:47:27 -0400 (EDT)
Received: from foss.arm.com (foss.arm.com [217.140.101.70])
 by mm01.cs.columbia.edu (Postfix) with ESMTP id C36934A34E;
 Tue, 2 Apr 2019 05:47:27 -0400 (EDT)
Content-Disposition: inline
In-Reply-To: <496ad70d-eaa5-c46e-ddf0-d07607522eeb@redhat.com>
Errors-To: kvmarm-bounces@lists.cs.columbia.edu
Sender: kvmarm-bounces@lists.cs.columbia.edu
To: Auger Eric
Cc: kvm@vger.kernel.org, Marc Zyngier, YueHaibing, Julien Grall,
 Zenghui Yu, Paolo Bonzini, kvmarm@lists.cs.columbia.edu,
 linux-arm-kernel@lists.infradead.org
List-Id: kvmarm@lists.cs.columbia.edu

On Mon, Apr 01, 2019 at 07:10:37PM +0200, Auger Eric wrote:
> Hi Suzuki,
>
> On 3/28/19 2:36 PM, Marc Zyngier wrote:
> > From: Suzuki K Poulose
> >
> > commit 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings")
> > made the checks to skip huge mappings, stricter. However it introduced
> > a bug where we still use huge mappings, ignoring the flag to
> > use PTE mappings, by not resetting the vma_pagesize to PAGE_SIZE.
> >
> > Also, the checks do not cover the PUD huge pages, that was
> > under review during the same period. This patch fixes both
> > the issues.
>
> I face a regression with this patch. My guest gets stuck. I am running
> on AMD Seattle. Reverting the patch makes things work again for me. I
> run with qemu. In this scenario I don't use hugepages. I use 64kB page
> size for both the host and guest.

Hi Eric,

Thanks for testing. Does the following patch fix the issue for you?

---8>---

kvm: arm: Skip transparent huge pages in unaligned memslots

We silently create stage2 huge mappings for a memslot with
unaligned IPA and user address.

Signed-off-by: Suzuki K Poulose
---
 virt/kvm/arm/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 27c9583..4a22f5b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
 		 * page accordingly.
 		 */
 		mask = PTRS_PER_PMD - 1;
-		VM_BUG_ON((gfn & mask) != (pfn & mask));
+		/* Skip memslots with unaligned IPA and user address */
+		if ((gfn & mask) != (pfn & mask))
+			return false;
 		if (pfn & mask) {
 			*ipap &= PMD_MASK;
 			kvm_release_pfn_clean(pfn);
-- 
2.7.4

Kind regards
Suzuki

>
> Thanks
>
> Eric
> >
> > Fixes : 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings")
> > Reported-by: Zenghui Yu
> > Cc: Zenghui Yu
> > Cc: Christoffer Dall
> > Signed-off-by: Suzuki K Poulose
> > Signed-off-by: Marc Zyngier
> > ---
> >  virt/kvm/arm/mmu.c | 43 +++++++++++++++++++++----------------------
> >  1 file changed, 21 insertions(+), 22 deletions(-)
> >
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index ffd7acdceac7..bcdf978c0d1d 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> > @@ -1594,8 +1594,9 @@ static void kvm_send_hwpoison_signal(unsigned long address,
> >  	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
> >  }
> >
> > -static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot,
> > -					       unsigned long hva)
> > +static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
> > +					       unsigned long hva,
> > +					       unsigned long map_size)
> >  {
> >  	gpa_t gpa_start;
> >  	hva_t uaddr_start, uaddr_end;
> > @@ -1610,34 +1611,34 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot,
> >
> >  	/*
> >  	 * Pages belonging to memslots that don't have the same alignment
> > -	 * within a PMD for userspace and IPA cannot be mapped with stage-2
> > -	 * PMD entries, because we'll end up mapping the wrong pages.
> > * > > * Consider a layout like the following: > > * > > * memslot->userspace_addr: > > * +-----+--------------------+--------------------+---+ > > - * |abcde|fgh Stage-1 PMD | Stage-1 PMD tv|xyz| > > + * |abcde|fgh Stage-1 block | Stage-1 block tv|xyz| > > * +-----+--------------------+--------------------+---+ > > * > > * memslot->base_gfn << PAGE_SIZE: > > * +---+--------------------+--------------------+-----+ > > - * |abc|def Stage-2 PMD | Stage-2 PMD |tvxyz| > > + * |abc|def Stage-2 block | Stage-2 block |tvxyz| > > * +---+--------------------+--------------------+-----+ > > * > > - * If we create those stage-2 PMDs, we'll end up with this incorrect > > + * If we create those stage-2 blocks, we'll end up with this incorrect > > * mapping: > > * d -> f > > * e -> g > > * f -> h > > */ > > - if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK)) > > + if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1))) > > return false; > > > > /* > > * Next, let's make sure we're not trying to map anything not covered > > - * by the memslot. This means we have to prohibit PMD size mappings > > - * for the beginning and end of a non-PMD aligned and non-PMD sized > > + * by the memslot. This means we have to prohibit block size mappings > > + * for the beginning and end of a non-block aligned and non-block sized > > * memory slot (illustrated by the head and tail parts of the > > * userspace view above containing pages 'abcde' and 'xyz', > > * respectively). > > @@ -1646,8 +1647,8 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, > > * userspace_addr or the base_gfn, as both are equally aligned (per > > * the check above) and equally sized. 
> >  	 */
> > -	return (hva & S2_PMD_MASK) >= uaddr_start &&
> > -	       (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end;
> > +	return (hva & ~(map_size - 1)) >= uaddr_start &&
> > +	       (hva & ~(map_size - 1)) + map_size <= uaddr_end;
> >  }
> >
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > @@ -1676,12 +1677,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return -EFAULT;
> >  	}
> >
> > -	if (!fault_supports_stage2_pmd_mappings(memslot, hva))
> > -		force_pte = true;
> > -
> > -	if (logging_active)
> > -		force_pte = true;
> > -
> >  	/* Let's check if we will get back a huge page backed by hugetlbfs */
> >  	down_read(&current->mm->mmap_sem);
> >  	vma = find_vma_intersection(current->mm, hva, hva + 1);
> > @@ -1692,6 +1687,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	}
> >
> >  	vma_pagesize = vma_kernel_pagesize(vma);
> > +	if (logging_active ||
> > +	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
> > +		force_pte = true;
> > +		vma_pagesize = PAGE_SIZE;
> > +	}
> > +
> >  	/*
> >  	 * The stage2 has a minimum of 2 level table (For arm64 see
> >  	 * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
> > @@ -1699,11 +1700,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	 * As for PUD huge maps, we must make sure that we have at least
> >  	 * 3 levels, i.e, PMD is not folded.
> >  	 */
> > -	if ((vma_pagesize == PMD_SIZE ||
> > -	     (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) &&
> > -	    !force_pte) {
> > +	if (vma_pagesize == PMD_SIZE ||
> > +	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
> >  		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> > -	}
> >  	up_read(&current->mm->mmap_sem);
> >
> >  	/* We need minimum second+third level pages */
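
For readers who want to experiment with the two checks described in the
quoted comment (equal offset within a block, and block fully inside the
memslot), here is a minimal userspace sketch. It is not the kernel code:
struct memslot_sketch, supports_block_mapping() and the SKETCH_* macros are
hypothetical names, a 4kB base page is assumed, and the way gpa_start and
uaddr_end are derived is an assumption, since the quoted hunks do not show
those lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4kB base pages (page shift of 12). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for the memslot fields the check relies on. */
struct memslot_sketch {
	uint64_t base_gfn;       /* first guest frame number of the slot */
	uint64_t npages;         /* slot length in base pages */
	uint64_t userspace_addr; /* host virtual address backing the slot */
};

/*
 * Return true if a block of map_size bytes around hva can be mapped at
 * stage 2 without mapping the wrong pages or pages outside the slot.
 */
static bool supports_block_mapping(const struct memslot_sketch *slot,
				   uint64_t hva, uint64_t map_size)
{
	uint64_t offset_mask = map_size - 1;
	uint64_t gpa_start = slot->base_gfn << SKETCH_PAGE_SHIFT;
	uint64_t uaddr_start = slot->userspace_addr;
	uint64_t uaddr_end = uaddr_start + slot->npages * SKETCH_PAGE_SIZE;
	uint64_t block_start = hva & ~offset_mask;

	/* The mask arithmetic only works for power-of-two sizes. */
	assert(map_size != 0 && (map_size & offset_mask) == 0);

	/* IPA and userspace address must share the same offset in a block. */
	if ((gpa_start & offset_mask) != (uaddr_start & offset_mask))
		return false;

	/* The whole block containing hva must lie inside the memslot. */
	return block_start >= uaddr_start &&
	       block_start + map_size <= uaddr_end;
}
```

With a 2MB map_size, a slot whose IPA and userspace address have different
offsets within 2MB is rejected for every hva. A slot with equal but non-zero
offsets is rejected only for faults in its partial head and tail blocks. When
map_size equals the base page size, the offset comparison cannot fail for
page-aligned inputs, so in this sketch the check passes for any hva inside
the slot.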