From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Paolo Bonzini
Cc: Paul Mackerras, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu,
	syzbot+c9d1fb51ac9d0d10c39d@syzkaller.appspotmail.com,
	Andrea Arcangeli, Dan Williams, Barret Rhoden, David Hildenbrand,
	Jason Zeng, Dave Jiang, Liran Alon, linux-nvdimm
Subject: [PATCH 09/14] KVM: x86/mmu: Rely on host page tables to find HugeTLB mappings
Date: Wed, 8 Jan 2020 12:24:43 -0800
Message-Id: <20200108202448.9669-10-sean.j.christopherson@intel.com>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200108202448.9669-1-sean.j.christopherson@intel.com>
References: <20200108202448.9669-1-sean.j.christopherson@intel.com>

Remove KVM's HugeTLB specific logic and instead rely on walking the
host page tables (already done for THP) to identify HugeTLB mappings.
Eliminating the HugeTLB-only logic avoids taking mmap_sem and calling
find_vma() for all hugepage compatible page faults, and simplifies
KVM's page fault code by consolidating all hugepage adjustments into a
common helper.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
Note: an illustrative sketch of the host page table walk this relies
on is appended after the patch.

 arch/x86/kvm/mmu/mmu.c         | 84 ++++++++++------------------------
 arch/x86/kvm/mmu/paging_tmpl.h | 15 +++---
 2 files changed, 29 insertions(+), 70 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7d78d1d996ed..68aec984f953 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1286,23 +1286,6 @@ static bool mmu_gfn_lpage_is_disallowed(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return __mmu_gfn_lpage_is_disallowed(gfn, level, slot);
 }
 
-static int host_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-	unsigned long page_size;
-	int i, ret = 0;
-
-	page_size = kvm_host_page_size(vcpu, gfn);
-
-	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		if (page_size >= KVM_HPAGE_SIZE(i))
-			ret = i;
-		else
-			break;
-	}
-
-	return ret;
-}
-
 static inline bool memslot_valid_for_gpte(struct kvm_memory_slot *slot,
 					  bool no_dirty_log)
 {
@@ -1327,43 +1310,25 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return slot;
 }
 
-static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
-			 int *max_levelp)
+static int max_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
+			     int max_level)
 {
-	int host_level, max_level = *max_levelp;
 	struct kvm_memory_slot *slot;
 
 	if (unlikely(max_level == PT_PAGE_TABLE_LEVEL))
 		return PT_PAGE_TABLE_LEVEL;
 
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
-	if (!memslot_valid_for_gpte(slot, true)) {
-		*max_levelp = PT_PAGE_TABLE_LEVEL;
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	if (!memslot_valid_for_gpte(slot, true))
 		return PT_PAGE_TABLE_LEVEL;
-	}
 
 	max_level = min(max_level, kvm_x86_ops->get_lpage_level());
 	for ( ; max_level > PT_PAGE_TABLE_LEVEL; max_level--) {
-		if (!__mmu_gfn_lpage_is_disallowed(large_gfn, max_level, slot))
+		if (!__mmu_gfn_lpage_is_disallowed(gfn, max_level, slot))
 			break;
 	}
 
-	*max_levelp = max_level;
-
-	if (max_level == PT_PAGE_TABLE_LEVEL)
-		return PT_PAGE_TABLE_LEVEL;
-
-	/*
-	 * Note, host_mapping_level() does *not* handle transparent huge pages.
-	 * As suggested by "mapping", it reflects the page size established by
-	 * the associated vma, if there is one, i.e. host_mapping_level() will
-	 * return a huge page level if and only if a vma exists and the backing
-	 * implementation for the vma uses huge pages, e.g. hugetlbfs and dax.
-	 * So, do not propagate host_mapping_level() to max_level as KVM can
-	 * still promote the guest mapping to a huge page in the THP case.
-	 */
-	host_level = host_mapping_level(vcpu, large_gfn);
-	return min(host_level, max_level);
+	return max_level;
 }
 
 /*
@@ -3137,7 +3102,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 		/*
 		 * Other vcpu creates new sp in the window between
-		 * mapping_level() and acquiring mmu-lock. We can
+		 * max_mapping_level() and acquiring mmu-lock. We can
 		 * allow guest to retry the access, the mapping can
 		 * be fixed if guest refault.
 		 */
@@ -3364,24 +3329,23 @@ static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return level;
 }
 
-static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
-					int max_level, kvm_pfn_t *pfnp,
-					int *levelp)
+static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
+				   int max_level, kvm_pfn_t *pfnp)
 {
 	kvm_pfn_t pfn = *pfnp;
-	int level = *levelp;
 	kvm_pfn_t mask;
+	int level;
 
-	if (max_level == PT_PAGE_TABLE_LEVEL || level > PT_PAGE_TABLE_LEVEL)
-		return;
+	if (max_level == PT_PAGE_TABLE_LEVEL)
+		return PT_PAGE_TABLE_LEVEL;
 
 	if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn) ||
 	    kvm_is_zone_device_pfn(pfn))
-		return;
+		return PT_PAGE_TABLE_LEVEL;
 
 	level = host_pfn_mapping_level(vcpu, gfn, pfn);
 	if (level == PT_PAGE_TABLE_LEVEL)
-		return;
+		return level;
 
 	level = min(level, max_level);
 
@@ -3389,10 +3353,11 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 	 * mmu_notifier_retry() was successful and mmu_lock is held, so
 	 * the pmd can't be split from under us.
 	 */
-	*levelp = level;
 	mask = KVM_PAGES_PER_HPAGE(level) - 1;
 	VM_BUG_ON((gfn & mask) != (pfn & mask));
 	*pfnp = pfn & ~mask;
+
+	return level;
 }
 
 static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
@@ -3419,20 +3384,19 @@ static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
-			int map_writable, int level, int max_level,
-			kvm_pfn_t pfn, bool prefault,
-			bool account_disallowed_nx_lpage)
+			int map_writable, int max_level, kvm_pfn_t pfn,
+			bool prefault, bool account_disallowed_nx_lpage)
 {
 	struct kvm_shadow_walk_iterator it;
 	struct kvm_mmu_page *sp;
-	int ret;
+	int level, ret;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	gfn_t base_gfn = gfn;
 
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
 		return RET_PF_RETRY;
 
-	transparent_hugepage_adjust(vcpu, gfn, max_level, &pfn, &level);
+	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn);
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
 	for_each_shadow_entry(vcpu, gpa, it) {
@@ -4206,7 +4170,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
 	kvm_pfn_t pfn;
-	int level, r;
+	int r;
 
 	if (page_fault_handle_page_track(vcpu, error_code, gfn))
 		return RET_PF_EMULATE;
@@ -4218,9 +4182,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	if (lpage_disallowed)
 		max_level = PT_PAGE_TABLE_LEVEL;
 
-	level = mapping_level(vcpu, gfn, &max_level);
-	if (level > PT_PAGE_TABLE_LEVEL)
-		gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
+	max_level = max_mapping_level(vcpu, gfn, max_level);
 
 	if (fast_page_fault(vcpu, gpa, error_code))
 		return RET_PF_RETRY;
@@ -4240,7 +4202,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		goto out_unlock;
 	if (make_mmu_pages_available(vcpu) < 0)
 		goto out_unlock;
-	r = __direct_map(vcpu, gpa, write, map_writable, level, max_level, pfn,
+	r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
 			 prefault, is_tdp && lpage_disallowed);
 
 out_unlock:
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0029f7870865..841506a55815 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -613,14 +613,14 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
  */
 static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 			 struct guest_walker *gw,
-			 int write_fault, int hlevel, int max_level,
+			 int write_fault, int max_level,
 			 kvm_pfn_t pfn, bool map_writable, bool prefault,
 			 bool lpage_disallowed)
 {
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
-	int top_level, ret;
+	int top_level, hlevel, ret;
 	gfn_t gfn, base_gfn;
 
 	direct_access = gw->pte_access;
@@ -673,7 +673,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 	gfn = gw->gfn | ((addr & PT_LVL_OFFSET_MASK(gw->level)) >> PAGE_SHIFT);
 	base_gfn = gfn;
 
-	transparent_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn, &hlevel);
+	hlevel = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn);
 
 	trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
 
@@ -775,7 +775,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	struct guest_walker walker;
 	int r;
 	kvm_pfn_t pfn;
-	int level;
 	unsigned long mmu_seq;
 	bool map_writable, is_self_change_mapping;
 	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
@@ -825,9 +824,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	else
 		max_level = walker.level;
 
-	level = mapping_level(vcpu, walker.gfn, &max_level);
-	if (level > PT_PAGE_TABLE_LEVEL)
-		walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
+	max_level = max_mapping_level(vcpu, walker.gfn, max_level);
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
@@ -867,8 +864,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
 	if (make_mmu_pages_available(vcpu) < 0)
 		goto out_unlock;
-	r = FNAME(fetch)(vcpu, addr, &walker, write_fault, level, max_level,
-			 pfn, map_writable, prefault, lpage_disallowed);
+	r = FNAME(fetch)(vcpu, addr, &walker, write_fault, max_level, pfn,
+			 map_writable, prefault, lpage_disallowed);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 
 out_unlock:
-- 
2.24.1
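
For context, the walk this patch relies on, host_pfn_mapping_level()
(added earlier in this series), reports the level at which the host
maps a given address, regardless of whether the backing is THP,
hugetlbfs, or DAX. The sketch below is illustrative only: it assumes
the lookup_address_in_mm() helper introduced earlier in this series,
and example_host_level() is a hypothetical name, not the upstream
implementation.

/*
 * Illustrative sketch (kernel context assumed): derive a mapping level
 * for @gfn by walking the host page tables instead of inspecting the
 * backing VMA.  example_host_level() is a hypothetical name.
 */
static int example_host_level(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	unsigned long hva;
	unsigned int level;
	pte_t *pte;

	/* Translate the gfn to a host virtual address via its memslot. */
	hva = kvm_vcpu_gfn_to_hva(vcpu, gfn);
	if (kvm_is_error_hva(hva))
		return PT_PAGE_TABLE_LEVEL;

	/*
	 * Walk the host page tables; @level reports where the walk
	 * terminated, which covers THP, hugetlbfs and DAX alike.  This
	 * is what makes the HugeTLB-only vma lookup unnecessary.
	 */
	pte = lookup_address_in_mm(vcpu->kvm->mm, hva, &level);
	if (!pte || !pte_present(*pte))
		return PT_PAGE_TABLE_LEVEL;

	if (level == PG_LEVEL_1G)
		return PT_PDPE_LEVEL;		/* 1GB host mapping */
	if (level == PG_LEVEL_2M)
		return PT_DIRECTORY_LEVEL;	/* 2MB host mapping */
	return PT_PAGE_TABLE_LEVEL;		/* 4KB host mapping */
}

In the real fault path the walk happens after mmu_notifier_retry() has
succeeded and with mmu_lock held, so the host mapping cannot be split
underneath KVM; that is what lets kvm_mmu_hugepage_adjust() trust the
returned level.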