From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE88F35DA5D for ; Wed, 3 Jun 2026 16:32:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780504371; cv=none; b=tbDAq4N1/eBL9zjOSimjzzlppDDuyR2N43rqois89Y2+kVwEW+3zDoekbhhPJBRyi2cIFFLeoegLzOyyQGfm4ZfkoI1Y4NK1z/G5/r3ijlGuY93nQnUQEd2oHpfEo2wjulcMYLKWHyozX0pR/0f/CfwCXjD3qbBuFLLx6bWte+M= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780504371; c=relaxed/simple; bh=f9nKkS4BLQkxr7pM86UjiUScruhOghvv0SIkJ2GuIJ0=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=ZWx6hiPCfXLA3DdT7v6zEIIHoFoAuWO4BytCdc25mmZk1FoN2/Wq99WgcsfIsX+6iIg7Ao7J9GgVaXUs2o44N0q6xtQYgAxDmMZUjOBW6xO7leGJkSTuMz1h2TKf/RNOX4SveVMFGcOTl8Dho7j24/sBtiz3e625e637ImQP/fc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QzexMSaT; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QzexMSaT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 744341F00893; Wed, 3 Jun 2026 16:32:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780504369; bh=ENPWNrwIfUzSApQaRRH02eHrg+sQXf7DNoBxsOvmCCg=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=QzexMSaT52ABcr570CDR8X7s9XtuET1NOWJid0eyXQQQyWC3s+6n3yTS+lo0Cwx1K dPMeud8EHmLnPf5vocODGmWUVQT9+cNKBfVGpANjZ09qqFqrn400ZnKMCVFMiL6n37 DyHGXDfjSpmLQOCy/emt/SU8opZ506XTwf5fghmYzu0HptDRWACB8sISD3Gxqx8t4G j3ibI2hLfSc4APiVfDMQaJGg7tY9fgOYW73UP8pbwLwvvvyVq1pZQGQtUGAOS1gOK7 nokE8TAsvStjkuem3nPBgM2H17BkwhUn8fRlMISSDvk/eaniROblqZP4rf+0lhHtou v0eE6NQ0P1uKw== From: sashiko-bot@kernel.org Subject: Re: [PATCH v1 3/3] KVM: s390: Allow for 2G hugepages Reply-To: sashiko-reviews@lists.linux.dev To: "Claudio Imbrenda" Cc: kvm@vger.kernel.org In-Reply-To: <20260603154758.234768-4-imbrenda@linux.ibm.com> References: <20260603154758.234768-4-imbrenda@linux.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 03 Jun 2026 16:32:48 +0000 Message-Id: <20260603163249.744341F00893@smtp.kernel.org> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 2 potential issue(= s) to consider: New issues: - [Critical] Enabling 2G hugepages exposes a hardcoded `_SEGMENT_MASK` in `= _crste_fc1`, injecting invalid entries into hardware page tables. Pre-existing issues: - [High] Concurrent page faults can insert 1M or 2G huge pages into a PV gu= est gmap while `KVM_PV_ENABLE` runs, because `gmap_split_huge_pages` and `g= map_link` both acquire `mmu_lock` in read mode without sufficient serializa= tion. -- commit 8d8980fda5d60990dd59ce7e857350fb4c8cedbd Author: Claudio Imbrenda KVM: s390: Allow for 2G hugepages Change gmap_2g_allowed() to perform the necessary checks to allow for 2G hugepages to be used, instead of unconditionally returning false. The GMAP_FLAG_ALLOW_HPAGE_2G flag is now taken into account when evaluating whether a 2G hugepage can back the faulting address. > diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c > index c9e348149ba12..c5a8132eaa22b 100644 > --- a/arch/s390/kvm/gmap.c > +++ b/arch/s390/kvm/gmap.c [ ... ] > @@ -635,10 +635,27 @@ int gmap_try_fixup_minor(struct gmap *gmap, struct = guest_fault *fault) > return rc; > } > =20 > +/** > + * gmap_2g_allowed() - Check whether a 2G hugepage is allowed. > + * @gmap: The gmap of the guest. > + * @f: Describes the fault that is being resolved. > + * @slot: The memslot the faulting address belongs to. > + * > + * The function checks whether the GMAP_FLAG_ALLOW_HPAGE_2G flag is set = for > + * @gmap, whether the offset of the address in the 2G virtual frame is t= he > + * same as the offset in the physical 2G frame, and finally whether the = whole > + * 2G page would fit in the given memslot. > + * > + * Return: true if a 2G hugepage is allowed to back the faulting address= , false > + * otherwise. > + */ > static inline bool gmap_2g_allowed(struct gmap *gmap, struct guest_fault= *f, > struct kvm_memory_slot *slot) > { > - return false; > + return test_bit(GMAP_FLAG_ALLOW_HPAGE_2G, &gmap->flags) && [Severity: Critical] Enabling 2G hugepages here exposes a hardcoded _SEGMENT_MASK in _crste_fc1(= ). When 2G hugepages proceed to _gmap_link(), it calls _crste_fc1() with an unaligned f->pfn and a Region-3 table type. arch/s390/kvm/dat.h:_crste_fc1() { union crste res =3D { .val =3D PFN_PHYS(pfn) & _SEGMENT_MASK }; ... } The constructor hardcodes _SEGMENT_MASK, which fails to zero out bits 20-30 for Region-3 hugepages. For a Region-3 large page entry, the hardware requi= res these bits to be strictly zero. Writing the unaligned 2MB-to-2GB offset into these reserved bits causes the hardware MMU to trigger a Translation Specification Exception when it walks the page table. Should _crste_fc1() be updated to use _REGION3_MASK for Region-3 entries to prevent guest crashes? [Severity: High] This isn't a bug introduced by this patch, but adding 2G hugepages to this path exacerbates a preexisting race with PV initialization. Concurrent page faults can insert huge pages into a PV guest gmap while kvm_s390_pv_init_vm() runs. The initialization code clears the flags and ca= lls gmap_split_huge_pages(): arch/s390/kvm/pv.c:kvm_s390_pv_init_vm() { ... clear_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &kvm->arch.gmap->flags); clear_bit(GMAP_FLAG_ALLOW_HPAGE_2G, &kvm->arch.gmap->flags); gmap_split_huge_pages(kvm->arch.gmap); ... } However, gmap_split_huge_pages() only takes a read lock on kvm->mmu_lock. If a vCPU thread evaluates test_bit() in gmap_2g_allowed() to true before the flag is cleared, and is then preempted, the splitting function can walk past the faulting GFN. When the vCPU thread resumes, it can atomically insert the 2G huge page into the gmap via _gmap_link(), as that also only holds mmu_lock in read mode. Does this race allow a PV guest to map a hugepage and break memory isolation invariants? > + !((f->gfn ^ f->pfn) & ~_REGION3_FR_MASK) && > + slot->base_gfn <=3D ALIGN_DOWN(f->gfn, _PAGES_PER_REGION3) && > + slot->base_gfn + slot->npages >=3D ALIGN(f->gfn + 1, _PAGES_PER_= REGION3); > } --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260603154758.2347= 68-1-imbrenda@linux.ibm.com?part=3D3