Date: Wed, 27 May 2026 17:53:06 +0200
From: "Oscar Salvador (SUSE)"
To: Karsten Desler
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin"
Subject: Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
References: <20260527143643.GO31091@soohrt.org>
In-Reply-To: <20260527143643.GO31091@soohrt.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 27, 2026 at 04:36:43PM +0200, Karsten Desler wrote:
> Hi,
>
> I found a reproducible hugetlb regression on an AMD Family 15h system.
>
> On some boots, mmap(MAP_HUGETLB) returns a virtual address that is not aligned
> to the hugepage size. The mapping is nevertheless installed as a hugetlb VMA.
> When the process exits, the kernel later BUGs in __unmap_hugepage_range().
>
> 6.18.33 x86_64, AMD opteron 6238, 2M hugepages

Thanks, Karsten, for reporting this.
Oops, that would be me.

> Example bad mapping captured from /proc/$pid/maps:
>
>   7fc67f604000-7fc67f804000 rw-p 00000000 00:0f 12340 /anon_hugepage (deleted)
>
> The address has offset 0x4000 within a 2 MiB hugepage.
>
> smaps confirms it is really hugetlb:
>
>   KernelPageSize:     2048 kB
>   MMUPageSize:        2048 kB
>   Private_Hugetlb:    2048 kB
>   VmFlags: rd wr mr mw me de ht
>
> Minimal reproducer:
>
>   echo 1000 > /proc/sys/vm/nr_hugepages
>
>   mmap(NULL, 1229824, PROT_READ|PROT_WRITE,
>        MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE|MAP_HUGETLB, -1, 0)
>
> On bad boots this returns e.g.:
>
>   mmap returned 0x7fc67f604000 aligned=no offset=16384
>
> and exiting the process triggers:
>
>   Kernel BUG at __unmap_hugepage_range+0x5ef/0x640
>   RIP: __unmap_hugepage_range+0x5ef/0x640
>   Fixing recursive fault but reboot is needed!
>
> The following is AI work, sorry if that's total BS but at the very least,
> I can reproduce the kernelBUG and booting with
>   align_va_addr=off
> works around the issue.
>
> This is boot-dependent. Some boots work, some fail. The reason appears
> to be the per-boot AMD F15h VA alignment offset.

I have to confess that I completely overlooked that scenario, so let me
apologize.

> The old x86 hugetlb path in arch/x86/mm/hugetlbpage.c only set:
>
>   info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>
> It did not add the AMD F15h align offset.
>
> After the v6.13-rc1 hugetlb mmap rework, hugetlb mappings go through
> arch_get_unmapped_area*(), and x86 currently does:
>
>   if (filp) {
>           info.align_mask = get_align_mask(filp);
>           info.align_offset += get_align_bits();
>   }

Ok, I see.

>
> For hugetlb, get_align_mask(filp) correctly returns the hugepage alignment
> mask, but get_align_bits() can still return the AMD F15h per-boot offset,
> e.g. 0x4000. That produces a non-hugepage-aligned hugetlb VMA.
>
> Likely introduced by the v6.13-rc1 series:
>
>   1317a5e7f7b1 arch/x86: teach arch_get_unmapped_area_vmflags to handle hugetlb mappings
>   7bd3f1e1a9ae mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags
>   cc92882ee218 mm: drop hugetlb_get_unmapped_area{_*} functions

Yes, that was part of a refactoring I did some time ago.
I will fix it up later today/early tomorrow.
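
For anyone checking a mapping from userspace, the arithmetic in the report
can be sketched as below. This is only an illustrative model, not kernel
code: the helper names (hugepage_offset, hugetlb_align_mask,
model_align_topdown) and the candidate address 0x7fc67f7ff000 are
hypothetical, and the top-down adjustment is a simplified assumption about
how the mask/offset pair is applied. It shows that a 2 MiB alignment mask
combined with a 0x4000 offset lands on an address like the reported
0x7fc67f604000.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   UINT64_C(0x1000)    /* 4 KiB base page (assumed) */
#define SKETCH_HPAGE_SIZE  UINT64_C(0x200000)  /* 2 MiB hugepage, as in the report */

/* Offset of addr inside a hugepage; hpage_size must be a power of two. */
static inline uint64_t hugepage_offset(uint64_t addr, uint64_t hpage_size)
{
        return addr & (hpage_size - 1);
}

/* Userspace equivalent of PAGE_MASK & ~huge_page_mask(h). */
static inline uint64_t hugetlb_align_mask(uint64_t page_size, uint64_t hpage_size)
{
        return ~(page_size - 1) & (hpage_size - 1);
}

/*
 * Simplified model (assumption): lower addr until
 * (addr - align_offset) & align_mask == 0.
 */
static inline uint64_t model_align_topdown(uint64_t addr, uint64_t align_mask,
                                           uint64_t align_offset)
{
        return addr - ((addr - align_offset) & align_mask);
}

/* Replays the numbers from the report against the model. */
static inline void check_reported_case(void)
{
        uint64_t mask = hugetlb_align_mask(SKETCH_PAGE_SIZE, SKETCH_HPAGE_SIZE);
        uint64_t candidate = UINT64_C(0x7fc67f7ff000); /* hypothetical gap end */

        /* Without an extra offset the result is hugepage aligned. */
        uint64_t good = model_align_topdown(candidate, mask, 0);
        assert(hugepage_offset(good, SKETCH_HPAGE_SIZE) == 0);

        /* With a 0x4000 offset the result is 16 KiB into the hugepage. */
        uint64_t bad = model_align_topdown(candidate, mask, UINT64_C(0x4000));
        assert(hugepage_offset(bad, SKETCH_HPAGE_SIZE) == UINT64_C(0x4000));
}
```

Calling hugepage_offset() on the address returned by the reproducer's mmap()
and expecting 0 should be enough to tell a good boot from a bad one.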

Would you be available for a quick test once I have the patch?

--
Oscar Salvador
SUSE Labs