From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 29 Apr 2026 13:52:38 +0100
From: Lorenzo Stoakes
To: Usama Arif
Cc: "David Hildenbrand (Arm)", Andrew Morton, chrisl@kernel.org,
	kasong@tencent.com, ziy@nvidia.com, bhe@redhat.com,
	willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org,
	riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr,
	kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka,
	lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	nphamcs@gmail.com, shikemeng@huaweicloud.com, kernel-team@meta.com
Subject: Re: [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs
Message-ID: 
References: <20260427100553.2754667-1-usama.arif@linux.dev>
 <74a5ab18-1486-4ad2-82fc-bea14b9122e0@kernel.org>
In-Reply-To: 

On Wed, Apr 29, 2026 at 10:39:23AM +0100, Usama Arif wrote:
>
>
> On 28/04/2026 20:54, David Hildenbrand (Arm) wrote:
> > On 4/27/26 12:01, Usama Arif wrote:
> >> When reclaim swaps out a PMD-mapped anonymous THP today, the PMD is
> >> split into 512 PTE-level swap entries via TTU_SPLIT_HUGE_PMD before
> >> unmap.
> >>
> >> This series introduces a PMD-level swap entry.
> >> The huge mapping is preserved across the swap round-trip, and
> >> do_huge_pmd_swap_page() resolves the entire 2 MB region in a single
> >> fault on swap-in; no khugepaged involvement is needed. swap_map
> >> metadata is identical either way (512 single-slot counts), so the
> >> PTE split buys nothing on the swap side; it is purely a page-table
> >> representation change.
> >>
> >> This work was brought about after Hugh reported that one of the
> >> major blockers for having lazy page table deposit is the lack of
> >> PMD swap entries [1]. However, this series has benefits of its own:
> >> - The huge mapping is restored on swap-in. Today, even when the
> >>   folio is still in the swap cache as a single 2 MB folio, the
> >>   swap-in path installs 512 PTE mappings -- the PMD mapping is gone,
> >>   the freshly materialised PTE table sticks around, and only
> >>   khugepaged can later collapse the range back into a THP.
> >>   do_huge_pmd_swap_page() reinstalls the PMD mapping directly in
> >>   one fault, with no khugepaged involvement.
> >
> > Ack, that's nice.
> >
> >> - Memory saved per swapped-out THP *once lazy page table deposit is
> >>   merged* [2]. With lazy page table deposit [2], splitting a PMD
> >>   into 512 PTE swap entries forces allocation of a 4 KB PTE table
> >>   page. The new path leaves the pgtable hierarchy at PMD level and
> >>   avoids that allocation entirely. This saves memory while swapping,
> >>   i.e. under memory pressure, exactly when allocations are most
> >>   likely to fail.
> >
> > Also ack.
> >
> >> - Walkers (zap, mprotect, smaps, pagemap, soft-dirty, uffd-wp)
> >>   visit one PMD entry instead of 512 PTEs, reducing traversal
> >>   time and lock-hold windows.
> >
> > Right.
> >
> >>
> >> The swap entry value is identical to the 512 PTE swap entries (same
> >> type, same starting offset), so swap_map refcounting is unchanged.
> >> Only the page-table representation differs; the swap slot allocator,
> >> swap I/O, and swap cache are untouched. The new path falls back to
> >> the existing PTE-split path whenever a PMD-order resource is
> >> unavailable: zswap enabled, non-contiguous swap allocation
> >> (THP_SWPOUT_FALLBACK), PMD-order folio allocation failure on swap-in
> >> or fork, a racing folio split, or an rmap-driven split on a swapcache
> >> folio. Walkers that previously assumed every non-present PMD encodes
> >> a PFN (migration / device_private) are taught to recognise PMD swap
> >> entries.
> >
> > All sounds nice. I'll get to review this soon. LSF/MM and travel will
> > slow me down a bit in May :(
> >
>
> Thanks! Appreciate it!
>

My email is a disaster right now; various other stuff + lately working
hard on the thing-I'm-going-to-talk-about-at-LSF and the-slides-for-that
have left me with only backlog, but... :) I will want to have a look
post-LSF as well. But May is likely to be slow for me also.

Cheers, Lorenzo
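
P.S. For anyone skimming the thread, a rough standalone sketch of the
arithmetic behind the points above -- not code from the series, just the
usual 64-bit assumptions (4 KB base pages, a 2 MB PMD area, 8-byte
page-table entries):

#include <stdio.h>

int main(void)
{
	/*
	 * A PMD-mapped THP covers 2 MB, i.e. 512 contiguous 4 KB swap
	 * slots either way.  The PTE-split representation needs 512 swap
	 * PTEs (and, once lazy page table deposit lands, a 4 KB PTE table
	 * page to hold them); the PMD representation is a single
	 * non-present PMD entry with the same swap type and starting
	 * offset, which is why swap_map refcounting is unchanged.
	 */
	const unsigned long pmd_size  = 2UL << 20;	/* 2 MB */
	const unsigned long page_size = 4UL << 10;	/* 4 KB */
	const unsigned long pte_size  = 8;		/* bytes per 64-bit entry */
	const unsigned long nr_slots  = pmd_size / page_size;

	printf("swap entries after the PTE split  : %lu\n", nr_slots);
	printf("PTE table page needed to hold them: %lu bytes\n",
	       nr_slots * pte_size);
	printf("entries with a PMD swap entry     : 1\n");
	return 0;
}

Hand-wavy, of course, but that 4 KB-per-swapped-THP table page is the
saving the lazy-deposit point above is getting at.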