Message-ID: <74a5ab18-1486-4ad2-82fc-bea14b9122e0@kernel.org>
Date: Tue, 28 Apr 2026 21:54:49 +0200
From: "David Hildenbrand (Arm)"
Subject: Re: [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs
To: Usama Arif, Andrew Morton, chrisl@kernel.org, kasong@tencent.com,
 ljs@kernel.org, ziy@nvidia.com
Cc: bhe@redhat.com, willy@infradead.org, youngjun.park@lge.com,
 hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
 alex@ghiti.fr, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
 baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com,
 ryan.roberts@arm.com, Vlastimil Babka, lance.yang@linux.dev,
 linux-kernel@vger.kernel.org, nphamcs@gmail.com,
 shikemeng@huaweicloud.com, kernel-team@meta.com
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260427100553.2754667-1-usama.arif@linux.dev>
5YfqbdrJSOFXDzZ8/r82HgQEtUvlSXNaXCa95ez0UkOG7+bDm2b3s0XahBQeLVCH0mw3RAQg r7xDAYKIrAwfHHmMTnBQDPJwVqxJjVNr7yBic4yfzVWGCGNE4DnOW0vcIeoyhy9vnIa3w1uZ 3iyY2Nsd7JxfKu1PRhCGwXzRw5TlfEsoRI7V9A8isUCoqE2Dzh3FvYHVeX4Us+bRL/oqareJ CIFqgYMyvHj7Q06kTKmauOe4Nf0l0qEkIuIzfoLJ3qr5UyXc2hLtWyT9Ir+lYlX9efqh7mOY qIws/H2t In-Reply-To: <20260427100553.2754667-1-usama.arif@linux.dev> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 4/27/26 12:01, Usama Arif wrote: > When reclaim swaps out a PMD-mapped anonymous THP today, the PMD is > split into 512 PTE-level swap entries via TTU_SPLIT_HUGE_PMD before > unmap. > > This series introduces a PMD-level swap entry. The huge mapping is > preserved across the swap round-trip, and do_huge_pmd_swap_page() > resolves the entire 2 MB region in a single fault on swap-in, > no khugepaged involvement is needed. swap_map metadata is identical > either way (512 single-slot counts), so the PTE split buys nothing > on the swap side, it is purely a page-table representation change. > > This work was brought about after Hugh reported that one of the > major blockers for having lazy page table deposit is the lack of > PMD swap entries [1]. However, this series has benefits of its > own: > - The huge mapping is restored on swap-in. Today even when the > folio is still in swap cache as a single 2 MB folio, the swap-in > path installs 512 PTE mappings -- the PMD mapping is gone, the > freshly-materialised PTE table sticks around, and only > khugepaged can later collapse the range back into a THP. > do_huge_pmd_swap_page() reinstalls the PMD mapping directly in > one fault, no khugepaged involvement. Ack, that's nice. > - Memory saved per swapped-out THP *once lazy page table deposit is > merged* [2]. With lazy page table deposit [2], splitting a PMD into > 512 PTE swap entries forces allocation of a 4 KB PTE table page. > The new path leaves the pgtable hierarchy at PMD level and avoids > that allocation entirely. 
> This will save memory during swapping, which typically happens under
> memory pressure -- exactly when allocations are most likely to fail.

Also ack.

> - Walkers (zap, mprotect, smaps, pagemap, soft-dirty, uffd-wp)
>   visit one PMD entry instead of 512 PTEs, reducing traversal
>   time and lock-hold windows.

Right.

> The swap entry value is identical to that of the 512 PTE swap
> entries (same type, same starting offset), so swap_map refcounting
> is unchanged. Only the page-table representation differs; the swap
> slot allocator, swap I/O, and swap cache are untouched. The new path
> falls back to the existing PTE-split path whenever a PMD-order
> resource is unavailable: zswap enabled, non-contiguous swap
> allocation (THP_SWPOUT_FALLBACK), PMD-order folio allocation failure
> on swap-in or fork, a racing folio split, or an rmap-driven split on
> a swapcache folio. Walkers that previously assumed every non-present
> PMD encodes a PFN (migration / device_private) are taught to
> recognise PMD swap entries.

All sounds nice. I'll get to review this soon. LSF/MM and travel will
slow me down a bit in May :(

-- 
Cheers,

David