From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 29 Apr 2026 10:39:23 +0100
Subject: Re: [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs
From: Usama Arif <usama.arif@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>, Andrew Morton, chrisl@kernel.org, kasong@tencent.com, ljs@kernel.org, ziy@nvidia.com
Cc: bhe@redhat.com, willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka, lance.yang@linux.dev, linux-kernel@vger.kernel.org, nphamcs@gmail.com, shikemeng@huaweicloud.com, kernel-team@meta.com
References: <20260427100553.2754667-1-usama.arif@linux.dev> <74a5ab18-1486-4ad2-82fc-bea14b9122e0@kernel.org>
In-Reply-To: <74a5ab18-1486-4ad2-82fc-bea14b9122e0@kernel.org>
Content-Type: text/plain; charset=UTF-8

On 28/04/2026 20:54, David Hildenbrand (Arm) wrote:
> On 4/27/26 12:01, Usama Arif wrote:
>> When reclaim swaps out a PMD-mapped anonymous THP today, the PMD is
>> split into 512 PTE-level swap entries via TTU_SPLIT_HUGE_PMD before
>> unmap.
>>
>> This series introduces a PMD-level swap entry. The huge mapping is
>> preserved across the swap round-trip, and do_huge_pmd_swap_page()
>> resolves the entire 2 MB region in a single fault on swap-in;
>> no khugepaged involvement is needed.
>> swap_map metadata is identical
>> either way (512 single-slot counts), so the PTE split buys nothing
>> on the swap side; it is purely a page-table representation change.
>>
>> This work was brought about after Hugh reported that one of the
>> major blockers for having lazy page table deposit is the lack of
>> PMD swap entries [1]. However, this series has benefits of its
>> own:
>> - The huge mapping is restored on swap-in. Today, even when the
>> folio is still in swap cache as a single 2 MB folio, the swap-in
>> path installs 512 PTE mappings -- the PMD mapping is gone, the
>> freshly-materialised PTE table sticks around, and only
>> khugepaged can later collapse the range back into a THP.
>> do_huge_pmd_swap_page() reinstalls the PMD mapping directly in
>> one fault, with no khugepaged involvement.
>
> Ack, that's nice.
>
>> - Memory saved per swapped-out THP *once lazy page table deposit is
>> merged* [2]. With lazy page table deposit, splitting a PMD into
>> 512 PTE swap entries forces allocation of a 4 KB PTE table page.
>> The new path leaves the pgtable hierarchy at PMD level and avoids
>> that allocation entirely.
>> This saves memory when swapping, which typically happens under
>> memory pressure -- exactly when allocations are most likely to
>> fail.
>
> Also ack.
>
>> - Walkers (zap, mprotect, smaps, pagemap, soft-dirty, uffd-wp)
>> visit one PMD entry instead of 512 PTEs, reducing traversal
>> time and lock-hold windows.
>
> Right.
>
>>
>> The swap entry value is identical to 512 PTE swap entries (same
>> type, same starting offset), so swap_map refcounting is unchanged.
>> Only the page-table representation differs; the swap slot allocator,
>> swap I/O, and swap cache are untouched.
>> The new path falls back to
>> the existing PTE-split path whenever a PMD-order resource is
>> unavailable: zswap enabled, non-contiguous swap allocation
>> (THP_SWPOUT_FALLBACK), PMD-order folio allocation failure on swap-in
>> or fork, racing folio split, or rmap-driven split on a swapcache
>> folio. Walkers that previously assumed every non-present PMD encodes
>> a PFN (migration / device_private) are taught to recognise PMD swap
>> entries.
>
> All sounds nice. I'll get to review this soon. LSF/MM and travel will
> slow me down a bit in May :(
>

Thanks! Appreciate it!