From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 08 Sep 2025 21:00:34 -0700
To:
mm-commits@vger.kernel.org,ziy@nvidia.com,ying.huang@linux.alibaba.com,simona@ffwll.ch,ryan.roberts@arm.com,rcampbell@nvidia.com,rakie.kim@sk.com,osalvador@suse.de,npache@redhat.com,mpenttil@redhat.com,matthew.brost@intel.com,lyude@redhat.com,lorenzo.stoakes@oracle.com,Liam.Howlett@oracle.com,joshua.hahnjy@gmail.com,gourry@gourry.net,francois.dugast@intel.com,dev.jain@arm.com,david@redhat.com,dakr@kernel.org,byungchul@sk.com,baolin.wang@linux.alibaba.com,baohua@kernel.org,apopple@nvidia.com,airlied@gmail.com,balbirs@nvidia.com,akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-huge_memory-implement-device-private-thp-splitting.patch added to mm-new branch
Message-Id: <20250909040035.109D5C4CEF4@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: mm-commits@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:


The patch titled
     Subject: mm/huge_memory: implement device-private THP splitting
has been added to the -mm mm-new branch.  Its filename is
     mm-huge_memory-implement-device-private-thp-splitting.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-huge_memory-implement-device-private-thp-splitting.patch

This patch will later appear in the mm-new branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews.  Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Balbir Singh
Subject: mm/huge_memory: implement device-private THP splitting
Date: Mon, 8 Sep 2025 10:04:37 +1000

Add support for splitting device-private THP folios, enabling fallback to
smaller page sizes when large page allocation or migration fails.

Key changes:
- split_huge_pmd(): Handle device-private PMD entries during splitting
- Preserve RMAP_EXCLUSIVE semantics for anonymous exclusive folios
- Skip RMP_USE_SHARED_ZEROPAGE for device-private entries as they
  don't support shared zero page semantics

Link: https://lkml.kernel.org/r/20250908000448.180088-5-balbirs@nvidia.com
Signed-off-by: Balbir Singh
Cc: David Hildenbrand
Cc: Zi Yan
Cc: Joshua Hahn
Cc: Rakie Kim
Cc: Byungchul Park
Cc: Gregory Price
Cc: Ying Huang
Cc: Alistair Popple
Cc: Oscar Salvador
Cc: Lorenzo Stoakes
Cc: Baolin Wang
Cc: "Liam R. Howlett"
Cc: Nico Pache
Cc: Ryan Roberts
Cc: Dev Jain
Cc: Barry Song
Cc: Lyude Paul
Cc: Danilo Krummrich
Cc: David Airlie
Cc: Simona Vetter
Cc: Ralph Campbell
Cc: Mika Penttilä
Cc: Matthew Brost
Cc: Francois Dugast
Signed-off-by: Andrew Morton
---

 mm/huge_memory.c | 129 +++++++++++++++++++++++++++++++--------------
 1 file changed, 91 insertions(+), 38 deletions(-)

--- a/mm/huge_memory.c~mm-huge_memory-implement-device-private-thp-splitting
+++ a/mm/huge_memory.c
@@ -2880,16 +2880,19 @@ static void __split_huge_pmd_locked(stru
 	struct page *page;
 	pgtable_t pgtable;
 	pmd_t old_pmd, _pmd;
-	bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false;
-	bool anon_exclusive = false, dirty = false;
+	bool young, write, soft_dirty, uffd_wp = false;
+	bool anon_exclusive = false, dirty = false, present = false;
 	unsigned long addr;
 	pte_t *pte;
 	int i;
+	swp_entry_t swp_entry;
 
 	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
 	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
-	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd));
+
+	VM_WARN_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) &&
+			!is_pmd_device_private_entry(*pmd));
 
 	count_vm_event(THP_SPLIT_PMD);
 
@@ -2937,18 +2940,43 @@ static void __split_huge_pmd_locked(stru
 		return __split_huge_zero_page_pmd(vma, haddr, pmd);
 	}
 
-	pmd_migration = is_pmd_migration_entry(*pmd);
-	if (unlikely(pmd_migration)) {
-		swp_entry_t entry;
+	present = pmd_present(*pmd);
+	if (unlikely(!present)) {
+		swp_entry = pmd_to_swp_entry(*pmd);
 
 		old_pmd = *pmd;
-		entry = pmd_to_swp_entry(old_pmd);
-		page = pfn_swap_entry_to_page(entry);
-		write = is_writable_migration_entry(entry);
-		if (PageAnon(page))
-			anon_exclusive = is_readable_exclusive_migration_entry(entry);
-		young = is_migration_entry_young(entry);
-		dirty = is_migration_entry_dirty(entry);
+
+		folio = pfn_swap_entry_folio(swp_entry);
+		VM_WARN_ON(!is_migration_entry(swp_entry) &&
+				!is_device_private_entry(swp_entry));
+		page = pfn_swap_entry_to_page(swp_entry);
+
+		if (is_pmd_migration_entry(old_pmd)) {
+			write = is_writable_migration_entry(swp_entry);
+			if (PageAnon(page))
+				anon_exclusive =
+					is_readable_exclusive_migration_entry(
+								swp_entry);
+			young = is_migration_entry_young(swp_entry);
+			dirty = is_migration_entry_dirty(swp_entry);
+		} else if (is_pmd_device_private_entry(old_pmd)) {
+			write = is_writable_device_private_entry(swp_entry);
+			anon_exclusive = PageAnonExclusive(page);
+			if (freeze && anon_exclusive &&
+			    folio_try_share_anon_rmap_pmd(folio, page))
+				freeze = false;
+			if (!freeze) {
+				rmap_t rmap_flags = RMAP_NONE;
+
+				folio_ref_add(folio, HPAGE_PMD_NR - 1);
+				if (anon_exclusive)
+					rmap_flags |= RMAP_EXCLUSIVE;
+
+				folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
+						 vma, haddr, rmap_flags);
+			}
+		}
+
 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
 		uffd_wp = pmd_swp_uffd_wp(old_pmd);
 	} else {
@@ -3034,30 +3062,49 @@ static void __split_huge_pmd_locked(stru
 	 * Note that NUMA hinting access restrictions are not transferred to
 	 * avoid any possibility of altering permissions across VMAs.
 	 */
-	if (freeze || pmd_migration) {
+	if (freeze || !present) {
 		for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
 			pte_t entry;
-			swp_entry_t swp_entry;
-
-			if (write)
-				swp_entry = make_writable_migration_entry(
-							page_to_pfn(page + i));
-			else if (anon_exclusive)
-				swp_entry = make_readable_exclusive_migration_entry(
-							page_to_pfn(page + i));
-			else
-				swp_entry = make_readable_migration_entry(
-							page_to_pfn(page + i));
-			if (young)
-				swp_entry = make_migration_entry_young(swp_entry);
-			if (dirty)
-				swp_entry = make_migration_entry_dirty(swp_entry);
-			entry = swp_entry_to_pte(swp_entry);
-			if (soft_dirty)
-				entry = pte_swp_mksoft_dirty(entry);
-			if (uffd_wp)
-				entry = pte_swp_mkuffd_wp(entry);
-
+			if (freeze || is_migration_entry(swp_entry)) {
+				if (write)
+					swp_entry = make_writable_migration_entry(
+								page_to_pfn(page + i));
+				else if (anon_exclusive)
+					swp_entry = make_readable_exclusive_migration_entry(
+								page_to_pfn(page + i));
+				else
+					swp_entry = make_readable_migration_entry(
+								page_to_pfn(page + i));
+				if (young)
+					swp_entry = make_migration_entry_young(swp_entry);
+				if (dirty)
+					swp_entry = make_migration_entry_dirty(swp_entry);
+				entry = swp_entry_to_pte(swp_entry);
+				if (soft_dirty)
+					entry = pte_swp_mksoft_dirty(entry);
+				if (uffd_wp)
+					entry = pte_swp_mkuffd_wp(entry);
+			} else {
+				/*
+				 * anon_exclusive was already propagated to the relevant
+				 * pages corresponding to the pte entries when freeze
+				 * is false.
+				 */
+				if (write)
+					swp_entry = make_writable_device_private_entry(
+								page_to_pfn(page + i));
+				else
+					swp_entry = make_readable_device_private_entry(
+								page_to_pfn(page + i));
+				/*
+				 * Young and dirty bits are not progated via swp_entry
+				 */
+				entry = swp_entry_to_pte(swp_entry);
+				if (soft_dirty)
+					entry = pte_swp_mksoft_dirty(entry);
+				if (uffd_wp)
+					entry = pte_swp_mkuffd_wp(entry);
+			}
 			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
 			set_pte_at(mm, addr, pte + i, entry);
 		}
@@ -3084,7 +3131,7 @@ static void __split_huge_pmd_locked(stru
 	}
 	pte_unmap(pte);
 
-	if (!pmd_migration)
+	if (!is_pmd_migration_entry(*pmd))
 		folio_remove_rmap_pmd(folio, page, vma);
 	if (freeze)
 		put_page(page);
@@ -3096,8 +3143,10 @@ static void __split_huge_pmd_locked(stru
 void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 			   pmd_t *pmd, bool freeze)
 {
+
 	VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
-	if (pmd_trans_huge(*pmd) || is_pmd_migration_entry(*pmd))
+	if (pmd_trans_huge(*pmd) || is_pmd_migration_entry(*pmd) ||
+			is_pmd_device_private_entry(*pmd))
 		__split_huge_pmd_locked(vma, pmd, address, freeze);
 }
 
@@ -3276,6 +3325,9 @@ static void lru_add_split_folio(struct f
 	VM_BUG_ON_FOLIO(folio_test_lru(new_folio), folio);
 	lockdep_assert_held(&lruvec->lru_lock);
 
+	if (folio_is_device_private(folio))
+		return;
+
 	if (list) {
 		/* page reclaim is reclaiming a huge page */
 		VM_WARN_ON(folio_test_lru(folio));
@@ -3894,8 +3946,9 @@ fail:
 	if (nr_shmem_dropped)
 		shmem_uncharge(mapping->host, nr_shmem_dropped);
 
-	if (!ret && is_anon)
+	if (!ret && is_anon && !folio_is_device_private(folio))
 		remap_flags = RMP_USE_SHARED_ZEROPAGE;
+
 	remap_page(folio, 1 << order, remap_flags);
 
 	/*
_

Patches currently in -mm which might be from balbirs@nvidia.com are

mm-zone_device-support-large-zone-device-private-folios.patch
mm-huge_memory-add-device-private-thp-support-to-pmd-operations.patch
mm-rmap-extend-rmap-and-migration-support-device-private-entries.patch
mm-huge_memory-implement-device-private-thp-splitting.patch
mm-migrate_device-handle-partially-mapped-folios-during-collection.patch
mm-migrate_device-implement-thp-migration-of-zone-device-pages.patch
mm-memory-fault-add-thp-fault-handling-for-zone-device-private-pages.patch
lib-test_hmm-add-zone-device-private-thp-test-infrastructure.patch
mm-memremap-add-driver-callback-support-for-folio-splitting.patch
mm-migrate_device-add-thp-splitting-during-migration.patch
lib-test_hmm-add-large-page-allocation-failure-testing.patch
selftests-mm-hmm-tests-new-tests-for-zone-device-thp-migration.patch
selftests-mm-hmm-tests-new-throughput-tests-including-thp.patch
gpu-drm-nouveau-enable-thp-support-for-gpu-memory-migration.patch
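
For readers following the hunk at -3034 in __split_huge_pmd_locked(), the
per-PTE fan-out of a non-present PMD can be pictured with a small user-space
model.  This is only a sketch under stated assumptions, not kernel code:
every name in it (model_pmd, model_pte, model_split and the MODEL_*
constants) is hypothetical, booleans stand in for the swap-entry encoding,
and it leaves out rmap handling, reference counting and the
folio_try_share_anon_rmap_pmd() fallback that can clear freeze.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum model_kind {
	MODEL_MIGRATION_READ,
	MODEL_MIGRATION_READ_EXCLUSIVE,
	MODEL_MIGRATION_WRITE,
	MODEL_DEVICE_PRIVATE_READ,
	MODEL_DEVICE_PRIVATE_WRITE,
};

/* Stand-in for a non-present PMD entry (migration or device-private). */
struct model_pmd {
	bool device_private;	/* false means a PMD migration entry */
	unsigned long pfn;	/* first pfn covered by the PMD */
	bool write, anon_exclusive, young, dirty, soft_dirty, uffd_wp;
};

/* Stand-in for one PTE-level swap entry produced by the split. */
struct model_pte {
	enum model_kind kind;
	unsigned long pfn;
	bool young, dirty, soft_dirty, uffd_wp;
};

/* Fan one PMD-level entry out into nr PTE-level entries. */
static void model_split(const struct model_pmd *pmd, bool freeze,
			struct model_pte *out, size_t nr)
{
	assert(pmd != NULL && out != NULL);

	for (size_t i = 0; i < nr; i++) {
		struct model_pte *pte = &out[i];

		pte->pfn = pmd->pfn + i;
		/* soft-dirty and uffd-wp are carried over on both paths */
		pte->soft_dirty = pmd->soft_dirty;
		pte->uffd_wp = pmd->uffd_wp;

		if (freeze || !pmd->device_private) {
			/* migration entries: write > exclusive > read */
			if (pmd->write)
				pte->kind = MODEL_MIGRATION_WRITE;
			else if (pmd->anon_exclusive)
				pte->kind = MODEL_MIGRATION_READ_EXCLUSIVE;
			else
				pte->kind = MODEL_MIGRATION_READ;
			pte->young = pmd->young;
			pte->dirty = pmd->dirty;
		} else {
			/* device-private: young/dirty are not propagated */
			pte->kind = pmd->write ? MODEL_DEVICE_PRIVATE_WRITE
					       : MODEL_DEVICE_PRIVATE_READ;
			pte->young = false;
			pte->dirty = false;
		}
	}
}
```

Calling model_split() with freeze set to false on a device-private input
yields device-private entries; setting freeze to true yields migration
entries for the same input, which mirrors the
"freeze || is_migration_entry(swp_entry)" test in the patch.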