From: Usama Arif <usama.arif@linux.dev>
To: Nico Pache <npache@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
david@kernel.org, lorenzo.stoakes@oracle.com,
willy@infradead.org, linux-mm@kvack.org, fvdl@google.com,
hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
ryan.roberts@arm.com, Vlastimil Babka <vbabka@kernel.org>,
lance.yang@linux.dev, linux-kernel@vger.kernel.org,
kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au,
linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com,
gor@linux.ibm.com, agordeev@linux.ibm.com,
borntraeger@linux.ibm.com, svens@linux.ibm.com,
linux-s390@vger.kernel.org
Subject: Re: [RFC v2 12/21] mm: thp: handle split failure in device migration
Date: Thu, 5 Mar 2026 19:55:22 +0300 [thread overview]
Message-ID: <6982e9fc-cc17-4d4f-b26e-83997c4bf070@linux.dev> (raw)
In-Reply-To: <CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com>
On 02/03/2026 21:20, Nico Pache wrote:
> On Thu, Feb 26, 2026 at 4:34 AM Usama Arif <usama.arif@linux.dev> wrote:
>>
>> Device memory migration has two call sites that split huge PMDs:
>>
>> migrate_vma_split_unmapped_folio():
>> Called from migrate_vma_pages() when migrating a PMD-mapped THP to a
>> destination that doesn't support compound pages. It splits the PMD
>> then splits the folio via folio_split_unmapped().
>>
>> If the PMD split fails, folio_split_unmapped() would operate on an
>> unsplit folio with inconsistent page table state. Propagate -ENOMEM
>> to skip this page's migration. This is safe as folio_split_unmapped
>> failure would be propagated in a similar way.
>>
>> migrate_vma_insert_page():
>> Called from migrate_vma_pages() when inserting a page into a VMA
>> during migration back from device memory. If a huge zero PMD exists
>> at the target address, it must be split before PTE insertion.
>>
>> If the split fails, the subsequent pte_alloc() and set_pte_at() would
>> operate on a PMD slot still occupied by the huge zero entry. Use
>> goto abort, consistent with other allocation failures in this function.
>>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>> mm/migrate_device.c | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 78c7acf024615..bc53e06fd9735 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -909,7 +909,13 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>> int ret = 0;
>>
>> folio_get(folio);
>
> Should we be concerned about this folio_get? Are we incrementing a
> reference that was already held if we back out of the split?
>
> -- Nico
Hi Nico,
Thanks for pointing this out. It spun out to an entire investigation for me [1].
Similar to [1], I inserted trace prints [2] and created a new __split_huge_pmd2
that always returns -ENOMEM. Without folio_put on error [3], we get a refcount of 2.
hmm-tests-129 [000] .l... 1.485514: __migrate_device_finalize: FINALIZE[0]: src=ffb48827440e8000 dst=ffb48827440e8000 src==dst=1 refcount_src=2 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
hmm-tests-129 [000] .l... 1.485517: __migrate_device_finalize: FINALIZE[0]: src=ffb48827440e8000 refcount=3 mapcount=1 AFTER remove_migration_ptes
hmm-tests-129 [000] .l... 1.485518: __migrate_device_finalize: FINALIZE[0]: src=ffb48827440e8000 refcount=2 AFTER folio_put(src)
With folio_put on error [4], we get a refcount of 1.
hmm-tests-129 [001] ..... 1.492216: __migrate_device_finalize: FINALIZE[0]: src=fff7b8be840f0000 dst=fff7b8be840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
hmm-tests-129 [001] ..... 1.492219: __migrate_device_finalize: FINALIZE[0]: src=fff7b8be840f0000 refcount=2 mapcount=1 AFTER remove_migration_ptes
hmm-tests-129 [001] ..... 1.492220: __migrate_device_finalize: FINALIZE[0]: src=fff7b8be840f0000 refcount=1 AFTER folio_put(src)
So we need folio_put for split_huge_pmd_address failure, but NOT for
folio_split_unmapped.
[1] https://lore.kernel.org/all/332c9e16-46c3-4e1c-898e-2cb0a87ba1fc@linux.dev/
[2] https://gist.github.com/uarif1/6abe4bedb85814e9be8d48a4fe742b41
[3] https://gist.github.com/uarif1/f718af2113bc1a33484674b61b9dafcc
[4] https://gist.github.com/uarif1/03c42f2549eaf2bc555e8b03e07a63c8
next prev parent reply other threads:[~2026-03-05 16:55 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-26 11:23 [RFC v2 00/21] mm: thp: lazy PTE page table allocation at PMD split Usama Arif
2026-02-26 11:23 ` [RFC v2 01/21] mm: thp: make split_huge_pmd functions return int for error propagation Usama Arif
2026-02-26 11:23 ` [RFC v2 02/21] mm: thp: propagate split failure from vma_adjust_trans_huge() Usama Arif
2026-02-26 11:23 ` [RFC v2 03/21] mm: thp: handle split failure in copy_huge_pmd() Usama Arif
2026-02-26 11:23 ` [RFC v2 04/21] mm: thp: handle split failure in do_huge_pmd_wp_page() Usama Arif
2026-02-26 11:23 ` [RFC v2 05/21] mm: thp: handle split failure in zap_pmd_range() Usama Arif
2026-02-26 11:23 ` [RFC v2 06/21] mm: thp: handle split failure in wp_huge_pmd() Usama Arif
2026-02-26 11:23 ` [RFC v2 07/21] mm: thp: retry on split failure in change_pmd_range() Usama Arif
2026-02-26 11:23 ` [RFC v2 08/21] mm: thp: handle split failure in follow_pmd_mask() Usama Arif
2026-02-26 11:23 ` [RFC v2 09/21] mm: handle walk_page_range() failure from THP split Usama Arif
2026-02-26 11:23 ` [RFC v2 10/21] mm: thp: handle split failure in mremap move_page_tables() Usama Arif
2026-02-26 11:23 ` [RFC v2 11/21] mm: thp: handle split failure in userfaultfd move_pages() Usama Arif
2026-02-26 11:23 ` [RFC v2 12/21] mm: thp: handle split failure in device migration Usama Arif
2026-03-02 21:20 ` Nico Pache
2026-03-04 11:55 ` Usama Arif
2026-03-05 16:55 ` Usama Arif [this message]
2026-03-09 15:09 ` Nico Pache
2026-03-09 21:34 ` Usama Arif
2026-02-26 11:23 ` [RFC v2 13/21] mm: huge_mm: Make sure all split_huge_pmd calls are checked Usama Arif
2026-02-26 16:32 ` kernel test robot
2026-02-27 12:11 ` Usama Arif
2026-02-26 11:23 ` [RFC v2 14/21] mm: thp: allocate PTE page tables lazily at split time Usama Arif
2026-02-26 11:23 ` [RFC v2 15/21] mm: thp: remove pgtable_trans_huge_{deposit/withdraw} when not needed Usama Arif
2026-02-26 11:23 ` [RFC v2 16/21] mm: thp: add THP_SPLIT_PMD_FAILED counter Usama Arif
2026-02-26 13:56 ` kernel test robot
2026-02-26 14:22 ` Usama Arif
2026-02-26 15:10 ` kernel test robot
2026-02-26 11:23 ` [RFC v2 17/21] selftests/mm: add THP PMD split test infrastructure Usama Arif
2026-02-26 11:23 ` [RFC v2 18/21] selftests/mm: add partial_mprotect test for change_pmd_range Usama Arif
2026-02-26 11:23 ` [RFC v2 19/21] selftests/mm: add partial_mlock test Usama Arif
2026-02-26 11:23 ` [RFC v2 20/21] selftests/mm: add partial_mremap test for move_page_tables Usama Arif
2026-02-26 11:23 ` [RFC v2 21/21] selftests/mm: add madv_dontneed_partial test Usama Arif
2026-02-26 21:01 ` [RFC v2 00/21] mm: thp: lazy PTE page table allocation at PMD split Nico Pache
2026-02-27 11:13 ` Usama Arif
2026-02-28 0:06 ` Nico Pache
2026-03-02 11:08 ` Usama Arif
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6982e9fc-cc17-4d4f-b26e-83997c4bf070@linux.dev \
--to=usama.arif@linux.dev \
--cc=Liam.Howlett@oracle.com \
--cc=agordeev@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=borntraeger@linux.ibm.com \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=fvdl@google.com \
--cc=gor@linux.ibm.com \
--cc=hannes@cmpxchg.org \
--cc=hca@linux.ibm.com \
--cc=kas@kernel.org \
--cc=kernel-team@meta.com \
--cc=lance.yang@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=maddy@linux.ibm.com \
--cc=mpe@ellerman.id.au \
--cc=npache@redhat.com \
--cc=riel@surriel.com \
--cc=ryan.roberts@arm.com \
--cc=shakeel.butt@linux.dev \
--cc=svens@linux.ibm.com \
--cc=vbabka@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.