All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, david@redhat.com, peterx@redhat.com,
	ziy@nvidia.com, baolin.wang@linux.alibaba.com, baohua@kernel.org,
	ryan.roberts@arm.com, dev.jain@arm.com, npache@redhat.com,
	riel@surriel.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	harry.yoo@oracle.com, jannh@google.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net, ying.huang@linux.alibaba.com,
	apopple@nvidia.com, usamaarif642@gmail.com, yuzhao@google.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	ioworker0@gmail.com, stable@vger.kernel.org
Subject: Re: [PATCH v5 1/1] mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage
Date: Tue, 14 Oct 2025 20:25:12 +0800	[thread overview]
Message-ID: <d1cf240b-6e2f-453a-9119-23cfe5480f84@linux.dev> (raw)
In-Reply-To: <91c443e1-3d7d-4b95-ab62-b970b047936f@lucifer.local>

Thanks for the super energetic review!

On 2025/10/14 19:19, Lorenzo Stoakes wrote:
> Feels like the mTHP implementation is hitting up on the buffers of the THP code
> being something of a mess... :)

Haha, yeah, it really feels that way sometimes ;)

> 
> On Tue, Sep 30, 2025 at 04:10:40PM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> When splitting an mTHP and replacing a zero-filled subpage with the shared
>> zeropage, try_to_map_unused_to_zeropage() currently drops several important
>> PTE bits.
>>
>> For userspace tools like CRIU, which rely on the soft-dirty mechanism for
> 
> It's slightly by-the-by, but CRIU in my view - as it relies on kernel
> implementation details that can always change to operate - is not actually
> something we have to strictly keep working.
> 
> HOWEVER, if we can reasonably do so without causing issues for us in the kernel
> we ought to do so.
> 
>> incremental snapshots, losing the soft-dirty bit means modified pages are
>> missed, leading to inconsistent memory state after restore.
>>
>> As pointed out by David, the more critical uffd-wp bit is also dropped.
>> This breaks the userfaultfd write-protection mechanism, causing writes
>> to be silently missed by monitoring applications, which can lead to data
>> corruption.
> 
> Again, uffd-wp is a total mess. We shouldn't be in a position where its state
> being correctly retained relies on everybody always getting the subtle,
> uncommented, open-coded details right everywhere all the time.
> 
> But this is again a general comment... :)

:)

> 
>>
>> Preserve both the soft-dirty and uffd-wp bits from the old PTE when
>> creating the new zeropage mapping to ensure they are correctly tracked.
>>
>> Cc: <stable@vger.kernel.org>
>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Dev Jain <dev.jain@arm.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> 
> Overall LGTM, so:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Cheers!

> 
>> ---
>> v4 -> v5:
>>   - Move ptep_get() call after the !pvmw.pte check, which handles PMD-mapped
>>     THP migration entries.
>>   - https://lore.kernel.org/linux-mm/20250930071053.36158-1-lance.yang@linux.dev/
>>
>> v3 -> v4:
>>   - Minor formatting tweak in try_to_map_unused_to_zeropage() function
>>     signature (per David and Dev)
>>   - Collect Reviewed-by from Dev - thanks!
>>   - https://lore.kernel.org/linux-mm/20250930060557.85133-1-lance.yang@linux.dev/
>>
>> v2 -> v3:
>>   - ptep_get() gets called only once per iteration (per Dev)
>>   - https://lore.kernel.org/linux-mm/20250930043351.34927-1-lance.yang@linux.dev/
>>
>> v1 -> v2:
>>   - Avoid calling ptep_get() multiple times (per Dev)
>>   - Double-check the uffd-wp bit (per David)
>>   - Collect Acked-by from David - thanks!
>>   - https://lore.kernel.org/linux-mm/20250928044855.76359-1-lance.yang@linux.dev/
>>
>>   mm/migrate.c | 15 ++++++++++-----
>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ce83c2c3c287..e3065c9edb55 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -296,8 +296,7 @@ bool isolate_folio_to_list(struct folio *folio, struct list_head *list)
>>   }
>>
>>   static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>> -					  struct folio *folio,
>> -					  unsigned long idx)
>> +		struct folio *folio, pte_t old_pte, unsigned long idx)
>>   {
>>   	struct page *page = folio_page(folio, idx);
>>   	pte_t newpte;
>> @@ -306,7 +305,7 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>   		return false;
>>   	VM_BUG_ON_PAGE(!PageAnon(page), page);
>>   	VM_BUG_ON_PAGE(!PageLocked(page), page);
>> -	VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>> +	VM_BUG_ON_PAGE(pte_present(old_pte), page);
> 
> Kinda ugly that we pass old_pte when it's avaiable via the shared state object,
> but probably nothing to really concern ourselves about.
> 
> Guess you could argue it both ways :)
> 
> It'd be good to convert these VM_BUG_ON_*() to VM_WARN_ON_*() but I guess that's

I agree.

> somewhat out of the scope of the code here and would be inconsistent to change
> it for just one condition.

Since this fix already landed in the mainline, just leave it as is here :)

Thanks,
Lance


      reply	other threads:[~2025-10-14 12:25 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-30  8:10 [PATCH v5 1/1] mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage Lance Yang
2025-09-30 15:00 ` Zi Yan
2025-10-02  1:14 ` Harry Yoo
2025-10-02 16:10 ` Liam R. Howlett
2025-10-14 11:19 ` Lorenzo Stoakes
2025-10-14 12:25   ` Lance Yang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d1cf240b-6e2f-453a-9119-23cfe5480f84@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=byungchul@sk.com \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=gourry@gourry.net \
    --cc=harry.yoo@oracle.com \
    --cc=ioworker0@gmail.com \
    --cc=jannh@google.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=matthew.brost@intel.com \
    --cc=npache@redhat.com \
    --cc=peterx@redhat.com \
    --cc=rakie.kim@sk.com \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuzhao@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.