From: Lance Yang <lance.yang@linux.dev>
To: David Hildenbrand <david@redhat.com>
Cc: ziy@nvidia.com, baolin.wang@linux.alibaba.com, baohua@kernel.org,
	ryan.roberts@arm.com, dev.jain@arm.com, npache@redhat.com,
	riel@surriel.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	harry.yoo@oracle.com, jannh@google.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net, ying.huang@linux.alibaba.com,
	apopple@nvidia.com, usamaarif642@gmail.com, yuzhao@google.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	ioworker0@gmail.com, stable@vger.kernel.org,
	akpm@linux-foundation.org, lorenzo.stoakes@oracle.com
Subject: Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage
Date: Mon, 29 Sep 2025 19:29:45 +0800
Message-ID: <1f66374a-a901-49e7-95c8-96b1e5a5f22d@linux.dev>
In-Reply-To: <900d0314-8e9a-4779-a058-9bb3cc8840b8@linux.dev>



On 2025/9/29 18:29, Lance Yang wrote:
> 
> 
> On 2025/9/29 15:25, David Hildenbrand wrote:
>> On 28.09.25 06:48, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> When splitting an mTHP and replacing a zero-filled subpage with the
>>> shared zeropage, try_to_map_unused_to_zeropage() currently drops the
>>> soft-dirty bit.
>>>
>>> For userspace tools like CRIU, which rely on the soft-dirty mechanism
>>> for incremental snapshots, losing this bit means modified pages are
>>> missed, leading to inconsistent memory state after restore.
>>>
>>> Preserve the soft-dirty bit from the old PTE when creating the zeropage
>>> mapping to ensure modified pages are correctly tracked.
>>>
>>> Cc: <stable@vger.kernel.org>
>>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>> ---
>>>   mm/migrate.c | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index ce83c2c3c287..bf364ba07a3f 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>>       newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address),
>>>                       pvmw->vma->vm_page_prot));
>>> +
>>> +    if (pte_swp_soft_dirty(ptep_get(pvmw->pte)))
>>> +        newpte = pte_mksoft_dirty(newpte);
>>> +
>>>       set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
>>>       dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
>>
>> It's interesting that there isn't a single occurrence of the soft-dirty
>> flag in khugepaged code. I guess it all works because we do the
>>
>>      _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
>>
>> and the pmd_mkdirty() will imply marking it soft-dirty.
>>
>> Now to the problem at hand: I don't think this is particularly
>> problematic in the common case: if the page is zero, it likely was
>> never written to (that's what the underused shrinker is targeted at),
>> so the soft-dirty setting on the PMD is actually just an
>> over-indication for this page.
> 
> Cool. Thanks for the insight! Good to know that ;)
> 
>>
>> For example, when we just install the shared zeropage directly in 
>> do_anonymous_page(), we obviously also don't set it dirty/soft-dirty.
>>
>> Now, one could argue that if the content was changed from non-zero to
>> zero, it would actually be soft-dirty.
> 
> Exactly. A false negative could be a problem for the userspace tools, IMO.
> 
>>
>> Long story short: I don't think this matters much in practice, but
>> it's an easy fix.
>>
>> As said by Dev, please avoid double ptep_get() if possible.
> 
> Sure, will do. I'll refactor it in the next version.
> 
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 
>>
>>
>> @Lance, can you double-check that the uffd-wp bit is handled 
>> correctly? I strongly assume we lose that as well here.

Yes, the uffd-wp bit was indeed being dropped, but ...

The shared zeropage is mapped read-only, so a write to it triggers a
fault. IIUC, the kernel then falls back to checking the VM_UFFD_WP
flag on the VMA and correctly generates a uffd-wp event, masking the
fact that the uffd-wp bit on the PTE was lost.

IMHO, explicitly preserving the uffd-wp bit on the PTE is still
necessary, since we're not sure if losing that bit is safe in
all cases :)

> 
> Certainly, I'll check the uffd-wp bit as well and get back to you soon.
> 
> Cheers,
> Lance



Thread overview: 10+ messages
2025-09-28  4:48 [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage Lance Yang
2025-09-29  4:44 ` Dev Jain
2025-09-29 10:15   ` Lance Yang
2025-09-29  7:25 ` David Hildenbrand
2025-09-29 10:29   ` Lance Yang
2025-09-29 11:29     ` Lance Yang [this message]
2025-09-29 12:08       ` David Hildenbrand
2025-09-29 13:22         ` Lance Yang
2025-09-29 16:11           ` David Hildenbrand
2025-09-30  1:53             ` Lance Yang
