All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org,
	baohua@kernel.org, dev.jain@arm.com, harry.yoo@oracle.com,
	jannh@google.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, lorenzo.stoakes@oracle.com, riel@surriel.com,
	stable@kernel.org, vbabka@kernel.org
Subject: Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Thu, 26 Feb 2026 18:28:23 +0800	[thread overview]
Message-ID: <dfd37e2c-d939-4682-a693-ae3941358fea@linux.dev> (raw)
In-Reply-To: <ece450e0-8e19-42bd-9a7a-fdfc7ab785e6@kernel.org>



On 2026/2/26 18:06, David Hildenbrand (Arm) wrote:
> On 2/26/26 08:09, Lance Yang wrote:
>>
>> On Tue, Feb 24, 2026 at 05:01:50PM +0100, David Hildenbrand (Arm) wrote:
>>> On 2/24/26 12:43, Lorenzo Stoakes wrote:
>>>>
>>>> Sorry I misread the original mail rushing through this is old... so this is less
>>>> pressing than I thought (for some reason I thought it was merged last cycle...!)
>>>> but it's a good example of how stuff can go unnoticed for a while.
>>>>
>>>> In that case maybe a revert is a bit much and we just want the simplest possible
>>>> fix for backporting.
>>>
>>> Dev volunteered to un-messify some of the stuff here. In particular, to
>>> extend batching to all cases, not just some hand-selected ones.
>>>
>>> Support for file folios is on the way.
>>>
>>>>
>>>> But is the proposed 'just assume wrprotect' sensible? David?
>>>
>>> In general, I think so. If PTEs were writable, they certainly have
>>> PAE set. The write-fault handler can fully recover from that (as PAE is
>>> set). If it's ever a performance problem (doubt), we can revisit.
>>>
>>> I'm wondering whether we should just perform the wrprotect earlier:
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 0f00570d1b9e..19b875ee3fad 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>
>>>                         /* Nuke the page table entry. */
>>>                         pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
>>> +
>>> +                       /*
>>> +                        * Our batch might include writable and read-only
>>> +                        * PTEs. When we have to restore the mapping, just
>>> +                        * assume read-only to not accidentally upgrade
>>> +                        * write permissions for PTEs that must not be
>>> +                        * writable.
>>> +                        */
>>> +                       pteval = pte_wrprotect(pteval);
>>> +
>>>                         /*
>>>                          * We clear the PTE but do not flush so potentially
>>>                          * a remote CPU could still be writing to the folio
>>>
>>>
>>> Given that nobody asks for writability (pte_write()) later.
>>>
>>> Or does someone care?
>>>
>>> Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
>>> not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
>>> architecture (write-only)? I don't think so.
>>>
>>>
>>> We have the following options:
>>>
>>> 1) pte_wrprotect(): fake that all was read-only.
>>>
>>> Either we do it like Dev suggests, or we do it as above early.
>>>
>>> The downside is that any code that might later want to know "was
>>> this possibly writable" would get that information. Well, it wouldn't
>>> get that information reliably *today* already (and that sounds a bit shaky).
>>
>> Makes sense to me :)
>>
>>> 2) Tell batching logic to honor pte_write()
>>>
>>> Sounds suboptimal for some cases that really don't care in the future.
>>>
>>> 3) Tell batching logic to tell us if any pte was writable: FPB_MERGE_WRITE
>>>
>>> ... then we know for sure whether any PTE was writable and we could
>>>
>>> (a) Pass it as we did before around to all checks, like pte_accessible().
>>>
>>> (b) Have an explicit restore PTE where we play save.
>>>
>>>
>>> I raised to Dev in private that softdirty handling is also shaky, as we
>>> batch over that. Meaning that we could lose or gain softdirty PTE bits in
>>> a batch.
>>
>> I guess we won't lose soft_dirty bits - only gain them (false positive):
>>
>> 1) get_and_clear_ptes() merges dirty bits from all PTEs via pte_mkdirty()
>> 2) pte_mkdirty() atomically sets both _PAGE_DIRTY and _PAGE_SOFT_DIRTY on
>> all architectures that support soft_dirty (x86, s390, powerpc, riscv)
>> 3) set_ptes() uses pte_advance_pfn() which keeps all flags intact
>>
>> So if any PTE in the batch was dirty, all PTEs become soft_dirty after
>> restore.
> 
> PTEs can be softdirty without being dirty. That over-complicates the
> situation.

Ah, it's even trickier then :D


      reply	other threads:[~2026-02-26 10:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24 11:09 [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios Dev Jain
2026-02-24 11:31 ` Lorenzo Stoakes
2026-02-24 11:43   ` Lorenzo Stoakes
2026-02-24 16:01     ` David Hildenbrand (Arm)
2026-02-25  5:11       ` Dev Jain
2026-02-26 10:21         ` David Hildenbrand (Arm)
2026-02-26 10:27           ` Dev Jain
2026-02-26  7:01       ` Barry Song
2026-02-26 10:09         ` David Hildenbrand (Arm)
2026-02-26 10:20           ` Barry Song
2026-02-26  7:09       ` Lance Yang
2026-02-26 10:06         ` David Hildenbrand (Arm)
2026-02-26 10:28           ` Lance Yang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dfd37e2c-d939-4682-a693-ae3941358fea@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@kernel.org \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.