From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
Mark Rutland <mark.rutland@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Muchun Song <muchun.song@linux.dev>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte
Date: Tue, 26 Mar 2024 18:38:01 +0100 [thread overview]
Message-ID: <f6996c4f-da60-4267-bcf1-09e4fd40c91b@redhat.com> (raw)
In-Reply-To: <febd0c97-8869-4ce5-bd37-cbbdf5be0a43@arm.com>
On 26.03.24 18:27, Ryan Roberts wrote:
> On 26/03/2024 17:02, David Hildenbrand wrote:
>> On 15.02.24 13:17, Ryan Roberts wrote:
>>> Let's convert handle_pte_fault()'s use of ptep_get_lockless() to
>>> ptep_get_lockless_norecency() to save orig_pte.
>>>
>>> There are a number of places that follow this model:
>>>
>>> orig_pte = ptep_get_lockless(ptep)
>>> ...
>>> <lock>
>>> if (!pte_same(orig_pte, ptep_get(ptep)))
>>> // RACE!
>>> ...
>>> <unlock>
>>>
>>> So we need to be careful to convert all of those to use
>>> pte_same_norecency() so that the access and dirty bits are excluded from
>>> the comparison.
>>>
>>> Additionally there are a couple of places that genuinely rely on the
>>> access and dirty bits of orig_pte, but with some careful refactoring, we
>>> can use ptep_get() once we are holding the lock to achieve equivalent
>>> logic.
>>
>> We really should document that changed behavior somewhere where it can be easily
>> found: that orig_pte might have incomplete/stale accessed/dirty information.
>
> I could add it to the orig_pte definition in the `struct vm_fault`?
>
>>
>>
>>> @@ -5343,7 +5356,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>> vmf->address, &vmf->ptl);
>>> if (unlikely(!vmf->pte))
>>> return 0;
>>> - vmf->orig_pte = ptep_get_lockless(vmf->pte);
>>> + vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte);
>>> vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
>>>
>>> if (pte_none(vmf->orig_pte)) {
>>> @@ -5363,7 +5376,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>>
>>> spin_lock(vmf->ptl);
>>> entry = vmf->orig_pte;
>>> - if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>>> + if (unlikely(!pte_same_norecency(ptep_get(vmf->pte), entry))) {
>>> update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
>>> goto unlock;
>>
>> I was wondering about the following:
>>
>> Assume the PTE is not dirty.
>>
>> Thread 1 does
>
> Sorry not sure what threads have to do with this? How is the vmf shared between
> threads? What have I misunderstood...
Assume we have a HW that does not have HW-managed access/dirty bits. One
that ends up using generic ptep_set_access_flags(). Access/dirty bits
are always updated under PT lock.
Then, imagine two threads. One is the the fault path here. another
thread performs some other magic that sets the PTE dirty under PTL.
>
>>
>> vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte)
>> /* not dirty */
>>
>> /* Now, thread 2 ends up setting the PTE dirty under PT lock. */
>>
>> spin_lock(vmf->ptl);
>> entry = vmf->orig_pte;
>> if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>> ...
>> }
>> ...
>> entry = pte_mkyoung(entry);
>
> Do you mean pte_mkdirty() here? You're talking about dirty everywhere else.
No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead
and unconditionally does that in handle_pte_fault().
>
>> if (ptep_set_access_flags(vmf->vma, ...)
>> ...
>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>>
>>
>> Generic ptep_set_access_flags() will do another pte_same() check and realize
>> "hey, there was a change!" let's update the PTE!
>>
>> set_pte_at(vma->vm_mm, address, ptep, entry);
>
> This is called from the generic ptep_set_access_flags() in your example, right?
>
Yes.
>>
>> would overwrite the dirty bit set by thread 2.
>
> I'm not really sure what you are getting at... Is your concern that there is a
> race where the page could become dirty in the meantime and it now gets lost? I
> think that's why arm64 overrides ptep_set_access_flags(); since the hw can
> update access/dirty we have to deal with the races.
My concern is that your patch can in subtle ways lead to use losing PTE
dirty bits on architectures that don't have the HW-managed dirty bit.
They do exist ;)
Arm64 should be fine in that regard.
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2024-03-26 17:38 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-15 12:17 [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64 Ryan Roberts
2024-02-15 12:17 ` [RFC PATCH v1 1/4] mm: Introduce ptep_get_lockless_norecency() Ryan Roberts
[not found] ` <7aefa967-43aa-490b-ae0d-7d1455402e89@redhat.com>
2024-03-26 16:39 ` Ryan Roberts
2024-03-27 9:28 ` David Hildenbrand
2024-03-27 9:57 ` Ryan Roberts
2024-03-27 17:02 ` David Hildenbrand
2024-02-15 12:17 ` [RFC PATCH v1 2/4] mm/gup: Use ptep_get_lockless_norecency() Ryan Roberts
2024-03-26 16:30 ` David Hildenbrand
2024-03-26 16:48 ` Ryan Roberts
2024-02-15 12:17 ` [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte Ryan Roberts
2024-03-26 17:02 ` David Hildenbrand
2024-03-26 17:27 ` Ryan Roberts
2024-03-26 17:38 ` David Hildenbrand [this message]
2024-03-26 17:48 ` Ryan Roberts
2024-03-26 17:58 ` David Hildenbrand
2024-03-27 9:51 ` Ryan Roberts
2024-03-27 17:05 ` David Hildenbrand
2024-02-15 12:17 ` [RFC PATCH v1 4/4] arm64/mm: Override ptep_get_lockless_norecency() Ryan Roberts
2024-03-26 16:35 ` David Hildenbrand
2024-03-26 16:17 ` [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64 David Hildenbrand
2024-03-26 16:31 ` Ryan Roberts
[not found] ` <de143212-49ce-4c30-8bfa-4c0ff613f107@redhat.com>
2024-03-26 16:53 ` Ryan Roberts
2024-03-26 17:04 ` David Hildenbrand
2024-03-26 17:32 ` Ryan Roberts
2024-03-26 17:39 ` David Hildenbrand
2024-03-26 17:51 ` Ryan Roberts
2024-03-27 9:34 ` David Hildenbrand
2024-03-27 10:01 ` Ryan Roberts
2024-04-03 12:59 ` Ryan Roberts
2024-04-08 8:36 ` David Hildenbrand
2024-04-09 16:35 ` Ryan Roberts
2024-04-10 20:09 ` David Hildenbrand
2024-04-11 9:45 ` Ryan Roberts
[not found] ` <70a36403-aefd-4311-b612-84e602465689@redhat.com>
2024-04-15 9:28 ` Ryan Roberts
[not found] ` <3e50030d-2289-4470-a727-a293baa21618@redhat.com>
2024-04-15 13:30 ` Ryan Roberts
[not found] ` <969dc6c3-2764-4a35-9fa6-7596832fb2a3@redhat.com>
2024-04-15 14:34 ` Ryan Roberts
[not found] ` <11b1c25b-3e20-4acf-9be5-57b508266c5b@redhat.com>
2024-04-15 15:17 ` Ryan Roberts
2024-04-15 15:22 ` David Hildenbrand
2024-04-15 15:53 ` Ryan Roberts
2024-04-15 16:02 ` David Hildenbrand
2024-04-23 10:15 ` Ryan Roberts
2024-04-23 10:18 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f6996c4f-da60-4267-bcf1-09e4fd40c91b@redhat.com \
--to=david@redhat.com \
--cc=adrian.hunter@intel.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=catalin.marinas@arm.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mark.rutland@arm.com \
--cc=muchun.song@linux.dev \
--cc=ryan.roberts@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox