All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte
Date: Wed, 27 Mar 2024 18:05:52 +0100	[thread overview]
Message-ID: <44fc26ae-de7d-4bed-af7b-bcd2c593c676@redhat.com> (raw)
In-Reply-To: <de03fcd0-53fe-4672-b148-7a5eda19be03@arm.com>

On 27.03.24 10:51, Ryan Roberts wrote:
> On 26/03/2024 17:58, David Hildenbrand wrote:
>>>>>>
>>>>>> vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte)
>>>>>> /* not dirty */
>>>>>>
>>>>>> /* Now, thread 2 ends up setting the PTE dirty under PT lock. */
>>>
>>> Ahh, this comment about thread 2 is not referring to the code immediately below
>>> it. It all makes much more sense now. :)
>>
>> Sorry :)
>>
>>>
>>>>>>
>>>>>> spin_lock(vmf->ptl);
>>>>>> entry = vmf->orig_pte;
>>>>>> if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>>>>>>        ...
>>>>>> }
>>>>>> ...
>>>>>> entry = pte_mkyoung(entry);
>>>>>
>>>>> Do you mean pte_mkdirty() here? You're talking about dirty everywhere else.
>>>>
>>>> No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead and
>>>> unconditionally does that in handle_pte_fault().
>>>>
>>>>>
>>>>>> if (ptep_set_access_flags(vmf->vma, ...)
>>>>>> ...
>>>>>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>>>>
>>>>>>
>>>>>> Generic ptep_set_access_flags() will do another pte_same() check and realize
>>>>>> "hey, there was a change!" let's update the PTE!
>>>>>>
>>>>>> set_pte_at(vma->vm_mm, address, ptep, entry);
>>>>>
>>>>> This is called from the generic ptep_set_access_flags() in your example, right?
>>>>>
>>>>
>>>> Yes.
>>>>
>>>>>>
>>>>>> would overwrite the dirty bit set by thread 2.
>>>>>
>>>>> I'm not really sure what you are getting at... Is your concern that there is a
>>>>> race where the page could become dirty in the meantime and it now gets lost? I
>>>>> think that's why arm64 overrides ptep_set_access_flags(); since the hw can
>>>>> update access/dirty we have to deal with the races.
>>>>
>>>> My concern is that your patch can in subtle ways lead to use losing PTE dirty
>>>> bits on architectures that don't have the HW-managed dirty bit. They do exist ;)
>>>
>>> But I think the example you give can already happen today? Thread 1 reads
>>> orig_pte = ptep_get_lockless(). So that's already racy, if thread 2 is going to
>>> set dirty just after the get, then thread 1 is going to set the PTE back to (a
>>> modified version of) orig_pte. Isn't it already broken?
>>
>> No, because the pte_same() check under PTL would have detected it, and we would
>> have backed out. And I think the problem comes to live when we convert
>> pte_same()->pte_same_norecency(), because we fail to protect PTE access/dirty
>> changes that happend under PTL from another thread.
> 
> Ahh yep. Got it. I absolutely knew that you would be correct, but I still walked
> right into it!
> 
> I think one could argue that the generic ptep_set_access_flags() is not
> implementing its own spec:
> 
> "
> ... Only sets the access flags (dirty, accessed), as well as write permission.
> Furthermore, we know it always gets set to a "more permissive" setting ...
> "
> 
> Surely it should be folding the access and dirty bits from *ptep into entry if
> they are set?

Likely yes. Unless it's also used to clear access/dirty (don't think so, 
and would not be documented).

But the simplification made sense for now, because you previously 
checked that pte_same(), and nobody can modify it concurrently.

> 
> Regardless, I think this example proves that its fragile and subtle. I'm not
> really sure how to fix it more generally/robustly. Any thoughts? If not perhaps
> we are better off keeping ptep_get_lockless() around and only using
> ptep_get_lockless_norecency() for the really obviously correct cases?

Maybe one of the "sources of problems" is that we have a 
ptep_get_lockless_norecency() call followed by a pte_same() check, like 
done here.

Not the source of all problems I believe, though ...

-- 
Cheers,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte
Date: Wed, 27 Mar 2024 18:05:52 +0100	[thread overview]
Message-ID: <44fc26ae-de7d-4bed-af7b-bcd2c593c676@redhat.com> (raw)
In-Reply-To: <de03fcd0-53fe-4672-b148-7a5eda19be03@arm.com>

On 27.03.24 10:51, Ryan Roberts wrote:
> On 26/03/2024 17:58, David Hildenbrand wrote:
>>>>>>
>>>>>> vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte)
>>>>>> /* not dirty */
>>>>>>
>>>>>> /* Now, thread 2 ends up setting the PTE dirty under PT lock. */
>>>
>>> Ahh, this comment about thread 2 is not referring to the code immediately below
>>> it. It all makes much more sense now. :)
>>
>> Sorry :)
>>
>>>
>>>>>>
>>>>>> spin_lock(vmf->ptl);
>>>>>> entry = vmf->orig_pte;
>>>>>> if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>>>>>>        ...
>>>>>> }
>>>>>> ...
>>>>>> entry = pte_mkyoung(entry);
>>>>>
>>>>> Do you mean pte_mkdirty() here? You're talking about dirty everywhere else.
>>>>
>>>> No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead and
>>>> unconditionally does that in handle_pte_fault().
>>>>
>>>>>
>>>>>> if (ptep_set_access_flags(vmf->vma, ...)
>>>>>> ...
>>>>>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>>>>
>>>>>>
>>>>>> Generic ptep_set_access_flags() will do another pte_same() check and realize
>>>>>> "hey, there was a change!" let's update the PTE!
>>>>>>
>>>>>> set_pte_at(vma->vm_mm, address, ptep, entry);
>>>>>
>>>>> This is called from the generic ptep_set_access_flags() in your example, right?
>>>>>
>>>>
>>>> Yes.
>>>>
>>>>>>
>>>>>> would overwrite the dirty bit set by thread 2.
>>>>>
>>>>> I'm not really sure what you are getting at... Is your concern that there is a
>>>>> race where the page could become dirty in the meantime and it now gets lost? I
>>>>> think that's why arm64 overrides ptep_set_access_flags(); since the hw can
>>>>> update access/dirty we have to deal with the races.
>>>>
>>>> My concern is that your patch can in subtle ways lead to use losing PTE dirty
>>>> bits on architectures that don't have the HW-managed dirty bit. They do exist ;)
>>>
>>> But I think the example you give can already happen today? Thread 1 reads
>>> orig_pte = ptep_get_lockless(). So that's already racy, if thread 2 is going to
>>> set dirty just after the get, then thread 1 is going to set the PTE back to (a
>>> modified version of) orig_pte. Isn't it already broken?
>>
>> No, because the pte_same() check under PTL would have detected it, and we would
>> have backed out. And I think the problem comes to live when we convert
>> pte_same()->pte_same_norecency(), because we fail to protect PTE access/dirty
>> changes that happend under PTL from another thread.
> 
> Ahh yep. Got it. I absolutely knew that you would be correct, but I still walked
> right into it!
> 
> I think one could argue that the generic ptep_set_access_flags() is not
> implementing its own spec:
> 
> "
> ... Only sets the access flags (dirty, accessed), as well as write permission.
> Furthermore, we know it always gets set to a "more permissive" setting ...
> "
> 
> Surely it should be folding the access and dirty bits from *ptep into entry if
> they are set?

Likely yes. Unless it's also used to clear access/dirty (don't think so, 
and would not be documented).

But the simplification made sense for now, because you previously 
checked that pte_same(), and nobody can modify it concurrently.

> 
> Regardless, I think this example proves that its fragile and subtle. I'm not
> really sure how to fix it more generally/robustly. Any thoughts? If not perhaps
> we are better off keeping ptep_get_lockless() around and only using
> ptep_get_lockless_norecency() for the really obviously correct cases?

Maybe one of the "sources of problems" is that we have a 
ptep_get_lockless_norecency() call followed by a pte_same() check, like 
done here.

Not the source of all problems I believe, though ...

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-03-27 17:06 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-15 12:17 [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64 Ryan Roberts
2024-02-15 12:17 ` Ryan Roberts
2024-02-15 12:17 ` [RFC PATCH v1 1/4] mm: Introduce ptep_get_lockless_norecency() Ryan Roberts
2024-02-15 12:17   ` Ryan Roberts
2024-03-26 16:27   ` David Hildenbrand
2024-03-26 16:27     ` David Hildenbrand
2024-03-26 16:39     ` Ryan Roberts
2024-03-26 16:39       ` Ryan Roberts
2024-03-27  9:28       ` David Hildenbrand
2024-03-27  9:28         ` David Hildenbrand
2024-03-27  9:57         ` Ryan Roberts
2024-03-27  9:57           ` Ryan Roberts
2024-03-27 17:02           ` David Hildenbrand
2024-03-27 17:02             ` David Hildenbrand
2024-02-15 12:17 ` [RFC PATCH v1 2/4] mm/gup: Use ptep_get_lockless_norecency() Ryan Roberts
2024-02-15 12:17   ` Ryan Roberts
2024-03-26 16:30   ` David Hildenbrand
2024-03-26 16:30     ` David Hildenbrand
2024-03-26 16:48     ` Ryan Roberts
2024-03-26 16:48       ` Ryan Roberts
2024-02-15 12:17 ` [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte Ryan Roberts
2024-02-15 12:17   ` Ryan Roberts
2024-03-26 17:02   ` David Hildenbrand
2024-03-26 17:02     ` David Hildenbrand
2024-03-26 17:27     ` Ryan Roberts
2024-03-26 17:27       ` Ryan Roberts
2024-03-26 17:38       ` David Hildenbrand
2024-03-26 17:38         ` David Hildenbrand
2024-03-26 17:48         ` Ryan Roberts
2024-03-26 17:48           ` Ryan Roberts
2024-03-26 17:58           ` David Hildenbrand
2024-03-26 17:58             ` David Hildenbrand
2024-03-27  9:51             ` Ryan Roberts
2024-03-27  9:51               ` Ryan Roberts
2024-03-27 17:05               ` David Hildenbrand [this message]
2024-03-27 17:05                 ` David Hildenbrand
2024-02-15 12:17 ` [RFC PATCH v1 4/4] arm64/mm: Override ptep_get_lockless_norecency() Ryan Roberts
2024-02-15 12:17   ` Ryan Roberts
2024-03-26 16:35   ` David Hildenbrand
2024-03-26 16:35     ` David Hildenbrand
2024-03-26 16:17 ` [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64 David Hildenbrand
2024-03-26 16:17   ` David Hildenbrand
2024-03-26 16:31   ` Ryan Roberts
2024-03-26 16:31     ` Ryan Roberts
2024-03-26 16:34     ` David Hildenbrand
2024-03-26 16:34       ` David Hildenbrand
2024-03-26 16:53       ` Ryan Roberts
2024-03-26 16:53         ` Ryan Roberts
2024-03-26 17:04         ` David Hildenbrand
2024-03-26 17:04           ` David Hildenbrand
2024-03-26 17:32           ` Ryan Roberts
2024-03-26 17:32             ` Ryan Roberts
2024-03-26 17:39             ` David Hildenbrand
2024-03-26 17:39               ` David Hildenbrand
2024-03-26 17:51               ` Ryan Roberts
2024-03-26 17:51                 ` Ryan Roberts
2024-03-27  9:34                 ` David Hildenbrand
2024-03-27  9:34                   ` David Hildenbrand
2024-03-27 10:01                   ` Ryan Roberts
2024-03-27 10:01                     ` Ryan Roberts
2024-04-03 12:59                   ` Ryan Roberts
2024-04-03 12:59                     ` Ryan Roberts
2024-04-08  8:36                     ` David Hildenbrand
2024-04-08  8:36                       ` David Hildenbrand
2024-04-09 16:35                       ` Ryan Roberts
2024-04-09 16:35                         ` Ryan Roberts
2024-04-10 20:09                         ` David Hildenbrand
2024-04-10 20:09                           ` David Hildenbrand
2024-04-11  9:45                           ` Ryan Roberts
2024-04-11  9:45                             ` Ryan Roberts
2024-04-12 20:16                             ` David Hildenbrand
2024-04-12 20:16                               ` David Hildenbrand
2024-04-15  9:28                               ` Ryan Roberts
2024-04-15  9:28                                 ` Ryan Roberts
2024-04-15 10:57                                 ` David Hildenbrand
2024-04-15 10:57                                   ` David Hildenbrand
2024-04-15 13:30                                   ` Ryan Roberts
2024-04-15 13:30                                     ` Ryan Roberts
2024-04-15 14:23                                     ` David Hildenbrand
2024-04-15 14:23                                       ` David Hildenbrand
2024-04-15 14:34                                       ` Ryan Roberts
2024-04-15 14:34                                         ` Ryan Roberts
2024-04-15 14:58                                         ` David Hildenbrand
2024-04-15 14:58                                           ` David Hildenbrand
2024-04-15 15:17                                           ` Ryan Roberts
2024-04-15 15:17                                             ` Ryan Roberts
2024-04-15 15:22                                             ` David Hildenbrand
2024-04-15 15:22                                               ` David Hildenbrand
2024-04-15 15:53                                               ` Ryan Roberts
2024-04-15 15:53                                                 ` Ryan Roberts
2024-04-15 16:02                                                 ` David Hildenbrand
2024-04-15 16:02                                                   ` David Hildenbrand
2024-04-23 10:15                                                   ` Ryan Roberts
2024-04-23 10:15                                                     ` Ryan Roberts
2024-04-23 10:18                                                     ` David Hildenbrand
2024-04-23 10:18                                                       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44fc26ae-de7d-4bed-af7b-bcd2c593c676@redhat.com \
    --to=david@redhat.com \
    --cc=adrian.hunter@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=muchun.song@linux.dev \
    --cc=ryan.roberts@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.