All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mark Rutland <mark.rutland@arm.com>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/2] arm64/mm: Improve comment in contpte_ptep_get_lockless()
Date: Fri, 1 Mar 2024 18:47:46 +0000	[thread overview]
Message-ID: <ZeIi0io3tcPKMpnT@arm.com> (raw)
In-Reply-To: <20240226120321.1055731-3-ryan.roberts@arm.com>

On Mon, Feb 26, 2024 at 12:03:21PM +0000, Ryan Roberts wrote:
> Make clear the atmicity/consistency requirements of the API and how we
> achieve them.
> 
> Link: https://lore.kernel.org/linux-mm/Zc-Tqqfksho3BHmU@arm.com/
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/mm/contpte.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index be0a226c4ff9..1b64b4c3f8bf 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -183,16 +183,20 @@ EXPORT_SYMBOL_GPL(contpte_ptep_get);
>  pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
>  {
>  	/*
> -	 * Gather access/dirty bits, which may be populated in any of the ptes
> -	 * of the contig range. We may not be holding the PTL, so any contiguous
> -	 * range may be unfolded/modified/refolded under our feet. Therefore we
> -	 * ensure we read a _consistent_ contpte range by checking that all ptes
> -	 * in the range are valid and have CONT_PTE set, that all pfns are
> -	 * contiguous and that all pgprots are the same (ignoring access/dirty).
> -	 * If we find a pte that is not consistent, then we must be racing with
> -	 * an update so start again. If the target pte does not have CONT_PTE
> -	 * set then that is considered consistent on its own because it is not
> -	 * part of a contpte range.
> +	 * The ptep_get_lockless() API requires us to read and return *orig_ptep
> +	 * so that it is self-consistent, without the PTL held, so we may be
> +	 * racing with other threads modifying the pte. Usually a READ_ONCE()
> +	 * would suffice, but for the contpte case, we also need to gather the
> +	 * access and dirty bits from across all ptes in the contiguous block,
> +	 * and we can't read all of those neighbouring ptes atomically, so any
> +	 * contiguous range may be unfolded/modified/refolded under our feet.
> +	 * Therefore we ensure we read a _consistent_ contpte range by checking
> +	 * that all ptes in the range are valid and have CONT_PTE set, that all
> +	 * pfns are contiguous and that all pgprots are the same (ignoring
> +	 * access/dirty). If we find a pte that is not consistent, then we must
> +	 * be racing with an update so start again. If the target pte does not
> +	 * have CONT_PTE set then that is considered consistent on its own
> +	 * because it is not part of a contpte range.
>  	 */

I haven't had the time to properly think about this function but,
depending on what its semantics are, we might not guarantee that, at the
time of reading a pte, we have the correct dirty state from the other
ptes in the range.

Theoretical: let's say we read the first pte in the contig range and
it's clean but further down there's a dirty one. Another (v)CPU breaks
the contig range, sets the dirty bit everywhere, there's some
pte_mkclean for all of them and they are collapsed into a contig range
again. The function above on the first (v)CPU returns a clean pte when
it should have actually been dirty at the time of read.

Throughout the callers of this function, I couldn't find one where it
matters. So I concluded that they don't need the dirty state. Normally
the dirty state is passed to the page flags, so not lost after the pte
has been cleaned.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mark Rutland <mark.rutland@arm.com>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/2] arm64/mm: Improve comment in contpte_ptep_get_lockless()
Date: Fri, 1 Mar 2024 18:47:46 +0000	[thread overview]
Message-ID: <ZeIi0io3tcPKMpnT@arm.com> (raw)
In-Reply-To: <20240226120321.1055731-3-ryan.roberts@arm.com>

On Mon, Feb 26, 2024 at 12:03:21PM +0000, Ryan Roberts wrote:
> Make clear the atmicity/consistency requirements of the API and how we
> achieve them.
> 
> Link: https://lore.kernel.org/linux-mm/Zc-Tqqfksho3BHmU@arm.com/
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/mm/contpte.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index be0a226c4ff9..1b64b4c3f8bf 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -183,16 +183,20 @@ EXPORT_SYMBOL_GPL(contpte_ptep_get);
>  pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
>  {
>  	/*
> -	 * Gather access/dirty bits, which may be populated in any of the ptes
> -	 * of the contig range. We may not be holding the PTL, so any contiguous
> -	 * range may be unfolded/modified/refolded under our feet. Therefore we
> -	 * ensure we read a _consistent_ contpte range by checking that all ptes
> -	 * in the range are valid and have CONT_PTE set, that all pfns are
> -	 * contiguous and that all pgprots are the same (ignoring access/dirty).
> -	 * If we find a pte that is not consistent, then we must be racing with
> -	 * an update so start again. If the target pte does not have CONT_PTE
> -	 * set then that is considered consistent on its own because it is not
> -	 * part of a contpte range.
> +	 * The ptep_get_lockless() API requires us to read and return *orig_ptep
> +	 * so that it is self-consistent, without the PTL held, so we may be
> +	 * racing with other threads modifying the pte. Usually a READ_ONCE()
> +	 * would suffice, but for the contpte case, we also need to gather the
> +	 * access and dirty bits from across all ptes in the contiguous block,
> +	 * and we can't read all of those neighbouring ptes atomically, so any
> +	 * contiguous range may be unfolded/modified/refolded under our feet.
> +	 * Therefore we ensure we read a _consistent_ contpte range by checking
> +	 * that all ptes in the range are valid and have CONT_PTE set, that all
> +	 * pfns are contiguous and that all pgprots are the same (ignoring
> +	 * access/dirty). If we find a pte that is not consistent, then we must
> +	 * be racing with an update so start again. If the target pte does not
> +	 * have CONT_PTE set then that is considered consistent on its own
> +	 * because it is not part of a contpte range.
>  	 */

I haven't had the time to properly think about this function but,
depending on what its semantics are, we might not guarantee that, at the
time of reading a pte, we have the correct dirty state from the other
ptes in the range.

Theoretical: let's say we read the first pte in the contig range and
it's clean but further down there's a dirty one. Another (v)CPU breaks
the contig range, sets the dirty bit everywhere, there's some
pte_mkclean for all of them and they are collapsed into a contig range
again. The function above on the first (v)CPU returns a clean pte when
it should have actually been dirty at the time of read.

Throughout the callers of this function, I couldn't find one where it
matters. So I concluded that they don't need the dirty state. Normally
the dirty state is passed to the page flags, so not lost after the pte
has been cleaned.

-- 
Catalin


  parent reply	other threads:[~2024-03-01 18:48 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-26 12:03 [PATCH 0/2] Address some contpte nits Ryan Roberts
2024-02-26 12:03 ` Ryan Roberts
2024-02-26 12:03 ` [PATCH 1/2] arm64/mm: Export contpte symbols only to GPL users Ryan Roberts
2024-02-26 12:03   ` Ryan Roberts
2024-02-26 12:25   ` David Hildenbrand
2024-02-26 12:25     ` David Hildenbrand
2024-02-26 12:40     ` Ryan Roberts
2024-02-26 12:40       ` Ryan Roberts
2024-02-27  2:49   ` John Hubbard
2024-02-27  2:49     ` John Hubbard
2024-03-04 17:38   ` Catalin Marinas
2024-03-04 17:38     ` Catalin Marinas
2024-02-26 12:03 ` [PATCH 2/2] arm64/mm: Improve comment in contpte_ptep_get_lockless() Ryan Roberts
2024-02-26 12:03   ` Ryan Roberts
2024-02-26 12:30   ` David Hildenbrand
2024-02-26 12:30     ` David Hildenbrand
2024-02-26 12:37     ` Ryan Roberts
2024-02-26 12:37       ` Ryan Roberts
2024-02-26 12:40       ` David Hildenbrand
2024-02-26 12:40         ` David Hildenbrand
2024-02-27 23:45   ` John Hubbard
2024-02-27 23:45     ` John Hubbard
2024-03-01 18:47   ` Catalin Marinas [this message]
2024-03-01 18:47     ` Catalin Marinas
2024-03-04 12:54     ` Ryan Roberts
2024-03-04 12:54       ` Ryan Roberts
2024-03-04 17:37       ` Catalin Marinas
2024-03-04 17:37         ` Catalin Marinas
2024-03-04 18:40         ` Ryan Roberts
2024-03-04 18:40           ` Ryan Roberts
2024-03-04 22:04           ` David Hildenbrand
2024-03-04 22:04             ` David Hildenbrand
2024-03-05  9:13             ` Ryan Roberts
2024-03-05  9:13               ` Ryan Roberts
2024-03-05  9:14             ` Ryan Roberts
2024-03-05  9:14               ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZeIi0io3tcPKMpnT@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=jhubbard@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=ryan.roberts@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.