Linux-RISC-V Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: Samuel Holland <samuel.holland@sifive.com>,
	"David Hildenbrand (Red Hat)" <david@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <pjw@kernel.org>,
	linux-riscv@lists.infradead.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org
Cc: devicetree@vger.kernel.org,
	Suren Baghdasaryan <surenb@google.com>,
	linux-kernel@vger.kernel.org, Mike Rapoport <rppt@kernel.org>,
	Michal Hocko <mhocko@suse.com>, Conor Dooley <conor@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Krzysztof Kozlowski <krzk+dt@kernel.org>,
	Alexandre Ghiti <alex@ghiti.fr>,
	Emil Renner Berthing <kernel@esmil.dk>,
	Rob Herring <robh+dt@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH v3 08/22] mm: Allow page table accessors to be non-idempotent
Date: Thu, 11 Dec 2025 13:59:53 +0000	[thread overview]
Message-ID: <a063f6c5-2785-4a9f-8079-25edb3e54cef@arm.com> (raw)
In-Reply-To: <af8fd275-a34d-4b2f-8834-9c85dab2bbae@sifive.com>

On 11/12/2025 00:33, Samuel Holland wrote:
> On 2025-11-28 2:47 AM, David Hildenbrand (Red Hat) wrote:
>> On 11/27/25 17:57, Ryan Roberts wrote:
>>> On 13/11/2025 01:45, Samuel Holland wrote:
>>>> Currently, some functions such as pte_offset_map() are passed both
>>>> pointers to hardware page tables, and pointers to previously-read PMD
>>>> entries on the stack. To ensure correctness in the first case, these
>>>> functions must use the page table accessor function (pmdp_get()) to
>>>> dereference the supplied pointer. However, this means pmdp_get() is
>>>> called twice in the second case. This double call must be avoided if
>>>> pmdp_get() applies some non-idempotent transformation to the value.
>>>>
>>>> Avoid the double transformation by calling set_pmd() on the stack
>>>> variables where necessary to keep set_pmd()/pmdp_get() calls balanced.
>>>
>>> I don't think this is a good solution.
>>
>> Agreed,
>>
>>     set_pmd(&pmd, pmd);
>>
>> is rather horrible.
> I agree that this patch is ugly. The only way I see to avoid code like this is
> to refactor (or duplicate) the functions so no function takes pointers to both
> hardware page tables and on-stack page table entries. Is that sort of
> refactoring the right direction to go for v4?

From a quick look at the code, I think that some cases are solvable by
refactoring to pass the value instead of the pointer, and leave it to the higher
level decide how to read the value from the pointer - it knows if it is pointing
to HW pgtable or if it's a (e.g) stack value.

But the more I look at the code, the more instances I find where pointers to
stack variables are being passed to arch pgtable helpers as if they are HW
pgtable entry pointers. (Mainly pmd level).

I wonder if we need to bite the bullet and explicitly separate the types? At
each level, we have:

 1. page table entry value
 2. pointer to page table entry _value_ (e.g. pointer to pXX_t on stack)
 3. pointer to page table entry in HW pgtable

Today, 1 is represented by pte_t, pmd_t, etc. 2 and 3 are represented by the
same type; pte_t*, pmd_t*, etc.

If we create a new type for 3, it will both document and enforce when type 2 or
type 3 is required.

e.g:

// pte_t: defined by arch.
typedef unsigned long pte_t;

// ptep_t: new opaque type that can't be dereferenced.
struct __ptep_t;
typedef struct __ptep_t *ptep_t;

// getter/setter responsible for cast & deref as appropriate.
pte_t ptep_get(ptep_t ptep)
{
	return READ_ONCE(*(pte_t *)ptep);
}

int do_stuff(void)
{
	// value on stack: ok
	pte_t mypte;

	// pointer to value on stack: ok
	pte_t *pmypte = &mypte;

	// handle to entry on stack: not allowed by compiler!
	ptep_t myptep = &mypte;

	// handle to entry in pgtable: ok
	ptep_t myptep = pte_offset_kernel(...);

	// read value of pgtable entry: ok
	pte_t val = ptep_get(myptep);

	// attempt to pass pointer to stack variable: not allowed by compiler!
	pte_t val = ptep_get(&mypte);

	// attempt to directly dereference ptep: not allowed by compiler!
	pte_t val = *myptep;
}


We could do this incrementally by initially typedefing ptep_t to be:
typedef pte_t *ptep_t;

Then we could flip the switch arch-by-arch to enable the stronger checking. We
likely wouldn't need to convert arches that don't care.

I think by doing this, it will expose all the current issues and force us to fix
them properly.

On the related subject of conversion to pXXp_get(); I've been looking into this
and personally, I think we should have 2 helper flavours at each level:

 - pXXd_get()      optimizable by compiler; defaults to C dereference
 - pXXd_get_once() single-copy-atomic and unmovable by compiler

It simplifies the converstion process, and reduces the risk of bugs
significantly (go read about the arm32 issues discussed in Anshuman's series if
you haven't done already).

I appreciate this is all probably a lot more work than you would prefer to sign
up for, I'd be happy to collaborate if we get concensus that this approach makes
sense. What do you think?

Thanks,
Ryan


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2025-12-11 14:00 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-13  1:45 [PATCH v3 00/22] riscv: Memory type control for platforms with physical memory aliases Samuel Holland
2025-11-13  1:45 ` [PATCH v3 01/22] mm/ptdump: replace READ_ONCE() with standard page table accessors Samuel Holland
2025-11-13  1:45 ` [PATCH v3 02/22] mm: " Samuel Holland
2025-11-13  4:05   ` Dev Jain
2025-11-13  1:45 ` [PATCH v3 03/22] mm/dirty: replace READ_ONCE() with pudp_get() Samuel Holland
2025-11-13  1:45 ` [PATCH v3 04/22] perf/events: replace READ_ONCE() with standard page table accessors Samuel Holland
2025-11-13 19:10   ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 05/22] mm: Move the fallback definitions of pXXp_get() Samuel Holland
2025-11-13 19:11   ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 06/22] mm: Always use page table accessor functions Samuel Holland
2025-11-13  4:53   ` kernel test robot
2025-11-13  5:46   ` kernel test robot
2025-11-26 11:08   ` Christophe Leroy (CS GROUP)
2025-11-26 11:09   ` Ryan Roberts
2025-11-26 12:16     ` David Hildenbrand (Red Hat)
2025-11-26 12:19       ` David Hildenbrand (Red Hat)
2025-11-26 12:27         ` Lorenzo Stoakes
2025-11-26 12:35           ` David Hildenbrand (Red Hat)
2025-11-26 13:03             ` Ryan Roberts
2025-11-26 13:47               ` Wei Yang
2025-11-26 14:22                 ` Ryan Roberts
2025-11-26 14:37                   ` Lorenzo Stoakes
2025-11-26 14:53                     ` David Hildenbrand (Red Hat)
2025-11-26 14:46                   ` David Hildenbrand (Red Hat)
2025-11-26 14:52                     ` Lorenzo Stoakes
2025-11-26 14:56                       ` David Hildenbrand (Red Hat)
2025-11-26 15:08                         ` Lorenzo Stoakes
2025-11-26 15:12                           ` David Hildenbrand (Red Hat)
2025-11-26 16:07                             ` Ryan Roberts
2025-11-26 16:34                               ` Ryan Roberts
2025-11-26 20:31                                 ` David Hildenbrand (Red Hat)
2025-11-27  7:14                                   ` David Hildenbrand (Red Hat)
2025-11-27  7:31                                     ` David Hildenbrand (Red Hat)
2025-11-27 15:32                                       ` Ryan Roberts
2025-11-27 19:39                                 ` Christophe Leroy (CS GROUP)
2025-11-27 19:44                                 ` Christophe Leroy (CS GROUP)
2025-11-27  8:26                   ` Christophe Leroy (CS GROUP)
2025-11-27  8:35                     ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 07/22] checkpatch: Warn on page table access without accessors Samuel Holland
2025-11-13  2:21   ` Joe Perches
2025-11-13  2:36     ` Samuel Holland
2025-11-13 19:17       ` David Hildenbrand (Red Hat)
2025-12-11  0:29         ` Samuel Holland
2025-11-13  1:45 ` [PATCH v3 08/22] mm: Allow page table accessors to be non-idempotent Samuel Holland
2025-11-13  7:19   ` kernel test robot
2025-11-27 16:57   ` Ryan Roberts
2025-11-27 17:47     ` David Hildenbrand (Red Hat)
2025-12-11  0:33       ` Samuel Holland
2025-12-11 13:59         ` Ryan Roberts [this message]
2025-12-16 10:29           ` Lorenzo Stoakes
2025-12-16 17:46             ` Ryan Roberts
2025-12-18 17:27               ` Lorenzo Stoakes
2025-12-18  9:49           ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 09/22] riscv: hibernate: Replace open-coded pXXp_get() Samuel Holland
2025-11-13  1:45 ` [PATCH v3 10/22] riscv: mm: Always use page table accessor functions Samuel Holland
2025-11-13  1:45 ` [PATCH v3 11/22] riscv: mm: Simplify set_p4d() and set_pgd() Samuel Holland
2025-11-13  1:45 ` [PATCH v3 12/22] riscv: mm: Deduplicate _PAGE_CHG_MASK definition Samuel Holland
2025-11-13  1:45 ` [PATCH v3 13/22] riscv: ptdump: Only show N and MT bits when enabled in the kernel Samuel Holland
2025-11-13  1:45 ` [PATCH v3 14/22] riscv: mm: Fix up memory types when writing page tables Samuel Holland
2025-11-13  1:45 ` [PATCH v3 15/22] riscv: mm: Expose all page table bits to assembly code Samuel Holland
2025-11-13  1:45 ` [PATCH v3 16/22] riscv: alternative: Add an ALTERNATIVE_3 macro Samuel Holland
2025-11-13  1:45 ` [PATCH v3 17/22] riscv: alternative: Allow calls with alternate link registers Samuel Holland
2025-11-13  1:45 ` [PATCH v3 18/22] riscv: Fix logic for selecting DMA_DIRECT_REMAP Samuel Holland
2025-11-13  1:45 ` [PATCH v3 19/22] dt-bindings: riscv: Describe physical memory regions Samuel Holland
2025-12-04 15:12   ` Rob Herring
2025-11-13  1:45 ` [PATCH v3 20/22] riscv: mm: Use physical memory aliases to apply PMAs Samuel Holland
2025-11-13  1:45 ` [PATCH v3 21/22] riscv: dts: starfive: jh7100: Use physical memory ranges for DMA Samuel Holland
2025-11-13  1:45 ` [PATCH v3 22/22] riscv: dts: eswin: eic7700: " Samuel Holland
2025-11-13 19:13 ` [PATCH v3 00/22] riscv: Memory type control for platforms with physical memory aliases David Hildenbrand (Red Hat)
2025-12-04 16:23 ` Nick Kossifidis
2025-12-19  8:10 ` patchwork-bot+linux-riscv
2026-02-27 10:54   ` Marcel Ziswiler
2026-04-30  3:26 ` patchwork-bot+linux-riscv

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a063f6c5-2785-4a9f-8079-25edb3e54cef@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex@ghiti.fr \
    --cc=conor@kernel.org \
    --cc=david@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=kernel@esmil.dk \
    --cc=krzk+dt@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=palmer@dabbelt.com \
    --cc=pjw@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=rppt@kernel.org \
    --cc=samuel.holland@sifive.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox