From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>,
	kernel test robot <lkp@intel.com>,
	linux-mm@kvack.org, llvm@lists.linux.dev,
	oe-kbuild-all@lists.linux.dev,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	"Mike Rapoport (IBM)" <rppt@kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	x86@kernel.org, linux-m68k@lists.linux-m68k.org,
	linux-fsdevel@vger.kernel.org, kasan-dev@googlegroups.com,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dimitri Sivanich <dimitri.sivanich@hpe.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Muchun Song <muchun.song@linux.dev>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
Date: Fri, 20 Sep 2024 10:47:45 +0100
Message-ID: <Zu1EwTItDrnkTVTB@shell.armlinux.org.uk>
In-Reply-To: <9e68ffad-8a7e-40d7-a6f3-fa989a834068@arm.com>

On Fri, Sep 20, 2024 at 08:57:23AM +0200, Ryan Roberts wrote:
> On 19/09/2024 21:25, Russell King (Oracle) wrote:
> > On Thu, Sep 19, 2024 at 07:49:09PM +0200, Ryan Roberts wrote:
> >> On 19/09/2024 18:06, Russell King (Oracle) wrote:
> >>> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
> >>>>> 32-bit arm uses, in some circumstances, an array because each level 1
> >>>>> page table entry is actually two descriptors. It needs to be this way
> >>>>> because each level 2 table pointed to by each level 1 entry has 256
> >>>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
> >>>>>
> >>>>> In order to cut down on the wastage, treat the level 1 page table as
> >>>>> groups of two entries, which point to two consecutive 1024 byte tables
> >>>>> in the level 2 page.
> >>>>>
> >>>>> The level 2 entry isn't suitable for the kernel's use cases (there are
> >>>>> no bits to represent accessed/dirty and other important stuff that the
> >>>>> Linux MM wants) so we maintain the hardware page tables and a separate
> >>>>> set that Linux uses in the same page. Again, the software tables are
> >>>>> consecutive, so from Linux's perspective, the level 2 page tables
> >>>>> have 512 entries in them and occupy one full page.
> >>>>>
> >>>>> This is documented in arch/arm/include/asm/pgtable-2level.h
> >>>>>
> >>>>> However, what this means is that from the software perspective, the
> >>>>> level 1 page table descriptors are an array of two entries, both of
> >>>>> which need to be set up when creating a level 2 page table, but only
> >>>>> the first one should ever be dereferenced when walking the tables,
> >>>>> otherwise the code that walks the second level of page table entries
> >>>>> will walk off the end of the software table into the actual hardware
> >>>>> descriptors.
> >>>>>
> >>>>> I've no idea what the idea is behind introducing pgdp_get() and what
> >>>>> its semantics are, so I can't comment further.
> >>>>
> >>>> The helper is intended to read the value of the entry pointed to by the passed
> >>>> in pointer. And it should be read in a "single copy atomic" manner, meaning no
> >>>> tearing. Further, the PTL is expected to be held when calling the getter. If the
> >>>> HW can write to the entry such that it's racing with the lock holder (i.e. HW
> >>>> update of access/dirty) then READ_ONCE() should be suitable for most
> >>>> architectures. If there is no possibility of racing (because HW doesn't write to
> >>>> the entry), then a simple dereference would be sufficient, I think (which is
> >>>> what the core code was already doing in most cases).
> >>>
> >>> The core code should be making no access to the PGD entries on 32-bit
> >>> ARM since the PGD level does not exist. Writes are done at PMD level
> >>> in arch code. Reads are done by core code at PMD level.
> >>>
> >>> It feels to me like pgdp_get() just doesn't fit the model that 32-bit
> >>> ARM was designed to use decades ago, so I want full details about what
> >>> pgdp_get() is going to be used for and how it is going to be used,
> >>> because I feel completely in the dark over this new development. I fear
> >>> that someone hasn't understood the Linux page table model if they're
> >>> wanting to access stuff at levels that effectively "aren't implemented"
> >>> in the architecture specific kernel model of the page tables.
> >>
> >> This change isn't as big and scary as I think you fear.
> >
> > The situation is as I state above. Core code must _not_ dereference pgd
> > pointers on 32-bit ARM.
>
> Let's just rewind a bit. This thread exists because the kernel test robot failed
> to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture
> after Anshuman changed the direct pgd dereference to pgdp_get(). The reason
> compilation failed is because arm defines its own pgdp_get() override, but it is
> broken (there is a typo).

Let's not rewind, because had you fully read and digested my reply, you
would have seen why this isn't a problem... but let me spell it out.

>
> Code before Anshuman's change:
>
> static inline int pgd_none_or_clear_bad(pgd_t *pgd)
> {
> 	if (pgd_none(*pgd))
> 		return 1;
> 	if (unlikely(pgd_bad(*pgd))) {
> 		pgd_clear_bad(pgd);
> 		return 1;
> 	}
> 	return 0;
> }

This isn't a problem as the code stands. While there is a dereference
in C, that dereference is a simple struct copy, something that we use
everywhere in the kernel. However, that is as far as it goes, because
neither pgd_none() nor pgd_bad() makes use of its argument, and thus
the compiler will optimise it away, resulting in no actual access to
the page tables - _as_ _intended_.

If these are going to be converted to pgdp_get(), then we need pgdp_get()
to _also_ be optimised away, and if e.g. this is the only place that
pgdp_get() is going to be used, the suggestion I made in my previous
email is entirely reasonable, since we know that the result of pgdp_get()
will not actually be used.
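
To make the point above concrete, here is a minimal userspace sketch. It
is not the kernel's actual code: pgd_t, pgd_none() and pgd_bad() borrow
the kernel's names but are simplified stand-ins, and pgdp_get_volatile()
is a hypothetical helper that only approximates what a READ_ONCE()-based
getter might do.

```c
#include <stdint.h>

/* Simplified stand-in for a folded top-level entry (two level-1 descriptors). */
typedef struct { uint32_t pgd[2]; } pgd_t;

/* With the level folded, the predicates never look at their argument. */
static inline int pgd_none(pgd_t pgd) { (void)pgd; return 0; }
static inline int pgd_bad(pgd_t pgd)  { (void)pgd; return 0; }

/*
 * Plain dereference: *pgd is an ordinary struct copy whose value is unused,
 * so an optimising compiler is free to drop the load entirely.
 */
static inline int none_or_bad_plain(pgd_t *pgd)
{
	return pgd_none(*pgd) || pgd_bad(*pgd);
}

/*
 * Hypothetical getter built on volatile reads (an assumption about how a
 * READ_ONCE()-style pgdp_get() might behave): the loads have to be emitted
 * even though pgd_none()/pgd_bad() discard the result.
 */
static inline pgd_t pgdp_get_volatile(pgd_t *pgdp)
{
	pgd_t val;

	val.pgd[0] = *(volatile uint32_t *)&pgdp->pgd[0];
	val.pgd[1] = *(volatile uint32_t *)&pgdp->pgd[1];
	return val;
}

static inline int none_or_bad_volatile(pgd_t *pgd)
{
	return pgd_none(pgdp_get_volatile(pgd)) ||
	       pgd_bad(pgdp_get_volatile(pgd));
}
```

Both variants return 0 for any entry; the difference is only in the
generated code, which can be compared by calling each from a non-inline
function and inspecting the output of an optimised build.
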
> As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in
> various circumstances.

I already covered these in my previous reply.

> And other changes in this series are also replacing those
> direct dereferences with calls to similar helpers. The fact that these are all
> folded (by a custom arm implementation if I've understood the below correctly)
> just means that each dereference is returning what you would call the pmd from
> the HW perspective, I think?

It'll "return" the first of each pair of level-1 page table entries,
which is pgd[0] or *p4d, *pud, *pmd - but all of these except *pmd
need to be optimised away, so throwing lots of READ_ONCE() around
this code without considering this is certainly the wrong approach.
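
For reference, the layout described in the quoted text earlier in this
message can be sketched roughly as follows. This is a userspace
illustration based on the description in
arch/arm/include/asm/pgtable-2level.h, not a copy of it: struct pte_page
and l1_entry_for_walk() are hypothetical names used only here, and
4-byte entries are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define HW_PTRS_PER_TABLE	256	/* entries in one hardware level 2 table */
#define TABLES_PER_PAGE		2	/* one per level 1 descriptor in a pair */
#define LINUX_PTRS_PER_PTE	(HW_PTRS_PER_TABLE * TABLES_PER_PAGE)	/* 512 */

typedef uint32_t pte_t;				/* assumed 4-byte entries */
typedef struct { uint32_t pgd[2]; } pgd_t;	/* a pair of level 1 descriptors */

/* Hypothetical view of one 4096-byte level 2 page. */
struct pte_page {
	pte_t linux_pte[LINUX_PTRS_PER_PTE];	/* software tables, walked by Linux */
	pte_t hw_pte[LINUX_PTRS_PER_PTE];	/* two 1024-byte hardware tables */
};

_Static_assert(sizeof(struct pte_page) == 4096, "tables fill one full page");
_Static_assert(offsetof(struct pte_page, hw_pte) == 2048,
	       "hardware tables follow the software tables");

/* Only the first descriptor of the pair is dereferenced when walking. */
static inline uint32_t l1_entry_for_walk(const pgd_t *pgd)
{
	return pgd->pgd[0];
}
```

Walking 512 entries from the second descriptor instead would start 1024
bytes into the page and run past offset 2048, i.e. into hw_pte[] in this
sketch.
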
> >> The core-mm today
> >> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
> >> follow_pfnmap_start(),
> >
> > Doesn't seem to exist at least not in 6.11.
>
> Apologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in
> v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above.

Looking at follow_pte(), it's not a problem.

I think we wouldn't be having this conversation before:

commit a32618d28dbe6e9bf8ec508ccbc3561a7d7d32f0
Author: Russell King <rmk+kernel@arm.linux.org.uk>
Date:   Tue Nov 22 17:30:28 2011 +0000

    ARM: pgtable: switch to use pgtable-nopud.h

where:

-#define pgd_none(pgd)		(0)
-#define pgd_bad(pgd)		(0)

existed before this commit - and thus the dereference in things like:

	pgd_none(*pgd)

wouldn't even be visible beyond the preprocessor step.
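
A small userspace sketch of that last point, assuming a hypothetical
helper deref_counted() that counts how often the entry is read. With the
old macro form the argument text is discarded by the preprocessor, so the
read never reaches the compiler; with a function form the argument is
evaluated first. The names pgd_none_macro() and pgd_none_func() are
illustrative only.

```c
typedef struct { unsigned int pgd[2]; } pgd_t;

/* Pre-2011 style: the macro body never mentions its argument. */
#define pgd_none_macro(pgd)	(0)

/* Function style: the argument is evaluated before the call. */
static inline int pgd_none_func(pgd_t pgd) { (void)pgd; return 0; }

static int deref_count;	/* hypothetical instrumentation */

/* Hypothetical helper: record each read of the entry, then return it. */
static inline pgd_t deref_counted(const pgd_t *pgdp)
{
	deref_count++;
	return *pgdp;
}

/* Number of reads performed when the macro form is used. */
static inline int reads_with_macro(const pgd_t *pgdp)
{
	(void)pgdp;	/* unused once the macro has expanded */
	deref_count = 0;
	(void)pgd_none_macro(deref_counted(pgdp));
	return deref_count;
}

/* Number of reads performed when the function form is used. */
static inline int reads_with_func(const pgd_t *pgdp)
{
	deref_count = 0;
	(void)pgd_none_func(deref_counted(pgdp));
	return deref_count;
}
```

Here reads_with_macro() reports 0 reads and reads_with_func() reports 1,
since the counted read is a side effect that has to happen once it is
part of the expanded source.
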
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!