From: Jason Gunthorpe <jgg@nvidia.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
Peter Xu <peterx@redhat.com>
Subject: Re: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()
Date: Mon, 25 Mar 2024 13:19:19 -0300 [thread overview]
Message-ID: <20240325161919.GD6245@nvidia.com> (raw)
In-Reply-To: <54d78f1b7e7f1c671e40b7c0c637380bcb834326.1711377230.git.christophe.leroy@csgroup.eu>
On Mon, Mar 25, 2024 at 03:55:54PM +0100, Christophe Leroy wrote:
> Unlike many architectures, powerpc 8xx hardware tablewalk requires
> a two level process for all page sizes, allthough second level only
> has one entry when pagesize is 8M.
>
> To fit with Linux page table topology and without requiring special
> page directory layout like hugepd, the page entry will be replicated
> 1024 times in the standard page table. However for large pages it is
> necessary to set bits in the level-1 (PMD) entry. At the time being,
> for 512k pages the flag is kept in the PTE and inserted in the PMD
> entry at TLB miss exception, that is necessary because we can have
> pages of different sizes in a page table. However the 12 PTE bits are
> fully used and there is no room for an additional bit for page size.
>
> For 8M pages, there will be only one page per PMD entry, it is
> therefore possible to flag the pagesize in the PMD entry, with the
> advantage that the information will already be at the right place for
> the hardware.
>
> To do so, add a new helper called pmd_populate_size() which takes the
> page size as an additional argument, and modify __pte_alloc() to also
> take that argument. pte_alloc() is left unmodified in order to
> reduce churn on callers, and a pte_alloc_size() is added for use by
> pte_alloc_huge().
>
> When an architecture doesn't provide pmd_populate_size(),
> pmd_populate() is used as a fallback.
I think it would be a good idea to document what the semantic is
supposed to be for sz?
Just a general remark, probably nothing for this, but with these new
arguments the historical naming seems pretty tortured for
pte_alloc_size().. Something like pmd_populate_leaf(size) as a naming
scheme would make this more intuitive. Ie pmd_populate_leaf() gives
you a PMD entry where the entry points to a leaf page table able to
store folios of at least size.
Anyhow, I thought the edits to the mm helpers were fine, certainly
much nicer than hugepd. Do you see a path to remove hugepd entirely
from here?
Thanks,
Jason
WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Peter Xu <peterx@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()
Date: Mon, 25 Mar 2024 13:19:19 -0300 [thread overview]
Message-ID: <20240325161919.GD6245@nvidia.com> (raw)
In-Reply-To: <54d78f1b7e7f1c671e40b7c0c637380bcb834326.1711377230.git.christophe.leroy@csgroup.eu>
On Mon, Mar 25, 2024 at 03:55:54PM +0100, Christophe Leroy wrote:
> Unlike many architectures, powerpc 8xx hardware tablewalk requires
> a two level process for all page sizes, allthough second level only
> has one entry when pagesize is 8M.
>
> To fit with Linux page table topology and without requiring special
> page directory layout like hugepd, the page entry will be replicated
> 1024 times in the standard page table. However for large pages it is
> necessary to set bits in the level-1 (PMD) entry. At the time being,
> for 512k pages the flag is kept in the PTE and inserted in the PMD
> entry at TLB miss exception, that is necessary because we can have
> pages of different sizes in a page table. However the 12 PTE bits are
> fully used and there is no room for an additional bit for page size.
>
> For 8M pages, there will be only one page per PMD entry, it is
> therefore possible to flag the pagesize in the PMD entry, with the
> advantage that the information will already be at the right place for
> the hardware.
>
> To do so, add a new helper called pmd_populate_size() which takes the
> page size as an additional argument, and modify __pte_alloc() to also
> take that argument. pte_alloc() is left unmodified in order to
> reduce churn on callers, and a pte_alloc_size() is added for use by
> pte_alloc_huge().
>
> When an architecture doesn't provide pmd_populate_size(),
> pmd_populate() is used as a fallback.
I think it would be a good idea to document what the semantic is
supposed to be for sz?
Just a general remark, probably nothing for this, but with these new
arguments the historical naming seems pretty tortured for
pte_alloc_size().. Something like pmd_populate_leaf(size) as a naming
scheme would make this more intuitive. Ie pmd_populate_leaf() gives
you a PMD entry where the entry points to a leaf page table able to
store folios of at least size.
Anyhow, I thought the edits to the mm helpers were fine, certainly
much nicer than hugepd. Do you see a path to remove hugepd entirely
from here?
Thanks,
Jason
next prev parent reply other threads:[~2024-03-25 16:20 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-25 14:55 [RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx Christophe Leroy
2024-03-25 14:55 ` Christophe Leroy
2024-03-25 14:55 ` [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate() Christophe Leroy
2024-03-25 14:55 ` Christophe Leroy
2024-03-25 16:19 ` Jason Gunthorpe [this message]
2024-03-25 16:19 ` Jason Gunthorpe
2024-03-25 19:05 ` Christophe Leroy
2024-03-25 19:05 ` Christophe Leroy
2024-03-26 15:01 ` Jason Gunthorpe
2024-03-26 15:01 ` Jason Gunthorpe
2024-03-27 9:58 ` Christophe Leroy
2024-03-27 9:58 ` Christophe Leroy
2024-03-27 16:57 ` Jason Gunthorpe
2024-03-27 16:57 ` Jason Gunthorpe
2024-04-03 18:24 ` Christophe Leroy
2024-04-03 18:24 ` Christophe Leroy
2024-04-04 11:46 ` Jason Gunthorpe
2024-04-04 11:46 ` Jason Gunthorpe
2024-05-26 9:29 ` Christophe Leroy
2024-05-26 9:29 ` Christophe Leroy
2024-03-25 14:55 ` [RFC PATCH 2/8] mm: Provide page size to pte_alloc_huge() Christophe Leroy
2024-03-25 14:55 ` Christophe Leroy
2024-03-25 14:55 ` [RFC PATCH 3/8] mm: Provide pmd to pte_leaf_size() Christophe Leroy
2024-03-25 14:55 ` Christophe Leroy
2024-03-25 14:55 ` [RFC PATCH 4/8] mm: Provide mm_struct and address to huge_ptep_get() Christophe Leroy
2024-03-25 14:55 ` Christophe Leroy
2024-03-25 16:35 ` Jason Gunthorpe
2024-03-25 16:35 ` Jason Gunthorpe
2024-05-26 9:25 ` Christophe Leroy
2024-05-26 9:25 ` Christophe Leroy
2024-03-25 14:55 ` [RFC PATCH 5/8] powerpc/mm: Allow hugepages without hugepd Christophe Leroy
2024-03-25 14:55 ` Christophe Leroy
2024-03-25 14:55 ` [RFC PATCH 6/8] powerpc/8xx: Fix size given to set_huge_pte_at() Christophe Leroy
2024-03-25 14:55 ` Christophe Leroy
2024-03-25 14:56 ` [RFC PATCH 7/8] powerpc/8xx: Remove support for 8M pages Christophe Leroy
2024-03-25 14:56 ` Christophe Leroy
2024-03-25 14:56 ` [RFC PATCH 8/8] powerpc/8xx: Add back support for 8M pages using contiguous PTE entries Christophe Leroy
2024-03-25 14:56 ` Christophe Leroy
2024-03-25 16:38 ` [RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx Jason Gunthorpe
2024-03-25 16:38 ` Jason Gunthorpe
2024-04-11 16:15 ` Peter Xu
2024-04-11 16:15 ` Peter Xu
2024-04-12 14:08 ` Christophe Leroy
2024-04-12 14:08 ` Christophe Leroy
2024-04-12 14:30 ` Peter Xu
2024-04-12 14:30 ` Peter Xu
2024-04-15 19:12 ` Christophe Leroy
2024-04-15 19:12 ` Christophe Leroy
2024-04-16 10:58 ` Christophe Leroy
2024-04-16 10:58 ` Christophe Leroy
2024-04-16 19:40 ` Peter Xu
2024-04-16 19:40 ` Peter Xu
2024-05-17 14:27 ` Oscar Salvador
2024-05-17 14:27 ` Oscar Salvador
2024-05-22 9:08 ` Christophe Leroy
2024-05-22 9:08 ` Christophe Leroy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240325161919.GD6245@nvidia.com \
--to=jgg@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=christophe.leroy@csgroup.eu \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.