From: Oscar Salvador <osalvador@suse.de>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
linux-mm@kvack.org, Peter Xu <peterx@redhat.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry
Date: Tue, 11 Jun 2024 11:34:23 +0200 [thread overview]
Message-ID: <ZmgaHyS0izhtKbx6@localhost.localdomain> (raw)
In-Reply-To: <172b11c93e0de7a84937af2da9f80bd17c56b8c9.1717955558.git.christophe.leroy@csgroup.eu>
On Mon, Jun 10, 2024 at 07:54:47AM +0200, Christophe Leroy wrote:
> On powerpc 8xx, when a page is 8M size, the information is in the PMD
> entry. So allow architectures to provide __pte_leaf_size() instead of
> pte_leaf_size() and provide the PMD entry to that function.
>
> When __pte_leaf_size() is not defined, define it as a pte_leaf_size()
> so that architectures not interested in the PMD arguments are not
> impacted.
>
> Only define a default pte_leaf_size() when __pte_leaf_size() is not
> defined to make sure nobody adds new calls to pte_leaf_size() in the
> core.
Hi Christophe,
Now I am going to give you a hard time, so sorry in advance.
I should have raised this before, but I was not fully aware of it.
There is an ongoing effort of unifying pagewalkers [1], so hugetlb does not have
to be special-cased anymore, and the operations we do for THP on page-table basis
work for hugetlb as well.
The most special bit about this is huge_ptep_get.
huge_ptep_get() gets special handled on arm/arm64/riscv and s390.
arm64 and riscv is about cont-pmd/pte and propagate the dirty/young bits bits, so that
is fine as walkers can already understand that.
s390 is a funny one because it converts pud/pmd to pte and viceversa, because hugetlb
*works* with ptes, so before returning the pte it has to transfer all
bits from PUD/PMD level into a something that PTE level can understand.
As you can imagine, this can be gone as we already have all the
information in PUD/PMD and that is all pagewalkers need.
But we are left with the one you will introduce in patch#8.
8MB pages get mapped as cont-pte, but all the information is stored in
the PMD entries (size, dirtiness, present etc).
huge_ptep_get() will return the PMD for 8MB, and so all operations hugetlb
code performs with what huge_ptep_get returns will be performed on those PMDs.
Which brings me to this point:
I do not think __pte_leaf_size is needed. AFAICS, it should be enough to define
pmd_leaf on 8xx, and return 8MB if it is a 8MB hugepage.
#define pmd_leaf pmd_leaf
static inline bool pmd_leaf(pmd_t pmd)
{
return pmd_val(pmd) & _PMD_PAGE_8M);
}
and then pmd_leaf_size to return _PMD_PAGE_8M.
This will help because on the ongoing effort of unifying hugetlb and
getting rid of huge_ptep_get() [1], pagewalkers will stumble upon the
8mb-PMD as they do for regular PMDs.
Which means that they would be caught in the following code:
ptl = pmd_huge_lock(pmd, vma);
if (ptl) {
- 8MB hugepages will be handled here
smaps_pmd_entry(pmd, addr, walk);
spin_unlock(ptl);
}
/* pte stuff */
...
where pmd_huge_lock is:
static inline spinlock_t *pmd_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
{
spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
if (pmd_leaf(*pmd) || pmd_devmap(*pmd))
return ptl;
spin_unlock(ptl);
return NULL;
}
So, since pmd_leaf() will return true for 8MB hugepages, we are fine,
because anyway we want to perform pagetable operations on *that* PMD and
not the ptes that are cont-mapped, which is different for e.g: 512K
hugepages, where we perform it on pte level.
So I would suggest that instead of this patch, we have one implementing pmd_leaf
and pmd_leaf_size for 8Mb hugepages on power8xx, as that takes us closer to our goal of
unifying hugetlb.
[1] https://github.com/leberus/linux/tree/hugetlb-pagewalk-v2
--
Oscar Salvador
SUSE Labs
WARNING: multiple messages have this Message-ID (diff)
From: Oscar Salvador <osalvador@suse.de>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jason Gunthorpe <jgg@nvidia.com>, Peter Xu <peterx@redhat.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry
Date: Tue, 11 Jun 2024 11:34:23 +0200 [thread overview]
Message-ID: <ZmgaHyS0izhtKbx6@localhost.localdomain> (raw)
In-Reply-To: <172b11c93e0de7a84937af2da9f80bd17c56b8c9.1717955558.git.christophe.leroy@csgroup.eu>
On Mon, Jun 10, 2024 at 07:54:47AM +0200, Christophe Leroy wrote:
> On powerpc 8xx, when a page is 8M size, the information is in the PMD
> entry. So allow architectures to provide __pte_leaf_size() instead of
> pte_leaf_size() and provide the PMD entry to that function.
>
> When __pte_leaf_size() is not defined, define it as a pte_leaf_size()
> so that architectures not interested in the PMD arguments are not
> impacted.
>
> Only define a default pte_leaf_size() when __pte_leaf_size() is not
> defined to make sure nobody adds new calls to pte_leaf_size() in the
> core.
Hi Christophe,
Now I am going to give you a hard time, so sorry in advance.
I should have raised this before, but I was not fully aware of it.
There is an ongoing effort of unifying pagewalkers [1], so hugetlb does not have
to be special-cased anymore, and the operations we do for THP on page-table basis
work for hugetlb as well.
The most special bit about this is huge_ptep_get.
huge_ptep_get() gets special handled on arm/arm64/riscv and s390.
arm64 and riscv is about cont-pmd/pte and propagate the dirty/young bits bits, so that
is fine as walkers can already understand that.
s390 is a funny one because it converts pud/pmd to pte and viceversa, because hugetlb
*works* with ptes, so before returning the pte it has to transfer all
bits from PUD/PMD level into a something that PTE level can understand.
As you can imagine, this can be gone as we already have all the
information in PUD/PMD and that is all pagewalkers need.
But we are left with the one you will introduce in patch#8.
8MB pages get mapped as cont-pte, but all the information is stored in
the PMD entries (size, dirtiness, present etc).
huge_ptep_get() will return the PMD for 8MB, and so all operations hugetlb
code performs with what huge_ptep_get returns will be performed on those PMDs.
Which brings me to this point:
I do not think __pte_leaf_size is needed. AFAICS, it should be enough to define
pmd_leaf on 8xx, and return 8MB if it is a 8MB hugepage.
#define pmd_leaf pmd_leaf
static inline bool pmd_leaf(pmd_t pmd)
{
return pmd_val(pmd) & _PMD_PAGE_8M);
}
and then pmd_leaf_size to return _PMD_PAGE_8M.
This will help because on the ongoing effort of unifying hugetlb and
getting rid of huge_ptep_get() [1], pagewalkers will stumble upon the
8mb-PMD as they do for regular PMDs.
Which means that they would be caught in the following code:
ptl = pmd_huge_lock(pmd, vma);
if (ptl) {
- 8MB hugepages will be handled here
smaps_pmd_entry(pmd, addr, walk);
spin_unlock(ptl);
}
/* pte stuff */
...
where pmd_huge_lock is:
static inline spinlock_t *pmd_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
{
spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
if (pmd_leaf(*pmd) || pmd_devmap(*pmd))
return ptl;
spin_unlock(ptl);
return NULL;
}
So, since pmd_leaf() will return true for 8MB hugepages, we are fine,
because anyway we want to perform pagetable operations on *that* PMD and
not the ptes that are cont-mapped, which is different for e.g: 512K
hugepages, where we perform it on pte level.
So I would suggest that instead of this patch, we have one implementing pmd_leaf
and pmd_leaf_size for 8Mb hugepages on power8xx, as that takes us closer to our goal of
unifying hugetlb.
[1] https://github.com/leberus/linux/tree/hugetlb-pagewalk-v2
--
Oscar Salvador
SUSE Labs
next prev parent reply other threads:[~2024-06-11 9:36 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-10 5:54 [PATCH v5 00/18] Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64) Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 01/18] powerpc/64e: Remove unused IBM HTW code [SQUASHED] Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-11 9:34 ` Oscar Salvador [this message]
2024-06-11 9:34 ` Oscar Salvador
2024-06-11 14:17 ` Peter Xu
2024-06-11 14:17 ` Peter Xu
2024-06-11 15:08 ` Oscar Salvador
2024-06-11 15:08 ` Oscar Salvador
2024-06-11 15:20 ` Peter Xu
2024-06-11 15:20 ` Peter Xu
2024-06-11 16:10 ` Oscar Salvador
2024-06-11 16:10 ` Oscar Salvador
2024-06-11 19:00 ` LEROY Christophe
2024-06-11 19:00 ` LEROY Christophe
2024-06-11 21:43 ` Peter Xu
2024-06-11 21:43 ` Peter Xu
2024-06-13 7:19 ` Oscar Salvador
2024-06-13 7:19 ` Oscar Salvador
2024-06-13 16:43 ` LEROY Christophe
2024-06-13 16:43 ` LEROY Christophe
2024-06-14 14:14 ` Oscar Salvador
2024-06-14 14:14 ` Oscar Salvador
2024-06-11 16:53 ` LEROY Christophe
2024-06-11 16:53 ` LEROY Christophe
2024-06-11 14:50 ` LEROY Christophe
2024-06-11 14:50 ` LEROY Christophe
2024-06-10 5:54 ` [PATCH v5 03/18] mm: Provide mm_struct and address to huge_ptep_get() Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 04/18] powerpc/mm: Remove _PAGE_PSIZE Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 05/18] powerpc/mm: Fix __find_linux_pte() on 32 bits with PMD leaf entries Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 06/18] powerpc/mm: Allow hugepages without hugepd Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 07/18] powerpc/8xx: Fix size given to set_huge_pte_at() Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 08/18] powerpc/8xx: Rework support for 8M pages using contiguous PTE entries Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 09/18] powerpc/8xx: Simplify struct mmu_psize_def Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 10/18] powerpc/e500: Remove enc and ind fields from " Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 11/18] powerpc/e500: Switch to 64 bits PGD on 85xx (32 bits) Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 12/18] powerpc/e500: Encode hugepage size in PTE bits Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 13/18] powerpc/e500: Don't pre-check write access on data TLB error Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:54 ` [PATCH v5 14/18] powerpc/e500: Free r10 for FIND_PTE Christophe Leroy
2024-06-10 5:54 ` Christophe Leroy
2024-06-10 5:55 ` [PATCH v5 15/18] powerpc/e500: Use contiguous PMD instead of hugepd Christophe Leroy
2024-06-10 5:55 ` Christophe Leroy
2024-06-10 5:55 ` [PATCH v5 16/18] powerpc/64s: Use contiguous PMD/PUD instead of HUGEPD Christophe Leroy
2024-06-10 5:55 ` Christophe Leroy
2024-06-13 7:39 ` Oscar Salvador
2024-06-13 7:39 ` Oscar Salvador
2024-06-24 14:24 ` LEROY Christophe
2024-06-24 14:24 ` LEROY Christophe
2024-06-10 5:55 ` [PATCH v5 17/18] powerpc/mm: Remove hugepd leftovers Christophe Leroy
2024-06-10 5:55 ` Christophe Leroy
2024-06-10 5:55 ` [PATCH v5 18/18] mm: Remove CONFIG_ARCH_HAS_HUGEPD Christophe Leroy
2024-06-10 5:55 ` Christophe Leroy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZmgaHyS0izhtKbx6@localhost.localdomain \
--to=osalvador@suse.de \
--cc=akpm@linux-foundation.org \
--cc=christophe.leroy@csgroup.eu \
--cc=jgg@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=npiggin@gmail.com \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.