From: Oscar Salvador <osalvador@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
	linux-mm@kvack.org,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry
Date: Tue, 11 Jun 2024 17:08:45 +0200
Message-ID: <ZmhofWIiMC3I0aMF@localhost.localdomain>
In-Reply-To: <ZmhcepJrkDpJ7mSC@x1n>

On Tue, Jun 11, 2024 at 10:17:30AM -0400, Peter Xu wrote:
> Oscar,
> 
> On Tue, Jun 11, 2024 at 11:34:23AM +0200, Oscar Salvador wrote:
> > Which means that they would be caught in the following code:
> > 
> >         ptl = pmd_huge_lock(pmd, vma);
> >         if (ptl) {
> > 	        - 8MB hugepages will be handled here
> >                 smaps_pmd_entry(pmd, addr, walk);
> >                 spin_unlock(ptl);
> >         }
> > 	/* pte stuff */
> > 	...
> 
> Just one quick comment: I think there's one challenge though as this is
> also not a generic "pmd leaf", but a pgtable page underneath.  I think it
> means smaps_pmd_entry() won't trivially work here, e.g., it will start to
> do this:
> 
> 	if (pmd_present(*pmd)) {
> 		page = vm_normal_page_pmd(vma, addr, *pmd);
> 
> Here vm_normal_page_pmd() will only work if pmd_leaf() satisfies its
> definition as:
> 
>  * - It should contain a huge PFN, which points to a huge page larger than
>  *   PAGE_SIZE of the platform.  The PFN format isn't important here.
> 
> But now it's a pgtable page, containing cont-ptes.  Similarly, I think most
> pmd_*() helpers will stop working there if we report it as a leaf.

Heh, I think I managed to confuse myself.
I do not know why, but I thought that

 static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
        if (ptep_is_8m_pmdp(mm, addr, ptep))
             ptep = pte_offset_kernel((pmd_t *)ptep, 0);
        return ptep_get(ptep);
 }

would return the pmd entry for 8MB hugepages, but it actually
returns the first pte underneath it?

Then yeah, this will not work as I thought.
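
To make that concrete, here is a minimal userspace sketch of what the
helper does for an 8MB entry. Everything prefixed toy_ is a hypothetical
stand-in and not the powerpc implementation: the types are plain wrappers
around an integer, the is_8m flag replaces the real ptep_is_8m_pmdp()
check, and the table size is arbitrary. It only assumes what the snippet
above shows, namely that the pmd is followed to its pte table and entry 0
is read.

```c
#include <stdint.h>

/* Toy stand-ins for pte_t/pmd_t; NOT the real powerpc definitions. */
typedef struct { uintptr_t val; } toy_pte_t;
typedef struct { uintptr_t val; } toy_pmd_t;

enum { TOY_PTRS_PER_PTE = 4 };	/* arbitrary size for the sketch */

/* Models pte_offset_kernel(pmd, idx): follow the pmd to its pte table. */
static toy_pte_t *toy_pte_offset(toy_pmd_t *pmd, unsigned int idx)
{
	return (toy_pte_t *)pmd->val + idx;
}

/*
 * Models the shape of huge_ptep_get(): for an 8MB mapping the pointer
 * handed in is really a pmd, so step down to pte 0 before reading.
 * The is_8m argument is an assumption replacing ptep_is_8m_pmdp().
 */
static toy_pte_t toy_huge_ptep_get(int is_8m, void *entry)
{
	toy_pte_t *ptep = entry;

	if (is_8m)
		ptep = toy_pte_offset((toy_pmd_t *)entry, 0);
	return *ptep;
}

/* Builds a pmd pointing at a small pte table and reads through it. */
static uintptr_t toy_demo_read(void)
{
	static toy_pte_t table[TOY_PTRS_PER_PTE] = {
		{ 0x1000 }, { 0x2000 }, { 0x3000 }, { 0x4000 },
	};
	toy_pmd_t pmd = { (uintptr_t)table };

	return toy_huge_ptep_get(1, &pmd).val;
}
```

In this model toy_demo_read() yields the value stored in table[0], not
the value held in the pmd itself, which is the behaviour I had wrong.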

The problem is that we do not have spare bits on 8xx to mark these ptes
as cont-ptes or as 8MB, so I do not see a clear path to removing
huge_ptep_get for 8xx.

I am really curious, though, how we handle that for THP. Or does THP on
8xx not support that size?
 

-- 
Oscar Salvador
SUSE Labs
