Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mike Rapoport <rppt@kernel.org>
To: Usama Arif <usama.arif@bytedance.com>
Cc: linux-mm@kvack.org, muchun.song@linux.dev,
	mike.kravetz@oracle.com, linux-kernel@vger.kernel.org,
	fam.zheng@bytedance.com, liangma@liangbit.com,
	simon.evans@bytedance.com, punit.agrawal@bytedance.com
Subject: Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
Date: Thu, 27 Jul 2023 07:30:02 +0300	[thread overview]
Message-ID: <20230727043002.GA1901145@kernel.org> (raw)
In-Reply-To: <440d4a0e-c1ea-864b-54cb-aab74858319a@bytedance.com>

On Wed, Jul 26, 2023 at 04:02:21PM +0100, Usama Arif wrote:
> 
> On 26/07/2023 12:01, Mike Rapoport wrote:
> > On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
> > > This propagates the hugepage size from the memblock APIs
> > > (memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
> > > so that it can be stored in struct memblock region. This does not
> > > introduce any functional change and hugepage_size is not used in
> > > this commit. It is just a setup for the next commit where huge_pagesize
> > > is used to skip initialization of struct pages that will be freed later
> > > when HVO is enabled.
> > > 
> > > Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> > > ---
> > >   arch/arm64/mm/kasan_init.c                   |  2 +-
> > >   arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
> > >   arch/powerpc/platforms/pseries/setup.c       |  4 +-
> > >   arch/powerpc/sysdev/dart_iommu.c             |  2 +-
> > >   include/linux/memblock.h                     |  8 ++-
> > >   mm/cma.c                                     |  4 +-
> > >   mm/hugetlb.c                                 |  6 +-
> > >   mm/memblock.c                                | 60 ++++++++++++--------
> > >   mm/mm_init.c                                 |  2 +-
> > >   mm/sparse-vmemmap.c                          |  2 +-
> > >   tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
> > >   11 files changed, 56 insertions(+), 38 deletions(-)
> > > 
> > 
> > [ snip ]
> > 
> > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > > index f71ff9f0ec81..bb8019540d73 100644
> > > --- a/include/linux/memblock.h
> > > +++ b/include/linux/memblock.h
> > > @@ -63,6 +63,7 @@ struct memblock_region {
> > >   #ifdef CONFIG_NUMA
> > >   	int nid;
> > >   #endif
> > > +	phys_addr_t hugepage_size;
> > >   };
> > >   /**
> > > @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
> > >   				      phys_addr_t start, phys_addr_t end);
> > >   phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
> > >   				      phys_addr_t align, phys_addr_t start,
> > > -				      phys_addr_t end, int nid, bool exact_nid);
> > > +				      phys_addr_t end, int nid, bool exact_nid,
> > > +				      phys_addr_t hugepage_size);
> > 
> > Rather than adding yet another parameter to memblock_phys_alloc_range() we
> > can have an API that sets a flag on the reserved regions.
> > With this the hugetlb reservation code can set a flag when HVO is
> > enabled and memmap_init_reserved_pages() will skip regions with this flag
> > set.
> > 
> 
> Hi,
> 
> Thanks for the review.
> 
> I think you meant memblock_alloc_range_nid/memblock_alloc_try_nid_raw and
> not memblock_phys_alloc_range?

Yes.
 
> My initial approach was to use flags, but I think it looks worse than what I
> have done in this RFC (I have pushed the flags prototype at
> https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO,
> top 4 commits for reference (the main difference is patch 2 and 4 from
> RFC)). The major points are (the bigger issue is in patch 4):
> 
> - (RFC vs flags patch 2 comparison) In the RFC, hugepage_size is propagated
> from memblock_alloc_try_nid_raw through function calls. When using flags,
> the "no_init" boolean is propogated from memblock_alloc_try_nid_raw through
> function calls until the region flags are available in memblock_add_range
> and the new MEMBLOCK_NOINIT flag is set. I think its a bit more tricky to
> introduce a new function to set the flag in the region AFTER the call to
> memblock_alloc_try_nid_raw has finished as the memblock_region can not be
> found.
> So something (hugepage_size/flag information) still has to be propagated
> through function calls and a new argument needs to be added.

Sorry if I wasn't clear. I didn't mean to add flags parameter, I meant to
add a flag and a function that sets this flag for a range. So for
MEMBLOCK_NOINIT there would be 

int memblock_mark_noinit(phys_addr_t base, phys_addr_t size);

I'd just name this flag MEMBLOCK_RSRV_NOINIT to make it clear it controls
the reserved regions.

This won't require updating all call sites of memblock_alloc_range_nid()
and memblock_alloc_try_nid_raw() but only a small refactoring of
memblock_setclr_flag() and its callers.

> - (RFC vs flags patch 4 comparison) We can't skip initialization of the
> whole region, only the tail pages. We still need to initialize the
> HUGETLB_VMEMMAP_RESERVE_SIZE (PAGE_SIZE) struct pages for each gigantic
> page.
> In the RFC, hugepage_size from patch 2 was used in the for loop in
> memmap_init_reserved_pages in patch 4 to reserve
> HUGETLB_VMEMMAP_RESERVE_SIZE struct pages for every hugepage_size. This
> looks very simple and not hacky.

But this requires having hugetlb details in memblock which feels backwards
to me.

With memblock_mark_noinit() you can decide what parts of a gigantic page
should be initialized in __alloc_bootmem_huge_page() and mark as NOINIT
only relevant range.

> If we use a flag, there are 2 ways to initialize the
> HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage:
> 
> 1. (implemented in github link patch 4) memmap_init_reserved_pages skips the
> region for initialization as you suggested, and then we initialize
> HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage somewhere later (I
> did it in gather_bootmem_prealloc). When calling reserve_bootmem_region in
> gather_bootmem_prealloc, we need to skip early_page_uninitialised and this
> makes it look a bit hacky.
> 
> 2. We initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage
> in memmap_init_reserved_pages itself. As we have used a flag and havent
> passed hugepage_size, we need to get the gigantic page size somehow. There
> doesnt seem to be a nice way to determine the gigantic page size in that
> function which is architecture dependent. I think gigantic page size can be
> given by PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), but not sure if this is ok
> for all architectures? If we can use PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT)
> it will look much better than point 1.
> 
> Both the RFC patches and the github flags implementation work, but I think
> RFC patches look much cleaner. If there is a strong preference for the the
> github patches I can send it to mailing list?
> 
> Thanks,
> Usama

-- 
Sincerely yours,
Mike.

next prev parent reply	other threads:[~2023-07-27  4:30 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
2023-07-24 13:46 ` [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled Usama Arif
2023-07-24 17:33   ` kernel test robot
2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
2023-07-24 17:33   ` kernel test robot
2023-07-24 17:44   ` kernel test robot
2023-07-26 11:01   ` Mike Rapoport
2023-07-26 15:02     ` [External] " Usama Arif
2023-07-27  4:30       ` Mike Rapoport [this message]
2023-07-27 20:56         ` Usama Arif
2023-07-24 13:46 ` [RFC 3/4] mm/hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
2023-07-24 13:46 ` [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO Usama Arif
2023-07-24 18:26   ` kernel test robot
2023-07-26 10:34 ` [RFC 0/4] mm/memblock: Skip prep and " Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230727043002.GA1901145@kernel.org \
    --to=rppt@kernel.org \
    --cc=fam.zheng@bytedance.com \
    --cc=liangma@liangbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=punit.agrawal@bytedance.com \
    --cc=simon.evans@bytedance.com \
    --cc=usama.arif@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.