linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 "Sauerwein, David" <dssauerw@amazon.de>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	David Hildenbrand <david@redhat.com>,
	Marc Zyngier <maz@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mike Rapoport <rppt@linux.ibm.com>, Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
Date: Mon, 31 Mar 2025 13:50:33 +0100	[thread overview]
Message-ID: <9f33c0b4517eaf5f36c515b92bdcb6170a4a576a.camel@infradead.org> (raw)
In-Reply-To: <20210511100550.28178-3-rppt@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 5063 bytes --]

On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
> 
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
> 
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  include/linux/memblock.h |  4 +++-
>  mm/memblock.c            | 28 ++++++++++++++++++++++++++--
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..1b4c97c151ae 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
>   * @MEMBLOCK_NONE: no special request
>   * @MEMBLOCK_HOTPLUG: hotpluggable region
>   * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for further details
>   */
>  enum memblock_flags {
>  	MEMBLOCK_NONE		= 0x0,	/* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>   * @base: the base phys addr of the region
>   * @size: the size of the region
>   *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>  	return end_pfn - start_pfn;
>  }
>  
> +static void __init memmap_init_reserved_pages(void)
> +{
> +	struct memblock_region *region;
> +	phys_addr_t start, end;
> +	u64 i;
> +
> +	/* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
> +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> +	for_each_mem_region(region) {
> +		if (memblock_is_nomap(region)) {
> +			start = region->base;
> +			end = start + region->size;
> +			reserve_bootmem_region(start, end);
> +		}
> +	}
> +}
> +

In some cases, that whole call to reserve_bootmem_region() may be a no-
op because pfn_valid() is not true for *any* address in that range.

But reserve_bootmem_region() spends a long time iterating of them all,
and eventually doing nothing:

void __meminit reserve_bootmem_region(phys_addr_t start,
                                      phys_addr_t end, int nid)
{
        unsigned long start_pfn = PFN_DOWN(start);
        unsigned long end_pfn = PFN_UP(end);

        for (; start_pfn < end_pfn; start_pfn++) {
                if (pfn_valid(start_pfn)) {
                        struct page *page = pfn_to_page(start_pfn);

                        init_reserved_page(start_pfn, nid);

                        /*
                         * no need for atomic set_bit because the struct
                         * page is not visible yet so nobody should
                         * access it yet.
                         */
                        __SetPageReserved(page);
                }
        }
}

On platforms with large NOMAP regions (e.g. which are actually reserved
for guest memory to keep it out of the Linux address map and allow for
kexec-based live update of the hypervisor), this pointless loop ends up
taking a significant amount of time which is visible as guest steal
time during the live update.

Can reserve_bootmem_region() skip the loop *completely* if no PFN in
the range from start to end is valid? Or tweak the loop itself to have
an 'else' case which skips to the next valid PFN? Something like

 for(...) {
    if (pfn_valid(start_pfn)) {
       ...
    } else {
       start_pfn = next_valid_pfn(start_pfn);
    }
 }



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

  parent reply	other threads:[~2025-03-31 16:26 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
2021-05-11 10:22   ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 2/4] memblock: update initialization of reserved pages Mike Rapoport
2021-05-11 10:23   ` Ard Biesheuvel
2025-03-31 12:50   ` David Woodhouse [this message]
2025-03-31 14:50     ` Mike Rapoport
2025-03-31 15:13       ` David Woodhouse
2025-04-01 11:33         ` Mike Rapoport
2025-04-01 11:50           ` David Woodhouse
2025-04-01 13:19             ` Mike Rapoport
2025-04-02 20:18               ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
2025-04-02 20:18                 ` [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
2025-04-03  6:19                   ` Mike Rapoport
2025-04-02 20:18                 ` [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
2025-04-03  6:24                   ` Mike Rapoport
2025-04-03  7:07                     ` David Woodhouse
2025-04-03  7:15                       ` David Woodhouse
2025-04-03 14:13                         ` Mike Rapoport
2025-04-03 14:17                           ` David Woodhouse
2025-04-03 14:25                             ` Mike Rapoport
2025-04-03 14:10                       ` Mike Rapoport
2025-04-03  6:19                 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
2021-05-11 10:25   ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:26   ` Ard Biesheuvel
2021-05-11 23:40   ` Andrew Morton
2021-05-12  5:31     ` Mike Rapoport
2021-05-12  3:13 ` [PATCH v4 0/4] " Kefeng Wang
2021-05-12  7:00 ` Ard Biesheuvel
2021-05-12  7:33   ` Mike Rapoport
2021-05-12  7:59     ` Ard Biesheuvel
2021-05-12  8:32       ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9f33c0b4517eaf5f36c515b92bdcb6170a4a576a.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=ardb@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=david@redhat.com \
    --cc=dssauerw@amazon.de \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=rppt@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).