From: Mike Rapoport <rppt@kernel.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Sauerwein, David" <dssauerw@amazon.de>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Ard Biesheuvel <ardb@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
David Hildenbrand <david@redhat.com>,
Marc Zyngier <maz@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Mike Rapoport <rppt@linux.ibm.com>, Will Deacon <will@kernel.org>,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v4 2/4] memblock: update initialization of reserved pages
Date: Tue, 1 Apr 2025 16:19:54 +0300 [thread overview]
Message-ID: <Z-vn-sMtNfwyJ9VW@kernel.org> (raw)
In-Reply-To: <1787b97c267b53127c60a61419d99751f8a66b1a.camel@infradead.org>
On Tue, Apr 01, 2025 at 12:50:33PM +0100, David Woodhouse wrote:
> On Tue, 2025-04-01 at 14:33 +0300, Mike Rapoport wrote:
> > On Mon, Mar 31, 2025 at 04:13:56PM +0100, David Woodhouse wrote:
> > > On Mon, 2025-03-31 at 17:50 +0300, Mike Rapoport wrote:
> > > > On Mon, Mar 31, 2025 at 01:50:33PM +0100, David Woodhouse wrote:
> > > > > On Tue, 2021-05-11 at 13:05 +0300, Mike Rapoport wrote:
> > > > >
> > > > > On platforms with large NOMAP regions (e.g. which are actually reserved
> > > > > for guest memory to keep it out of the Linux address map and allow for
> > > > > kexec-based live update of the hypervisor), this pointless loop ends up
> > > > > taking a significant amount of time which is visible as guest steal
> > > > > time during the live update.
> > > > >
> > > > > Can reserve_bootmem_region() skip the loop *completely* if no PFN in
> > > > > the range from start to end is valid? Or tweak the loop itself to have
> > > > > an 'else' case which skips to the next valid PFN? Something like
> > > > >
> > > > > for(...) {
> > > > > if (pfn_valid(start_pfn)) {
> > > > > ...
> > > > > } else {
> > > > > start_pfn = next_valid_pfn(start_pfn);
> > > > > }
> > > > > }
> > > >
> > > > My understanding is that you have large reserved NOMAP ranges that don't
> > > > appear as memory at all, so no memory map for them is created and so
> > > > pfn_valid() is false for pfns in those ranges.
> > > >
> > > > If this is the case one way indeed would be to make
> > > > reserve_bootmem_region() skip ranges with no valid pfns.
> > > >
> > > > Another way could be to memblock_reserved_mark_noinit() such ranges and
> > > > then reserve_bootmem_region() won't even get called, but that would require
> > > > firmware to pass that information somehow.
> > >
> > > I was thinking along these lines (not even build tested)...
> > >
> > > I don't much like the (unsigned long)-1 part. I might make the helper
> > > 'static inline bool first_valid_pfn (unsigned long *pfn)' and return
> > > success or failure. But that's an implementation detail.
> > >
> > > index 6d1fb6162ac1..edd27ba3e908 100644
> > > --- a/include/asm-generic/memory_model.h
> > > +++ b/include/asm-generic/memory_model.h
> > > @@ -29,8 +29,43 @@ static inline int pfn_valid(unsigned long pfn)
> > > return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
> > > }
> > > #define pfn_valid pfn_valid
> > > +
> > > +static inline unsigned long first_valid_pfn(unsigned long pfn)
> > > +{
> > > + /* avoid <linux/mm.h> include hell */
> > > + extern unsigned long max_mapnr;
> > > + unsigned long pfn_offset = ARCH_PFN_OFFSET;
> > > +
> > > + if (pfn < pfn_offset)
> > > + return pfn_offset;
> > > +
> > > + if ((pfn - pfn_offset) < max_mapnr)
> > > + return pfn;
> > > +
> > > + return (unsigned long)(-1);
> > > +}
> >
> > This seems about right for FLATMEM. For SPARSEMEM it would be something
> > along these lines (I kept dubious -1):
>
> Thanks. Is that right even with CONFIG_SPARSEMEM_VMEMMAP? It seems that
> it's possible for pfn_valid() to be false for a given *page*, but there
> may still be valid pages in the remainder of the same section in that
> case?
Right, it might after memory hot-remove. At boot the entire section either
valid or not.
> I think it should only skip to the next section if the current section
> doesn't exist at all, not just when pfn_section_valid() return false?
Yeah, when pfn_section_valid() returns false it should itereate pfns until
the end of the section and check if they are valid.
> I also wasn't sure how to cope with the rcu_read_lock_sched() that
> happens in pfn_valid(). What's that protecting against? Does it mean
> that by the time pfn_valid() returns true, that might not be the
> correct answer any more?
That's protecting against kfree_rcu() in section_deactivate() so even if
the answer is still correct, later access to apparently valid struct page
may blow up :/
> > static inline unsigned long first_valid_pfn(unsigned long pfn)
> > {
> > unsigned long nr = pfn_to_section_nr(pfn);
> >
> > do {
> > if (pfn_valid(pfn))
> > return pfn;
> > pfn = section_nr_to_pfn(nr++);
> > } while (nr < NR_MEM_SECTIONS);
> >
> > return (unsigned long)-1;
> > }
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2025-04-01 13:22 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-11 10:05 [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:22 ` Ard Biesheuvel
2021-05-11 10:22 ` Ard Biesheuvel
2021-05-11 10:22 ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 2/4] memblock: update initialization of reserved pages Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:23 ` Ard Biesheuvel
2021-05-11 10:23 ` Ard Biesheuvel
2021-05-11 10:23 ` Ard Biesheuvel
2025-03-31 12:50 ` David Woodhouse
2025-03-31 14:50 ` Mike Rapoport
2025-03-31 15:13 ` David Woodhouse
2025-04-01 11:33 ` Mike Rapoport
2025-04-01 11:50 ` David Woodhouse
2025-04-01 13:19 ` Mike Rapoport [this message]
2025-04-02 20:18 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() David Woodhouse
2025-04-02 20:18 ` [RFC PATCH 2/3] mm: Implement for_each_valid_pfn() for CONFIG_FLATMEM David Woodhouse
2025-04-03 6:19 ` Mike Rapoport
2025-04-02 20:18 ` [RFC PATCH 3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM David Woodhouse
2025-04-03 6:24 ` Mike Rapoport
2025-04-03 7:07 ` David Woodhouse
2025-04-03 7:15 ` David Woodhouse
2025-04-03 14:13 ` Mike Rapoport
2025-04-03 14:17 ` David Woodhouse
2025-04-03 14:25 ` Mike Rapoport
2025-04-03 14:10 ` Mike Rapoport
2025-04-03 6:19 ` [RFC PATCH 1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region() Mike Rapoport
2021-05-11 10:05 ` [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:25 ` Ard Biesheuvel
2021-05-11 10:25 ` Ard Biesheuvel
2021-05-11 10:25 ` Ard Biesheuvel
2021-05-11 10:05 ` [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:05 ` Mike Rapoport
2021-05-11 10:26 ` Ard Biesheuvel
2021-05-11 10:26 ` Ard Biesheuvel
2021-05-11 10:26 ` Ard Biesheuvel
2021-05-11 23:40 ` Andrew Morton
2021-05-11 23:40 ` Andrew Morton
2021-05-11 23:40 ` Andrew Morton
2021-05-12 5:31 ` Mike Rapoport
2021-05-12 5:31 ` Mike Rapoport
2021-05-12 5:31 ` Mike Rapoport
2021-05-12 3:13 ` [PATCH v4 0/4] " Kefeng Wang
2021-05-12 3:13 ` Kefeng Wang
2021-05-12 3:13 ` Kefeng Wang
2021-05-12 7:00 ` Ard Biesheuvel
2021-05-12 7:00 ` Ard Biesheuvel
2021-05-12 7:00 ` Ard Biesheuvel
2021-05-12 7:33 ` Mike Rapoport
2021-05-12 7:33 ` Mike Rapoport
2021-05-12 7:33 ` Mike Rapoport
2021-05-12 7:59 ` Ard Biesheuvel
2021-05-12 7:59 ` Ard Biesheuvel
2021-05-12 7:59 ` Ard Biesheuvel
2021-05-12 8:32 ` Mike Rapoport
2021-05-12 8:32 ` Mike Rapoport
2021-05-12 8:32 ` Mike Rapoport
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z-vn-sMtNfwyJ9VW@kernel.org \
--to=rppt@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=ardb@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=david@redhat.com \
--cc=dssauerw@amazon.de \
--cc=dwmw2@infradead.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=rppt@linux.ibm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.