Re: [PATCH] kho: add support for deferred struct page init


From: Mike Rapoport <rppt@kernel.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
	Evangelos Petrongonas <epetron@amazon.de>,
	Alexander Graf <graf@amazon.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jason Miu <jasonmiu@google.com>,
	linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
	linux-mm@kvack.org, nh-open-source@amazon.com
Subject: Re: [PATCH] kho: add support for deferred struct page init
Date: Wed, 31 Dec 2025 11:46:39 +0200
Message-ID: <aVTw_08W7zXR7Y0F@kernel.org>
In-Reply-To: <CA+CK2bCm=AGDSLTbs6etbqveFUHU80okE-bCT2zg20nrHXgHRQ@mail.gmail.com>

On Tue, Dec 30, 2025 at 01:21:31PM -0500, Pasha Tatashin wrote:
> On Tue, Dec 30, 2025 at 12:18 PM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Dec 30, 2025 at 11:18:12AM -0500, Pasha Tatashin wrote:
> > > On Tue, Dec 30, 2025 at 11:16 AM Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > On Tue, Dec 30, 2025 at 11:05:05AM -0500, Pasha Tatashin wrote:
> > > > > On Mon, Dec 29, 2025 at 4:03 PM Pratyush Yadav <pratyush@kernel.org> wrote:
> > > > > >
> > > > > > The magic is purely sanity checking. It is not used to decide anything
> > > > > > other than to make sure this is actually a KHO page. I don't intend to
> > > > > > change that. My point is, if we make sure the KHO pages are properly
> > > > > > initialized during MM init, then restoring can actually be a very cheap
> > > > > > operation, where you only do the sanity checking. You can even put the
> > > > > > magic check behind CONFIG_KEXEC_HANDOVER_DEBUG if you want, but I think
> > > > > > it is useful enough to keep in production systems too.
> > > > >
> > > > > It is part of a critical hotpath during blackout, should really be
> > > > > behind CONFIG_KEXEC_HANDOVER_DEBUG
> > > >
> > > > Do you have the numbers? ;-)
> > >
> > > The fastest reboot we can achieve is ~0.4s on ARM
> >
> > I meant the difference between assigning info.magic and skipping it.
> 
> It is proportional to the amount of preserved memory. Extra assignment
> for each page. In our fleet we have observed IOMMU page tables to be
> 20G in size. So, let's just assume it is 20G. That is: 20 * 1024^3 /

Do you see 400ms reboot times on machines that have 20G of IOMMU page
tables? That's impressive given the presumably large overall size of
those machines.

> 4096 = 5.24 million pages. If we access "struct page" only for the
> magic purpose, we fetch full 64-byte cacheline, which is 5.24 million
> * 64 bytes = 335 M, that is ~13ms with ~25G/s DRAM; and also each TLB
> miss will add some latency, 5.2M * 10ns = ~50ms. In total we can get
> 15ms ~ 50ms regression compared to 400ms, that is 4-12%. It will be
> less if we also access "struct page" for another reason at the same
> time, but still it adds up.

Your overhead calculations are based on the assumption that we don't
access struct page, but we do. We assign page->private during
deserialization and then initialize struct page during restore.
We get the hit of cache fetches and TLB misses anyway.

It would be interesting to see the difference *measured* on those large
systems.
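
Until measured numbers are available, the back-of-envelope estimate quoted
above can be reproduced with a short sketch. It only restates the figures from
the quoted mail (20G preserved, 4096-byte pages, one 64-byte cacheline per
struct page, ~25G/s DRAM read as 25e9 bytes/s, ~10ns per TLB miss, 400ms
blackout). The function name and its parameters are hypothetical, and the
result is an upper bound: it assumes struct page is not touched for any other
reason.

```python
PAGE_SIZE = 4096        # bytes per base page, as in the quoted estimate
CACHELINE = 64          # bytes fetched per struct page access
GIB = 1024 ** 3


def magic_check_estimate(preserved_bytes, dram_bytes_per_s, tlb_miss_s):
    """Return (pages, bandwidth cost in s, TLB-miss cost in s).

    Hypothetical helper: models one cacheline fetch and one TLB miss
    per preserved page, which is the assumption made in the quoted mail.
    """
    pages = preserved_bytes // PAGE_SIZE
    fetched_bytes = pages * CACHELINE
    return pages, fetched_bytes / dram_bytes_per_s, pages * tlb_miss_s


# Figures taken from the quoted mail; 25e9 B/s is our reading of "~25G/s".
pages, bw_s, tlb_s = magic_check_estimate(20 * GIB, 25e9, 10e-9)
blackout_s = 0.4

print(f"pages:          {pages}")
print(f"bandwidth cost: {bw_s * 1e3:.1f} ms ({bw_s / blackout_s:.1%} of blackout)")
print(f"TLB-miss cost:  {tlb_s * 1e3:.1f} ms ({tlb_s / blackout_s:.1%} of blackout)")
```

If the same cachelines are already fetched when page->private is assigned and
when struct page is initialized, the incremental cost of the magic check is
smaller than either figure, which is why a measurement would settle it.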

> > > (shutdown+purgatory+boot), let's not add anything to regress, as every
> > > microsecond counts during blackout.
> >
> > Any added functionality adds cycles, this is inevitable. And neither KHO
> > nor LUO are near the completion, so we'll have to add functionality to both
> > of them. And the added functionality should be correct first and foremost.
> > And magic sanity check seems pretty useful and presumably cheap enough to
> > always keep it unless you see a real slowdown because of it.
> 
> Magic check is proportional to the amount of preserved memory. It is
> not a required functionality, only a sanity checking. I really do not
> see a reason to enable it in production. All other sanity struct page,
> and pg_flags related sanity checking are usually enabled with
> CONFIG_DEBUG_VM, so enabling it only with CONFIG_KEXEC_HANDOVER_DEBUG
> is better.

Having sanity checks in production could be useful because some errors
could be hard to reproduce in a controlled environment with a debug kernel.

Just last cycle there was commit 83c8f7b5e194 ("mm/mm_init: Introduce a
boot parameter for check_pages") that allows enabling page sanity checks in
production.

So my take is to keep the magic check at least until KHO/LUO mature.

-- 
Sincerely yours,
Mike.
