From: Mike Rapoport <rppt@kernel.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
	Evangelos Petrongonas <epetron@amazon.de>,
	Alexander Graf <graf@amazon.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jason Miu <jasonmiu@google.com>,
	linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
	linux-mm@kvack.org, nh-open-source@amazon.com
Subject: Re: [PATCH] kho: add support for deferred struct page init
Date: Wed, 31 Dec 2025 11:46:39 +0200
Message-ID: <aVTw_08W7zXR7Y0F@kernel.org>
In-Reply-To: <CA+CK2bCm=AGDSLTbs6etbqveFUHU80okE-bCT2zg20nrHXgHRQ@mail.gmail.com>

On Tue, Dec 30, 2025 at 01:21:31PM -0500, Pasha Tatashin wrote:
> On Tue, Dec 30, 2025 at 12:18 PM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Dec 30, 2025 at 11:18:12AM -0500, Pasha Tatashin wrote:
> > > On Tue, Dec 30, 2025 at 11:16 AM Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > On Tue, Dec 30, 2025 at 11:05:05AM -0500, Pasha Tatashin wrote:
> > > > > On Mon, Dec 29, 2025 at 4:03 PM Pratyush Yadav <pratyush@kernel.org> wrote:
> > > > > >
> > > > > > The magic is purely sanity checking. It is not used to decide anything
> > > > > > other than to make sure this is actually a KHO page. I don't intend to
> > > > > > change that. My point is, if we make sure the KHO pages are properly
> > > > > > initialized during MM init, then restoring can actually be a very cheap
> > > > > > operation, where you only do the sanity checking. You can even put the
> > > > > > magic check behind CONFIG_KEXEC_HANDOVER_DEBUG if you want, but I think
> > > > > > it is useful enough to keep in production systems too.
> > > > >
> > > > > It is part of a critical hot path during blackout, so it should
> > > > > really be behind CONFIG_KEXEC_HANDOVER_DEBUG.
> > > >
> > > > Do you have the numbers? ;-)
> > >
> > > The fastest reboot we can achieve is ~0.4s on ARM
> >
> > I meant the difference between assigning info.magic and skipping it.
> 
> It is proportional to the amount of preserved memory: one extra assignment
> for each page. In our fleet we have observed IOMMU page tables to be
> 20G in size. So, let's just assume it is 20G. That is: 20 * 1024^3 /

Do you see 400ms reboot times on machines that have 20G of IOMMU page
tables? That's impressive given the likely overall size of those machines.

> 4096 = 5.24 million pages. If we access "struct page" only for the
> magic, we fetch a full 64-byte cacheline per page, which is 5.24 million
> * 64 bytes = 335 MB; that is ~13ms with ~25 GB/s DRAM. Each TLB
> miss will also add some latency: 5.2M * 10ns = ~50ms. In total we can get
> a 15ms ~ 50ms regression compared to 400ms, that is 4-12%. It will be
> less if we also access "struct page" for another reason at the same
> time, but it still adds up.

Your overhead calculations are based on the assumption that we don't
access struct page, but we do. We assign page->private during
deserialization and then initialize struct page during restore.
We get the hit of cache fetches and TLB misses anyway.
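
To illustrate why the extra assignment need not mean an extra memory access,
here is a rough model, in Python for brevity. It assumes the magic is packed
into the same word as the order that is written to page->private; that layout,
the MAGIC value and the function names are assumptions made for illustration,
not code from the kernel.

```python
# Hypothetical model: order and magic share one 64-bit word, so writing or
# checking the magic touches the same word (and cacheline) as the order.
MAGIC = 0x4B484F50  # illustrative placeholder value, not taken from the thread


def pack_info(order, magic=MAGIC):
    """Pack order (low 32 bits) and magic (high 32 bits) into one word."""
    if not 0 <= order < 2**32:
        raise ValueError("order does not fit in 32 bits")
    return (magic << 32) | order


def unpack_info(word):
    """Split the word back into (order, magic)."""
    return word & 0xFFFFFFFF, word >> 32


def restore_order(word):
    """Return the order if the magic matches, otherwise None."""
    order, magic = unpack_info(word)
    if magic != MAGIC:
        return None  # not a word written by pack_info(); refuse to restore
    return order


if __name__ == "__main__":
    word = pack_info(9)
    print(hex(word), restore_order(word), restore_order(9))
```

In this model the check is one compare on a word that has to be read anyway
in order to learn the order.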

It would be interesting to see the difference *measured* on those large
systems.
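
For reference, the back-of-envelope estimate quoted above can be reproduced
with a few lines of Python. This is only the arithmetic: the 25 GB/s
bandwidth, the 10 ns per TLB miss and the 400 ms blackout are the figures
assumed in the quoted message, not measurements, and the function name is
made up for this sketch.

```python
# Inputs are the figures assumed in the quoted estimate, not measurements.
PRESERVED_BYTES = 20 * 1024**3   # 20G of preserved memory
PAGE_SIZE = 4096                 # bytes per page
CACHELINE = 64                   # bytes fetched per struct page access
DRAM_BYTES_PER_SEC = 25e9        # "~25G/s DRAM"
TLB_MISS_NS = 10                 # assumed cost of one TLB miss
BLACKOUT_MS = 400                # "~0.4s" reboot


def estimate(preserved_bytes=PRESERVED_BYTES):
    """Return (pages, cacheline fetch time in ms, TLB miss time in ms)."""
    pages = preserved_bytes // PAGE_SIZE
    fetch_ms = pages * CACHELINE / DRAM_BYTES_PER_SEC * 1e3
    tlb_ms = pages * TLB_MISS_NS / 1e6
    return pages, fetch_ms, tlb_ms


if __name__ == "__main__":
    pages, fetch_ms, tlb_ms = estimate()
    print(f"pages:          {pages}")
    print(f"cacheline cost: {fetch_ms:.1f} ms ({fetch_ms / BLACKOUT_MS:.1%})")
    print(f"TLB miss cost:  {tlb_ms:.1f} ms ({tlb_ms / BLACKOUT_MS:.1%})")
```

With these inputs it gives 5242880 pages, roughly 13 ms of cacheline fetches
and roughly 52 ms of TLB misses. Whether any of that is *additional* cost
depends on whether struct page is touched anyway, which only a measurement
can tell.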

> > > (shutdown+purgatory+boot), let's not add anything to regress, as every
> > > microsecond counts during blackout.
> >
> > Any added functionality adds cycles; this is inevitable. And neither KHO
> > nor LUO is near completion, so we'll have to add functionality to both
> > of them. And the added functionality should be correct first and foremost.
> > And the magic sanity check seems pretty useful and presumably cheap enough
> > to always keep unless you see a real slowdown because of it.
> 
> The magic check is proportional to the amount of preserved memory. It is
> not required functionality, only a sanity check. I really do not see a
> reason to enable it in production. All other struct page and pg_flags
> sanity checks are usually enabled with CONFIG_DEBUG_VM, so enabling it
> only with CONFIG_KEXEC_HANDOVER_DEBUG is better.

Having sanity checks in production could be useful because some errors
could be hard to reproduce in a controlled environment with a debug kernel.

Just last cycle there was commit 83c8f7b5e194 ("mm/mm_init: Introduce a
boot parameter for check_pages") that allows enabling page sanity checks in
production.

So my take is to keep the magic check at least until KHO/LUO mature.

-- 
Sincerely yours,
Mike.

