Date: Wed, 31 Dec 2025 11:46:39 +0200
From: Mike Rapoport
To: Pasha Tatashin
Cc: Pratyush Yadav, Evangelos Petrongonas, Alexander Graf, Andrew Morton,
    Jason Miu, linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
    linux-mm@kvack.org, nh-open-source@amazon.com
Subject: Re: [PATCH] kho: add support for deferred struct page init
References: <86jyyecyzh.fsf@kernel.org> <863452cwns.fsf@kernel.org> <864ip99f1a.fsf@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 30, 2025 at 01:21:31PM -0500, Pasha Tatashin wrote:
> On Tue, Dec 30, 2025 at 12:18 PM Mike Rapoport wrote:
> >
> > On Tue, Dec 30, 2025 at 11:18:12AM -0500, Pasha Tatashin wrote:
> > > On Tue, Dec 30, 2025 at 11:16 AM Mike Rapoport wrote:
> > > >
> > > > On Tue, Dec 30, 2025 at 11:05:05AM -0500, Pasha Tatashin wrote:
> > > > > On Mon, Dec 29, 2025 at 4:03 PM Pratyush Yadav wrote:
> > > > > >
> > > > > > The magic is purely sanity checking. It is not used to decide anything
> > > > > > other than to make sure this is actually a KHO page. I don't intend to
> > > > > > change that. My point is, if we make sure the KHO pages are properly
> > > > > > initialized during MM init, then restoring can actually be a very cheap
> > > > > > operation, where you only do the sanity checking. You can even put the
> > > > > > magic check behind CONFIG_KEXEC_HANDOVER_DEBUG if you want, but I think
> > > > > > it is useful enough to keep in production systems too.
> > > > >
> > > > > It is part of a critical hotpath during blackout, should really be
> > > > > behind CONFIG_KEXEC_HANDOVER_DEBUG
> > > >
> > > > Do you have the numbers?
> > > > ;-)
> > >
> > > The fastest reboot we can achieve is ~0.4s on ARM
> >
> > I meant the difference between assigning info.magic and skipping it.
>
> It is proportional to the amount of preserved memory. Extra assignment
> for each page. In our fleet we have observed IOMMU page tables to be
> 20G in size. So, let's just assume it is 20G. That is: 20 * 1024^3 /

Do you see 400ms reboot times on machines that have 20G of IOMMU page
tables? That's impressive presuming the overall size of those machines.

> 4096 = 5.24 million pages. If we access "struct page" only for the
> magic purpose, we fetch full 64-byte cacheline, which is 5.24 million
> * 64 bytes = 335 M, that is ~13ms with ~25G/s DRAM; and also each TLB
> miss will add some latency, 5.2M * 10ns = ~50ms. In total we can get
> 15ms ~ 50ms regression compared to 400ms, that is 4-12%. It will be
> less if we also access "struct page" for another reason at the same
> time, but still it adds up.

Your overhead calculations are based on the assumption that we don't
access struct page, but we do. We assign page->private during
deserialization and then initialize struct page during restore. We get
the hit of cache fetches and TLB misses anyway.

It would be interesting to see the difference *measured* on those large
systems.

> > > (shutdown+purgatory+boot), let's not add anything to regress, as every
> > > microsecond counts during blackout.
> >
> > Any added functionality adds cycles, this is inevitable. And neither KHO
> > nor LUO are near the completion, so we'll have to add functionality to both
> > of them. And the added functionality should be correct first and foremost.
> > And magic sanity check seems pretty useful and presumably cheap enough to
> > always keep it unless you see a real slowdown because of it.
>
> Magic check is proportional to the amount of preserved memory. It is
> not a required functionality, only a sanity checking. I really do not
> see a reason to enable it in production.
> All other sanity struct page,
> and pg_flags related sanity checking are usually enabled with
> CONFIG_DEBUG_VM, so enabling it only with CONFIG_KEXEC_HANDOVER_DEBUG
> is better.

Having sanity checks in production could be useful because some errors
could be hard to reproduce in a controlled environment with a debug
kernel. Just last cycle there was commit 83c8f7b5e194 ("mm/mm_init:
Introduce a boot parameter for check_pages") that allows enabling page
sanity checks in production.

So my take is to keep the magic check until KHO/LUO mature at least.

-- 
Sincerely yours,
Mike.
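
For reference, the back-of-the-envelope estimate quoted in the thread
(20G of preserved memory, one 64-byte cacheline per struct page, ~25G/s
DRAM, 10ns per TLB miss, ~0.4s blackout) can be reproduced with a small
script. This is only a sketch of that arithmetic: every constant is a
figure assumed in the discussion, not a measurement, and the function
name is hypothetical.

```python
# Reproduces the rough overhead estimate quoted in the thread.
# All constants are assumptions taken from the discussion, not measured.

PAGE_SIZE = 4096                 # bytes per base page
CACHELINE = 64                   # bytes fetched per struct page touched
PRESERVED_BYTES = 20 * 1024**3   # 20G of preserved memory
DRAM_BYTES_PER_SEC = 25e9        # ~25G/s DRAM bandwidth
TLB_MISS_NS = 10                 # assumed cost of one TLB miss
BLACKOUT_MS = 400                # ~0.4s reboot time


def magic_check_estimate():
    """Return the page count and the two rough cost estimates in ms."""
    pages = PRESERVED_BYTES // PAGE_SIZE
    fetched_bytes = pages * CACHELINE
    # Cost if the only effect is pulling one cacheline per page from DRAM.
    fetch_ms = fetched_bytes / DRAM_BYTES_PER_SEC * 1e3
    # Cost if every page additionally incurs one TLB miss.
    tlb_ms = pages * TLB_MISS_NS / 1e6
    return pages, fetched_bytes, fetch_ms, tlb_ms


if __name__ == "__main__":
    pages, fetched_bytes, fetch_ms, tlb_ms = magic_check_estimate()
    print(f"pages:           {pages}")
    print(f"bytes fetched:   {fetched_bytes}")
    print(f"cacheline cost:  {fetch_ms:.1f} ms "
          f"({fetch_ms / BLACKOUT_MS:.1%} of blackout)")
    print(f"TLB miss cost:   {tlb_ms:.1f} ms "
          f"({tlb_ms / BLACKOUT_MS:.1%} of blackout)")
```

The script only models the case where struct page is touched solely for
the magic check. If struct page is accessed anyway during
deserialization and restore, as argued above, the incremental cost
would be lower, which is why measured numbers would be more useful
than this estimate.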