Date: Wed, 26 Mar 2025 07:59:40 -0400
From: Mike Rapoport
To: Frank van der Linden
Cc: Changyuan Lyu, linux-kernel@vger.kernel.org, graf@amazon.com,
 akpm@linux-foundation.org, luto@kernel.org, anthony.yznaga@oracle.com,
 arnd@arndb.de, ashish.kalra@amd.com, benh@kernel.crashing.org, bp@alien8.de,
 catalin.marinas@arm.com, dave.hansen@linux.intel.com, dwmw2@infradead.org,
 ebiederm@xmission.com, mingo@redhat.com, jgowans@amazon.com, corbet@lwn.net,
 krzk@kernel.org, mark.rutland@arm.com, pbonzini@redhat.com,
 pasha.tatashin@soleen.com, hpa@zytor.com, peterz@infradead.org,
 ptyadav@amazon.de, robh+dt@kernel.org, robh@kernel.org, saravanak@google.com,
 skinsburskii@linux.microsoft.com, rostedt@goodmis.org, tglx@linutronix.de,
 thomas.lendacky@amd.com, usama.arif@bytedance.com, will@kernel.org,
 devicetree@vger.kernel.org, kexec@lists.infradead.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers
References: <20250320015551.2157511-1-changyuanl@google.com>
 <20250320015551.2157511-8-changyuanl@google.com>

On Tue, Mar 25, 2025 at 02:56:52PM -0700, Frank van der Linden wrote:
> On Tue, Mar 25, 2025 at 12:19 PM Mike Rapoport wrote:
> >
> > On Mon, Mar 24, 2025 at 11:40:43AM -0700, Frank van der Linden wrote:
> [...]
> > > Thanks for the work on this.
> > >
> > > Obviously it needs to happen while memblock is still active - but why
> > > as close as possible to buddy initialization?
> >
> > One reason is to have all memblock allocations done so that the scratch
> > area can be autoscaled. Another reason is to keep memblock structures
> > small as long as possible, as memblock_reserve()ing the preserved memory
> > would inflate them considerably.
> >
> > And it's overall simpler if memblock only allocates from scratch rather
> > than doing some of the early allocations from scratch and some elsewhere
> > and still making sure they avoid the preserved ranges.
>
> Ah, thanks, I see the argument for the scratch area sizing.
>
> > >
> > > Ordering is always a sticky issue when it comes to doing things during
> > > boot, of course. In this case, I can see scenarios where code that
> > > runs a little earlier may want to use some preserved memory. The
> >
> > Can you elaborate on such scenarios?
>
> There has, for example, been some talk about making hugetlbfs
> persistent. You could have hugetlb_cma active. The hugetlb CMA areas
> are set up quite early, quite some time before KHO restores memory. So
> that would have to be changed somehow if the location of the KHO init
> call were to remain as close as possible to buddy init. I suspect there
> may be other uses.

I think we can address this when/if implementing preservation for hugetlbfs,
and it will be tricky.
If hugetlb in the first kernel uses a lot of memory, we just won't have
enough scratch space for early hugetlb reservations in the second kernel
regardless of hugetlb_cma. On the other hand, we already have the preserved
hugetlbfs memory, so we'd probably need to reserve less memory in the
second kernel.

But anyway, how to preserve hugetlbfs is a completely different discussion.

> > > current requirement in the patch set seems to be "after sparse/page
> > > init", but I'm not sure why it needs to be as close as possible to
> > > buddy init.
> >
> > Why would you say that sparse/page init would be a requirement here?
>
> At least in its current form, the KHO code expects vmemmap to be
> initialized, as it does its restore based on page structures, as
> deserialize_bitmap expects them. I think the use of the page->private
> field was discussed in a separate thread. If that is done
> differently, it wouldn't rely on vmemmap being initialized.

In its current form KHO does rely on vmemmap being allocated, but it does
not rely on it being initialized. Marking memblock ranges NOINIT ensures
nothing touches the corresponding struct pages and KHO can use their fields
up to the point the memory is returned to KHO callers.

> A few more things I've noticed (not sure if these were discussed before):
>
> * Should KHO depend on CONFIG_DEFERRED_STRUCT_PAGE_INIT? Essentially,
> marking memblock ranges as NOINIT doesn't work without
> DEFERRED_STRUCT_PAGE_INIT. Although, if the page->private use
> disappears, this wouldn't be an issue anymore.

It does work.
memmap_init_reserved_pages() is always called, regardless of whether
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set, and it skips initialization
of NOINIT regions.

> * As a future extension, it could be nice to store vmemmap init
> information in the KHO FDT. Then you can use that to init ranges in an
> optimized way (HVO hugetlb or DAX-style persisted ranges) straight
> away.

These days memmap contents are unstable because of the folio/memdesc
project, but in general carrying memory map data from kernel to kernel is
indeed something to consider.

> - Frank

-- 
Sincerely yours,
Mike.
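
For readers following the NOINIT point above, here is a minimal user-space
sketch of the idea: an init pass over reserved regions that skips any region
flagged NOINIT, so whatever was stashed in the per-page fields survives.
This is a toy model and not the kernel implementation. Every toy_* name,
the flag value, and the 16-page memmap are assumptions made up for
illustration; only the behaviour (NOINIT regions are skipped) comes from
the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, loosely modelled on a "reserved, do not init" marker. */
enum toy_flags { TOY_NONE = 0, TOY_RSRV_NOINIT = 1 << 0 };

/* Hypothetical reserved region: a pfn range plus flags. */
struct toy_region {
	size_t start_pfn;
	size_t nr_pages;
	unsigned int flags;
};

/* Hypothetical stand-in for struct page: memory exists, contents may not. */
struct toy_page {
	bool initialized;
	unsigned long private_data;
};

#define TOY_NR_PAGES 16
static struct toy_page toy_memmap[TOY_NR_PAGES]; /* "allocated" memmap */

/*
 * Initialize the page structures of all reserved regions except those
 * marked NOINIT. Returns the number of pages that were initialized.
 */
static size_t toy_init_reserved_pages(const struct toy_region *regions,
				      size_t nr_regions)
{
	size_t done = 0;

	for (size_t i = 0; i < nr_regions; i++) {
		const struct toy_region *r = &regions[i];

		assert(r->start_pfn + r->nr_pages <= TOY_NR_PAGES);

		/* NOINIT: leave the page structures untouched. */
		if (r->flags & TOY_RSRV_NOINIT)
			continue;

		for (size_t pfn = r->start_pfn;
		     pfn < r->start_pfn + r->nr_pages; pfn++) {
			toy_memmap[pfn].initialized = true;
			toy_memmap[pfn].private_data = 0;
			done++;
		}
	}
	return done;
}
```

With one ordinary 4-page region and one 4-page NOINIT region, the function
initializes only the ordinary region, and a value stored earlier in the
NOINIT region's private_data field is still there afterwards. Note that the
sketch has no config switch at all, mirroring the claim that the skip does
not depend on CONFIG_DEFERRED_STRUCT_PAGE_INIT.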