Date: Sun, 6 Apr 2025 19:34:30 +0300
From: Mike Rapoport <rppt@kernel.org>
To: Pratyush Yadav
Cc: Jason Gunthorpe, Changyuan Lyu, linux-kernel@vger.kernel.org,
 graf@amazon.com, akpm@linux-foundation.org, luto@kernel.org,
 anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
 benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
 dave.hansen@linux.intel.com, dwmw2@infradead.org, ebiederm@xmission.com,
 mingo@redhat.com, jgowans@amazon.com, corbet@lwn.net, krzk@kernel.org,
 mark.rutland@arm.com, pbonzini@redhat.com, pasha.tatashin@soleen.com,
 hpa@zytor.com, peterz@infradead.org, robh+dt@kernel.org, robh@kernel.org,
 saravanak@google.com, skinsburskii@linux.microsoft.com,
 rostedt@goodmis.org, tglx@linutronix.de, thomas.lendacky@amd.com,
 usama.arif@bytedance.com, will@kernel.org, devicetree@vger.kernel.org,
 kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation
References: <20250320015551.2157511-1-changyuanl@google.com>
 <20250320015551.2157511-10-changyuanl@google.com>
 <20250403114209.GE342109@nvidia.com>
 <20250403142438.GF342109@nvidia.com>
 <20250404124729.GH342109@nvidia.com>

On Fri, Apr 04, 2025 at 04:15:28PM +0000, Pratyush Yadav wrote:
> Hi Mike,
> 
> On Fri, Apr 04 2025, Mike Rapoport wrote:
> 
> [...]
> > As for the optimizations of the memblock reserve path, it is currently
> > what hurts the most in my and Pratyush's experiments. They are not very
> > representative, but still, preserving lots of pages/folios spread all
> > over would take its toll on the mm initialization.
> > And I don't think invasive changes to how the buddy allocator and the
> > memory map are initialized are the best way to move forward and
> > optimize that. Quite possibly we'd want to be able to minimize the
> > number of *ranges* that we preserve.
> > 
> > So of the three alternatives we have now (xarrays + bitmaps, tables +
> > bitmaps, and a maple tree of ranges) the maple tree seems to be the
> > simplest, and efficient enough to start with.
> 
> But you'd need to somehow serialize the maple tree ranges into some
> format. So you would either end up going back to the kho_mem ranges we
> had, or have to invent something more complex. The sample code you
> wrote is pretty much going back to having kho_mem ranges.

It's a bit better, and it is not part of the FDT, which Jason was so much
against :)

> And if you say that we should minimize the number of ranges, the table +
> bitmaps is still a fairly good data structure. You can very well have a
> higher-order table where your entire range is a handful of bits. This
> lets you track a small number of ranges fairly efficiently -- both in
> terms of memory and in terms of CPU. I think the only place where it
> doesn't work as well as a maple tree is if you want to merge or split a
> lot of ranges quickly. But if you say that you only want to have a
> handful of ranges, does that really matter?

Until we all agree that we are bypassing memblock_reserve() and
reimplementing memory map and free list initialization for KHO, we must
minimize the number of memblock_reserve() calls. And the maple tree makes
it easy to merge ranges where appropriate, resulting in a much smaller
number of ranges than kho_mem had.
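
For illustration, a minimal sketch of the idea (the kho_* names here are
made up, everything uses GFP_KERNEL, and how the ranges actually get
serialized across kexec is left out; this is not the sample code discussed
above):

#include <linux/init.h>
#include <linux/maple_tree.h>
#include <linux/memblock.h>
#include <linux/xarray.h>	/* xa_mk_value() */

static DEFINE_MTREE(kho_ranges);	/* hypothetical tree of preserved ranges */

/* Old kernel: remember that [phys, phys + size) must survive kexec. */
static int kho_track_range(phys_addr_t phys, size_t size)
{
	return mtree_store_range(&kho_ranges, phys, phys + size - 1,
				 xa_mk_value(1), GFP_KERNEL);
}

/* New kernel: one memblock_reserve() per run of adjacent ranges. */
static void __init kho_reserve_tracked_ranges(void)
{
	MA_STATE(mas, &kho_ranges, 0, 0);
	unsigned long start = 0, end = 0;
	bool pending = false;
	void *entry;

	rcu_read_lock();
	mas_for_each(&mas, entry, ULONG_MAX) {
		if (pending && mas.index == end + 1) {
			/* Contiguous with the previous range: coalesce. */
			end = mas.last;
			continue;
		}
		if (pending)
			memblock_reserve(start, end - start + 1);
		start = mas.index;
		end = mas.last;
		pending = true;
	}
	rcu_read_unlock();

	if (pending)
		memblock_reserve(start, end - start + 1);
}

However scattered the preserved pages end up, the restore side then issues
one memblock_reserve() per contiguous run rather than one call per page.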

> Also, I think the allocation pattern depends on which use case you have
> in mind. For hypervisor live update, you might very well only have a
> handful of ranges. The use case I have in mind is taking a userspace
> process, quickly checkpointing it by dumping its memory contents to a
> memfd, and restoring it after KHO. For that, the ability to do random
> sparse allocations quickly helps a lot.
> 
> So IMO the table works well for both sparse and dense allocations. So
> why have a data structure that only solves one problem when we can have
> one that solves both? And honestly, I don't think the table is that much
> more complex either -- both in terms of understanding the idea and in
> terms of code -- the whole thing is like 200 lines.

It's more than 200 lines longer than the maple tree if we count the lines.
My point is that both the table and the xarrays are trying to optimize for
an unknown goal. kho_mem, with all its drawbacks, was an obvious baseline.
The maple tree improves on that baseline and is more straightforward than
the alternatives.

> Also, I think changes to buddy initialization _are_ the way to optimize
> boot times. Having maple tree ranges and moving them around into
> memblock ranges does not really scale very well for anything other than
> a handful of ranges, and we shouldn't limit ourselves to that without
> good reason.

As I said, this means an alternative implementation of the memory map and
free lists, which has been and remains quite fragile. So we'd better start
with something that does not require that on the roadmap.

> -- 
> Regards,
> Pratyush Yadav

-- 
Sincerely yours,
Mike.