From: Pratyush Yadav
To: Jason Gunthorpe
CC: Mike Rapoport, Changyuan Lyu
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation
In-Reply-To: <20250404143031.GB1336818@nvidia.com>
References: <20250320015551.2157511-1-changyuanl@google.com>
 <20250320015551.2157511-10-changyuanl@google.com>
 <20250403114209.GE342109@nvidia.com>
 <20250403142438.GF342109@nvidia.com>
 <20250404124729.GH342109@nvidia.com>
 <20250404143031.GB1336818@nvidia.com>
Date: Fri, 4 Apr 2025 16:24:54 +0000

On Fri, Apr 04 2025, Jason Gunthorpe wrote:

> On Fri, Apr 04, 2025 at 04:53:13PM +0300, Mike Rapoport wrote:
[...]
>> Most drivers do not use folios
>
> Yes they do, either through kmalloc or through alloc_page/etc. "folio"
> here is just some generic word meaning memory from the buddy allocator.
>
> The big question on my mind is if we need a way to preserve slab
> objects as well..

Only if the objects in the slab cache have a format that doesn't change
between kernels, and I am not sure that is the case anywhere. Maybe a
driver written with KHO in mind would find it useful, but that's way
down the line.

>
>> and for preserving memfd* and hugetlb we'd need to have some dance
>> around that memory anyway.
>
> memfd is all folios - what do you mean?
>
> hugetlb is moving toward folios.. e.g. guestmemfd is supposed to be
> taking the hugetlb special stuff and turning it into folios.
>
>> So I think kho_preserve_folio() would be a part of the fdbox or
>> whatever that functionality will be called.
>
> It is part of KHO. Preserving the folios has to be sequenced with
> starting the buddy allocator, and that is KHO's entire responsibility.
>
> I could see something like preserving slab being in a different layer,
> built on preserving folios.

Agree with both points.

[...]
>> As for the optimizations of memblock reserve path, currently it is what
>> hurts the most in my and Pratyush's experiments. They are not very
>> representative, but still, preserving lots of pages/folios spread all
>> over would take its toll on the mm initialization.
>
>> And I don't think invasive changes to buddy and memory map
>> initialization are the best way to move forward and optimize that.
>
> I'm pretty sure this is going to be the best performance path, but I
> have no idea how invasive it would be to the buddy allocator to make
> it work.

I don't imagine it would be that invasive TBH. memblock_free_pages()
already checks kmsan_memblock_free_pages() and early_page_initialised();
it could check kho_page() just as easily.

>
>> Quite possibly we'd want to be able to minimize the number of *ranges*
>> that we preserve.
>
> I'm not sure, that seems backwards to me, we really don't want to have
> KHO mem zones! So I think optimizing for, and thinking about ranges
> doesn't make sense.
>
> The big ranges will arise naturally because things like hugetlb
> reservations should all be contiguous and the resulting folios should
> all be allocated for the VM and also all be contiguous. So vast, vast
> amounts of memory will be high order and contiguous.

Yes, and those can work quite well with table + bitmaps too.

[...]

-- 
Regards,
Pratyush Yadav
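
To make the kho_page() idea a bit more concrete, here is a rough
userspace model of a bitmap-backed check in the free path. It is only a
sketch under assumptions: every sketch_* name, the fixed PFN limit, and
the per-page loop are made up for illustration and are not kernel code.
The real memblock_free_pages() works on a whole block at once, and how
kho_page() would actually look up preserved memory is still open.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define SKETCH_MAX_PFN 1024UL
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
	((SKETCH_MAX_PFN + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* One bit per page frame: a set bit means "preserved across kexec". */
static unsigned long sketch_preserved[SKETCH_WORDS];

/* Mark the 2^order pages starting at pfn as preserved. */
static void sketch_preserve(unsigned long pfn, unsigned int order)
{
	unsigned long end = pfn + (1UL << order);

	assert(end <= SKETCH_MAX_PFN);
	for (; pfn < end; pfn++)
		sketch_preserved[pfn / SKETCH_BITS_PER_WORD] |=
			1UL << (pfn % SKETCH_BITS_PER_WORD);
}

/* Stand-in for the proposed kho_page() check: is this pfn preserved? */
static bool sketch_kho_page(unsigned long pfn)
{
	assert(pfn < SKETCH_MAX_PFN);
	return (sketch_preserved[pfn / SKETCH_BITS_PER_WORD] &
		(1UL << (pfn % SKETCH_BITS_PER_WORD))) != 0;
}

/*
 * Stand-in for the free path: count how many pages of the block would
 * be handed to the buddy allocator. Preserved pages are skipped.
 */
static unsigned long sketch_free_pages(unsigned long pfn, unsigned int order)
{
	unsigned long end = pfn + (1UL << order);
	unsigned long freed = 0;

	for (; pfn < end; pfn++)
		if (!sketch_kho_page(pfn))
			freed++;
	return freed;
}
```

In this model, preserving an order-2 block at pfn 8 and then freeing the
order-4 block at pfn 0 hands only the 12 non-preserved pages to the
allocator. A single-level bitmap like this is the simplest case; large
contiguous high-order ranges would presumably want a table in front of
the bitmaps so that sparse address space does not cost memory.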