Message-ID: <7f89c0cf-0601-4f61-a665-72364629681a@kernel.org>
Date: Tue, 18 Nov 2025 09:51:39 +0100
Subject: Re: [RFC PATCH 0/4] Extend xas_split* to support splitting arbitrarily large entries
To: Ackerley Tng, Matthew Wilcox
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, michael.roth@amd.com, vannapurve@google.com
References: <20251117224701.1279139-1-ackerleytng@google.com>
From: "David Hildenbrand (Red Hat)"

On 18.11.25 00:43, Ackerley Tng wrote:
> Matthew Wilcox writes:
>
>> On Mon, Nov 17, 2025 at 02:46:57PM -0800, Ackerley Tng wrote:
>>> guest_memfd is planning to store huge pages in the filemap, and
>>> guest_memfd's use of huge pages involves splitting of huge pages into
>>> individual pages. Splitting of huge pages also involves splitting of
>>> the filemap entries for the pages being split.
>>
>> Hm, I'm not most concerned about the number of nodes you're allocating.
>
> Thanks for reminding me, I left this out of the original message.
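
For background, the numbers broken down below are the xa_nodes allocated on
the xas_split_alloc()/xas_split() path. A caller drives that pair roughly as
follows -- this is only a simplified sketch with a hypothetical function name
and locking context, not code from this series. Note that the in-tree
xas_split_alloc() currently refuses splits where the source entry sits more
than two node levels above the target order, which is the restriction this
series is about lifting.

#include <linux/pagemap.h>
#include <linux/xarray.h>

/*
 * Illustrative sketch only: replace the single multi-index entry for
 * @folio, currently of @old_order, with entries of @new_order.
 */
static int split_filemap_entry(struct address_space *mapping,
			       struct folio *folio,
			       unsigned int old_order,
			       unsigned int new_order)
{
	/* The target order of the new entries is carried in the xa_state. */
	XA_STATE_ORDER(xas, &mapping->i_pages, folio->index, new_order);

	/* Preallocate every node the split will need, outside the lock. */
	xas_split_alloc(&xas, folio, old_order, GFP_KERNEL);
	if (xas_error(&xas))
		return xas_error(&xas);

	xas_lock_irq(&xas);
	/* Rewrite the one order-@old_order slot as order-@new_order entries. */
	xas_split(&xas, folio, old_order);
	xas_unlock_irq(&xas);

	return 0;
}
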
> Splitting the xarray entry for a 1G folio (in a shift-18 node for
> order=18 on x86), assuming XA_CHUNK_SHIFT is 6, would involve
>
> + shift-18 node (the original node will be reused - no new allocations)
> + shift-12 node: 1 node allocated
> + shift-6 node : 64 nodes allocated
> + shift-0 node : 64 * 64 = 4096 nodes allocated
>
> This brings the total number of allocated nodes to 4161. struct xa_node
> is 576 bytes, so that's 2396736 bytes: splitting a 1G folio down to 4K
> pages costs ~2.3 MiB just in filemap (XArray) entry splitting. The other
> large memory cost would be from undoing HVO for the HugeTLB folio.
>
>> I'm most concerned that, once we have memdescs, splitting a 1GB page
>> into 512 * 512 4kB pages is going to involve allocating about 20MB
>> of memory (80 bytes * 512 * 512).
>
> I definitely need to catch up on memdescs. What's the best place for me
> to learn/get an overview of how memdescs will describe memory/replace
> struct folios?
>
> I think there might be a better way to solve the original problem of
> usage tracking with memdesc support, but this was intended to make
> progress before memdescs.
>
>> Is this necessary to do all at once?
>
> The plan for guest_memfd was to first split from 1G to 4K, then optimize
> on that by splitting in stages: from 1G to 2M as much as possible, then
> to 4K only for the page ranges that the guest shared with the host.

Right, we also discussed the non-uniform split as an optimization in the
future.

--
Cheers

David
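
A quick way to sanity-check the arithmetic quoted above is a few lines of
stand-alone user-space C, with the constants hard-coded to the values from
the thread (64-slot nodes from XA_CHUNK_SHIFT = 6, a 576-byte struct xa_node,
~80 bytes per memdesc). This is only a back-of-the-envelope sketch, not
kernel code:

#include <stdio.h>

#define CHUNK_SHIFT	6			/* XA_CHUNK_SHIFT, as quoted above */
#define SLOTS		(1UL << CHUNK_SHIFT)	/* 64 slots per xa_node */
#define XA_NODE_SIZE	576UL			/* sizeof(struct xa_node), as quoted */

int main(void)
{
	/* Splitting one order-18 (1G) entry down to order-0 (4K) entries: */
	unsigned long shift12 = 1;		/* one new shift-12 node */
	unsigned long shift6  = SLOTS;		/* 64 new shift-6 nodes */
	unsigned long shift0  = SLOTS * SLOTS;	/* 4096 new shift-0 nodes */
	unsigned long nodes   = shift12 + shift6 + shift0;
	unsigned long bytes   = nodes * XA_NODE_SIZE;

	printf("xa_nodes allocated: %lu\n", nodes);		/* 4161 */
	printf("bytes: %lu (~%.2f MiB)\n", bytes,
	       bytes / (1024.0 * 1024.0));			/* ~2.29 MiB */

	/* Memdesc comparison from the thread: ~80 bytes per 4K page. */
	unsigned long memdesc_bytes = 80UL * 512 * 512;
	printf("memdesc bytes: %lu (~%.1f MiB)\n", memdesc_bytes,
	       memdesc_bytes / (1024.0 * 1024.0));		/* ~20 MiB */
	return 0;
}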