Date: Mon, 17 Nov 2025 15:43:03 -0800
Subject: Re: [RFC PATCH 0/4] Extend xas_split* to support splitting arbitrarily large entries
From: Ackerley Tng
To: Matthew Wilcox
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, david@redhat.com, michael.roth@amd.com, vannapurve@google.com
References: <20251117224701.1279139-1-ackerleytng@google.com>

Matthew Wilcox writes:

> On Mon, Nov 17, 2025 at 02:46:57PM -0800, Ackerley Tng wrote:
>> guest_memfd is planning to store huge pages in the filemap, and
>> guest_memfd's use of huge pages involves splitting of huge pages into
>> individual pages. Splitting of huge pages also involves splitting of
>> the filemap entries for the pages being split.
>
> Hm, I'm not most concerned about the number of nodes you're allocating.

Thanks for reminding me, I left this out of the original message.

Splitting the xarray entry for a 1G folio (an order-18 entry stored in a
shift-18 node on x86), assuming XA_CHUNK_SHIFT is 6, would involve:

+ shift-18 node : the original node is reused - no new allocations
+ shift-12 node : 1 node allocated
+ shift-6 nodes : 64 nodes allocated
+ shift-0 nodes : 64 * 64 = 4096 nodes allocated

This brings the total to 4161 newly allocated nodes. struct xa_node is
576 bytes, so that's 2396736 bytes, or about 2.3 MB, spent just on
splitting the filemap (XArray) entries when a 1G folio is split down to
4K pages.

The other large memory cost would be from undoing HVO for the HugeTLB
folio.

> I'm most concerned that, once we have memdescs, splitting a 1GB page
> into 512 * 512 4kB pages is going to involve allocating about 20MB
> of memory (80 bytes * 512 * 512).

I definitely need to catch up on memdescs. What's the best place for me
to learn about, or get an overview of, how memdescs will describe memory
and replace struct folios?

I think there might be a better way to solve the original problem of
usage tracking once memdescs are available, but this series was intended
to make progress before memdescs.

> Is this necessary to do all at once?

The plan for guest_memfd was to first split from 1G to 4K, then optimize
on that by splitting in stages: from 1G to 2M as much as possible, then
to 4K only for the page ranges that the guest shared with the host.
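For reference, here is a quick userspace arithmetic sketch (not kernel
code) of the numbers above. It assumes XA_CHUNK_SHIFT == 6 and
sizeof(struct xa_node) == 576, as stated earlier in this message; both
depend on configuration.

#include <stdio.h>

int main(void)
{
	const unsigned long slots = 1ul << 6;	/* XA_CHUNK_SHIFT == 6: 64 slots per node */
	const unsigned long xa_node_size = 576;	/* sizeof(struct xa_node) on this config */

	/*
	 * Splitting one order-18 (1G) entry down to order 0 (4K):
	 * the original shift-18 node is reused; new nodes are needed
	 * at shift 12, shift 6 and shift 0.
	 */
	unsigned long nodes = 1 + slots + slots * slots;	/* 1 + 64 + 4096 = 4161 */
	unsigned long bytes = nodes * xa_node_size;

	printf("xa_nodes allocated: %lu\n", nodes);
	printf("bytes for xa_nodes: %lu (~%.2f MiB)\n",
	       bytes, bytes / (1024.0 * 1024.0));

	/* Willy's memdesc estimate: 80 bytes * 512 * 512 */
	printf("memdesc cost: %lu bytes (~%.1f MiB)\n",
	       80ul * 512 * 512, 80ul * 512 * 512 / (1024.0 * 1024.0));

	return 0;
}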