Date: Wed, 13 May 2026 17:14:20 +0000
Subject: Re: [PATCH v2 00/22] mm: Add __GFP_UNMAPPED
From: Brendan Jackman
To: Gregory Price, Brendan Jackman
Cc: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
	David Hildenbrand, Vlastimil Babka, Wei Xu, Johannes Weiner,
	Zi Yan, Lorenzo Stoakes, Sumit Garg, Will Deacon,
	"Kalyazin, Nikita", "Itazuri, Takahiro", Andy Lutomirski,
	David Kaplan, Thomas Gleixner, Yosry Ahmed
References: <20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com>

On Wed May 13, 2026 at 4:17 PM UTC, Gregory Price wrote:
> On Fri, Mar 20, 2026 at 06:23:24PM +0000, Brendan Jackman wrote:
>>
>> Because of these ambitious usecases, it's core to this proposal that
>> the feature [...]. Rather than overloading the concept of a
>> migratetype, this extension is done by adding a new concept on top of
>> migratetype: the _freetype_. A freetype is basically just a
>> migratetype plus some flags, and it replaces migratetypes wherever
>> the latter is currently used to index free pages.
>>
>
> I'm a bit confused why the need for an additional level of indirection
> instead of just adding a new migratetype. You still end up increasing
> the migratetype matrix, just with a new dimension.
>
> (apologies if this was covered in prior work or discussions, just now
> plugging myself into the series).
>
> Why not simply have an unmapped migratetype, for example, and on steal
> you convert it to movable or whatever the preference is?

Because the fact that only one migratetype currently supports being
unmapped is a temporary happenstance of the guest_memfd usecase. In
general, this needs to support having unmapped variants of ~arbitrary
migratetypes.
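To spell out what I mean, here's a minimal sketch of the encoding
(the names and bit layout are illustrative only, not necessarily what
the patches actually do):

#include <linux/types.h>

/* Illustrative only: assumes the migratetypes fit in the low 3 bits. */
#define FREETYPE_FLAG_UNMAPPED	0x8

typedef unsigned int freetype_t;

/* Any migratetype can have an unmapped variant... */
static inline freetype_t free_type(int migratetype, bool unmapped)
{
	return migratetype | (unmapped ? FREETYPE_FLAG_UNMAPPED : 0);
}

/* ...and the plain migratetype is recovered by masking the flag off. */
static inline int freetype_migratetype(freetype_t ft)
{
	return ft & ~FREETYPE_FLAG_UNMAPPED;
}

I.e. the free pages end up indexed by (migratetype, unmapped) rather
than by migratetype alone; a single extra MIGRATE_UNMAPPED couldn't
express "unmapped movable" vs "unmapped unmovable" and so on.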
>> .:::: Hacky bits: simplistic secretmem integration
>>
>> The secretmem integration leaves the main optimisations on the table;
>> the security-required flushes of the secretmem areas are implemented
>> via distinct flush_tlb_mm() calls. It should be possible to amortize
>> the secretmem TLB flushes completely into the normal VMA flushing.
>> However, as far as I know there is no performance-sensitive usecase
>> for secretmem. So, I've just implemented the minimal adoption. This
>> will at least avoid fragmentation of the direct map, even if it
>> doesn't reduce TLB flushing. If anyone knows of a workload that might
>> benefit from dropping that flushing, let me know!
>
> Crossing a couple streams here, I wonder if there are some mechanisms
> introduced by MST's latest multi-zeroing-avoidance [1] code that might
> help deal with the problem here.
>
> MST wired up an optional user_addr into the buddy that allows us to
> sink the zeroing step for folio_zero_user (or folio_user_zero or
> whatever) into the post_alloc_hook - which includes some cache
> flushing.
>
> That conveniently gives you what you need for a TLB flush AND an
> indicator that the allocation is intended for userland.
>
> Unless I'm fundamentally misunderstanding something, the pattern at
> least seems similar.

Yeah, I actually only noticed that yesterday due to your posts on that
thread! I need to investigate it further. My assumption has always been
that this isn't a general solution because we don't always _have_ a
user address (e.g. for guest_memfd it's important that we can populate
the memory via write(), so there's no user address), but it's pretty
likely I'm missing something there.

> In that sense, does this just become a post_alloc_hook that unmaps the
> memory after zeroing and allocation?
>
> I get the intent is to have the majority of memory unmapped by
> default, and then steal those blocks and map them as the kernel
> requires more memory, but I wonder if it's cleaner to do it the other
> way and simply have the buddy unmap on alloc after zeroing, and remap
> on free.

That would be cleaner indeed, but the key question here isn't about the
default state of memory, it's about batching. The reason we need to do
it at block granularity is that a TLB flush every time we allocate one
of these pages is a performance nonstarter - that's actually the entire
point of this series. If you can afford a TLB flush per allocation then
you don't need __GFP_UNMAPPED for the guest_memfd usecase; the existing
direct map removal series [0] is already fine.

[0] https://lore.kernel.org/all/20260410151746.61150-1-kalyazin@amazon.com/

> Seems like the free path would be trivial: check if the page is in the
> direct map and if not, remap it and move on. Entirely hidden from
> existing users.
>
> So, maybe a stupid question: was the opposite mechanism considered
> (unmap on alloc sunk into the buddy), and if so was it rejected for
> some other reason?

Hopefully my previous paragraph explained that it's not viable anyway,
but: if we _did_ do the [un]mapping on a per-allocation basis, the
disadvantage of unmap-on-alloc is just that we expect most pages in the
system to be unmapped. So the majority of map-unmap cycles would be
pointless (map on free, but we're probably gonna unmap again on alloc).

Cheers,
Brendan
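P.S. To make the batching point concrete, here's a rough sketch of the
per-allocation scheme vs. the block-granularity one (hypothetical
helpers, not the actual code in this series; error handling omitted):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/set_memory.h>
#include <asm/tlbflush.h>

/*
 * Per-allocation flushing - the nonstarter: every unmapped-page
 * allocation pays a full kernel TLB flush.
 */
static struct page *alloc_unmapped_page_naive(gfp_t gfp)
{
	struct page *page = alloc_page(gfp);
	unsigned long addr;

	if (!page)
		return NULL;

	addr = (unsigned long)page_address(page);
	set_direct_map_invalid_noflush(page);
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE); /* per page! */
	return page;
}

/*
 * Block-granularity conversion - the point of the series: when the
 * unmapped freelists run dry, convert a whole pageblock with a single
 * flush; allocations served from the block then need no flush at all.
 */
static void convert_pageblock_to_unmapped(struct page *block)
{
	unsigned long start = (unsigned long)page_address(block);
	unsigned long i;

	for (i = 0; i < pageblock_nr_pages; i++)
		set_direct_map_invalid_noflush(block + i);

	/* One flush amortized over pageblock_nr_pages allocations. */
	flush_tlb_kernel_range(start,
			       start + pageblock_nr_pages * PAGE_SIZE);
}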