Date: Wed, 13 May 2026 17:14:20 +0000
Subject: Re: [PATCH v2 00/22] mm: Add __GFP_UNMAPPED
From: Brendan Jackman
To: Gregory Price, Brendan Jackman
Cc: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
	David Hildenbrand, Vlastimil Babka, Wei Xu, Johannes Weiner,
	Zi Yan, Lorenzo Stoakes, Sumit Garg, Will Deacon,
	"Kalyazin, Nikita", "Itazuri, Takahiro", Andy Lutomirski,
	David Kaplan, Thomas Gleixner, Yosry Ahmed
References: <20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com>

On Wed May 13, 2026 at 4:17 PM UTC, Gregory Price wrote:
> On Fri, Mar 20, 2026 at 06:23:24PM +0000, Brendan Jackman wrote:
>>
>> Because of these ambitious usecases, it's core to this proposal that
>> the feature [...]. Rather than overloading the concept of a
>> migratetype, this extension is done by adding a new concept on top of
>> migratetype: the _freetype_. A freetype is basically just a
>> migratetype plus some flags, and it replaces migratetypes wherever
>> the latter is currently used to index free pages.
>>
>
> I'm a bit confused why the need for an additional level of indirection
> instead of just adding a new migratetype. You still end up increasing
> the migratetype matrix, just with a new dimension.
>
> (apologies if this was covered in prior work or discussions, just now
> plugging myself into the series).
>
> Why not simply have an unmapped migratetype, for example, and on steal
> you convert it to movable or whatever the preference is?

Because the fact that only one migratetype currently supports being
unmapped is a temporary happenstance of the guest_memfd usecase. In
general, this needs to support having unmapped variants of ~arbitrary
migratetypes.
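To spell out what I mean, here's a minimal sketch of the encoding
(the names and bit layout are illustrative only, not necessarily what
the patches actually do):

#include <linux/types.h>

/* Illustrative only: assumes the migratetypes fit in the low 3 bits. */
#define FREETYPE_FLAG_UNMAPPED	0x8

typedef unsigned int freetype_t;

/* Any migratetype can have an unmapped variant... */
static inline freetype_t free_type(int migratetype, bool unmapped)
{
	return migratetype | (unmapped ? FREETYPE_FLAG_UNMAPPED : 0);
}

/* ...and the plain migratetype is recovered by masking the flag off. */
static inline int freetype_migratetype(freetype_t ft)
{
	return ft & ~FREETYPE_FLAG_UNMAPPED;
}

I.e. the free pages end up indexed by (migratetype, unmapped) rather
than by migratetype alone; a single extra MIGRATE_UNMAPPED couldn't
express "unmapped movable" vs "unmapped unmovable" and so on.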
>> .:::: Hacky bits: simplistic secretmem integration
>>
>> The secretmem integration leaves the main optimisations on the table;
>> the security-required flushes of the secretmem areas are implemented
>> via distinct flush_tlb_mm() calls. It should be possible to amortize
>> the secretmem TLB flushes completely into the normal VMA flushing.
>> However, as far as I know there is no performance-sensitive usecase
>> for secretmem. So, I've just implemented the minimal adoption. This
>> will at least avoid fragmentation of the direct map, even if it
>> doesn't reduce TLB flushing. If anyone knows of a workload that might
>> benefit from dropping that flushing, let me know!
>
> Crossing a couple streams here, I wonder if there are some mechanisms
> introduced by MST's latest multi-zeroing-avoidance [1] code that might
> help deal with the problem here.
>
> MST wired up an optional user_addr into the buddy that allows us to
> sink the zeroing step for folio_zero_user (or folio_user_zero or
> whatever) into the post_alloc_hook - which includes some cache
> flushing.
>
> That conveniently gives you what you need for a TLB flush AND an
> indicator that the allocation is intended for userland.
>
> Unless I'm fundamentally misunderstanding something, the pattern at
> least seems similar.

Yeah, I actually only noticed that yesterday due to your posts on that
thread! I need to investigate it further. My assumption has always been
that this isn't a general solution because we don't always _have_ a
user address (e.g. for guest_memfd it's important that we can populate
the memory via write(), so there's no user address), but it's pretty
likely I'm missing something there.

> In that sense, does this just become a post_alloc_hook that unmaps the
> memory after zeroing and allocation?
>
> I get the intent is to have the majority of memory unmapped by
> default, and then steal those blocks and map them as the kernel
> requires more memory, but I wonder if it's cleaner to do it the other
> way and simply have the buddy unmap on alloc after zeroing, and remap
> on free.

That would be cleaner indeed, but the key question here isn't about the
default state of memory, it's about batching. The reason we need to do
it at block granularity is that a TLB flush every time we allocate one
of these pages is a performance nonstarter - that's actually the entire
point of this series. If you can afford a TLB flush per allocation then
you don't need __GFP_UNMAPPED for the guest_memfd usecase; the existing
direct map removal series [0] is already fine.

[0] https://lore.kernel.org/all/20260410151746.61150-1-kalyazin@amazon.com/

> Seems like the free path would be trivial: check if the page is in the
> direct map and if not, remap it and move on. Entirely hidden from
> existing users.
>
> So, maybe a stupid question: was the opposite mechanism considered
> (unmap on alloc sunk into the buddy), and if so was it rejected for
> some other reason?

Hopefully my previous paragraph explained that it's not viable anyway,
but: if we _did_ do the [un]mapping on a per-allocation basis, the
disadvantage of unmap-on-alloc is just that we expect most pages in the
system to be unmapped. So the majority of map-unmap cycles would be
pointless (map on free, but we're probably gonna unmap again on alloc).

Cheers,
Brendan
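P.S. To make the batching point concrete, here's a rough sketch of the
per-allocation scheme vs. the block-granularity one (hypothetical
helpers, not the actual code in this series; error handling omitted):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/set_memory.h>
#include <asm/tlbflush.h>

/*
 * Per-allocation flushing - the nonstarter: every unmapped-page
 * allocation pays a full kernel TLB flush.
 */
static struct page *alloc_unmapped_page_naive(gfp_t gfp)
{
	struct page *page = alloc_page(gfp);
	unsigned long addr;

	if (!page)
		return NULL;

	addr = (unsigned long)page_address(page);
	set_direct_map_invalid_noflush(page);
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE); /* per page! */
	return page;
}

/*
 * Block-granularity conversion - the point of the series: when the
 * unmapped freelists run dry, convert a whole pageblock with a single
 * flush; allocations served from the block then need no flush at all.
 */
static void convert_pageblock_to_unmapped(struct page *block)
{
	unsigned long start = (unsigned long)page_address(block);
	unsigned long i;

	for (i = 0; i < pageblock_nr_pages; i++)
		set_direct_map_invalid_noflush(block + i);

	/* One flush amortized over pageblock_nr_pages allocations. */
	flush_tlb_kernel_range(start,
			       start + pageblock_nr_pages * PAGE_SIZE);
}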