From: Brendan Jackman <jackmanb@google.com>
To: Brendan Jackman <jackmanb@google.com>,
Dave Hansen <dave.hansen@intel.com>,
"Roy, Patrick" <roypat@amazon.co.uk>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
"corbet@lwn.net" <corbet@lwn.net>,
"maz@kernel.org" <maz@kernel.org>,
"oliver.upton@linux.dev" <oliver.upton@linux.dev>,
"joey.gouly@arm.com" <joey.gouly@arm.com>,
"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"will@kernel.org" <will@kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"luto@kernel.org" <luto@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"willy@infradead.org" <willy@infradead.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"david@redhat.com" <david@redhat.com>,
"lorenzo.stoakes@oracle.com" <lorenzo.stoakes@oracle.com>,
"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
"vbabka@suse.cz" <vbabka@suse.cz>,
"rppt@kernel.org" <rppt@kernel.org>,
"surenb@google.com" <surenb@google.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"song@kernel.org" <song@kernel.org>,
"jolsa@kernel.org" <jolsa@kernel.org>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"andrii@kernel.org" <andrii@kernel.org>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"eddyz87@gmail.com" <eddyz87@gmail.com>,
"yonghong.song@linux.dev" <yonghong.song@linux.dev>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"sdf@fomichev.me" <sdf@fomichev.me>,
"haoluo@google.com" <haoluo@google.com>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
"jhubbard@nvidia.com" <jhubbard@nvidia.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"jannh@google.com" <jannh@google.com>,
"pfalcato@suse.de" <pfalcato@suse.de>,
"shuah@kernel.org" <shuah@kernel.org>,
"seanjc@google.com" <seanjc@google.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"Cali, Marco" <xmarcalx@amazon.co.uk>,
"Kalyazin, Nikita" <kalyazin@amazon.co.uk>,
"Thomson, Jack" <jackabt@amazon.co.uk>,
"derekmn@amazon.co.uk" <derekmn@amazon.co.uk>,
"tabba@google.com" <tabba@google.com>,
"ackerleytng@google.com" <ackerleytng@google.com>
Subject: Re: [PATCH v7 06/12] KVM: guest_memfd: add module param for disabling TLB flushing
Date: Fri, 31 Oct 2025 18:31:12 +0000
Message-ID: <DDWPZY3AA7BX.1Y05FOYIHAI07@google.com>
In-Reply-To: <DDVS9ITBCE2Z.RSTLCU79EX8G@google.com>

On Thu Oct 30, 2025 at 4:05 PM UTC, Brendan Jackman wrote:
> On Thu Sep 25, 2025 at 6:27 PM UTC, Dave Hansen wrote:
>> On 9/24/25 08:22, Roy, Patrick wrote:
>>> Add an option to not perform TLB flushes after direct map manipulations.
>>
>> I'd really prefer this be left out for now. It's a massive can of worms.
>> Let's agree on something that works and has well-defined behavior before
>> we go breaking it on purpose.
>
> As David pointed out in the MM Alignment Session yesterday, I might be
> able to help here. In [0] I've proposed a way to break up the direct map
> by ASI's "sensitivity" concept, which is weaker than the "totally absent
> from the direct map" being proposed here, but it has kinda similar
> implementation challenges.
>
> Basically it introduces a thing called a "freetype" that extends the
> idea of migratetype. Like the existing idea of migratetype, it's used to
> physically group pages when allocating, and you can index free pages by
> it, i.e. each freetype gets its own freelist. But it can also encode
> other information than mobility (and the other stuff that's encoded in
> migratetype...).
>
> Could it make sense to use that logic to just have entire pageblocks
> that are absent from the direct map? Then when allocating memory for the
> guest_memfd we get it from one of those pageblocks. Then we only have to
> flush the TLB if there's no memory left in pageblocks of this freetype
> (so the allocator has to flip another pageblock over to the "no direct
> map" freetype, after removing it from the direct map).
>
> I haven't yet investigated this properly, I'll start doing that now.
> But I thought I'd immediately drop this note in case anyone can
> immediately see a reason why this doesn't work.

I spent some time poking around and I think there's only one issue here:
in this design the mapping/unmapping of the direct map happens while
allocating. But [un]mapping might itself need to allocate a pagetable,
in order to break down a large page.

In my ASI-specific presentation of that feature, I dodged this issue by
just requiring the whole ASI direct map to be set up at pageblock
granularity. This totally dodges the recursion issue since we just never
have to break down pages. (Actually, Dave Hansen suggested for the
initial implementation I simplify it by just doing all the ASI stuff at
4k, which achieves the same thing.)

I guess we'd like to avoid globally fragmenting the whole direct map
just in case someone wants to use guest_memfd at some point? And, I
guess we could just fragment it all at the instant that someone wants
to do that, but that's still a bit yucky.

If we just ignore this issue and try to allocate pagetables, it's
possible for a pathological physmap state to emerge where we get into
the allocator path that [un]maps a pageblock, but then need to allocate
a page to [un]map it, and that allocation in turn gets into the
[un]mapping path, and suddenly, turtles.

I think the simplest answer to that is to just fail the [un]map path if
we detect we're recursing, with something like a PF_MEMALLOC_* flag.
But this feels a bit yucky too.
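
To make the "fail if we're recursing" idea concrete, here is a toy
userspace sketch, not kernel code. Everything in it is an assumption
made for illustration: the toy_* names, the two-pool model, the
4-page "pageblock", and TOY_PF_IN_REMAP, which merely stands in for
whatever PF_MEMALLOC_*-style flag would be used. It only shows the
shape of the guard: the [un]map path sets a per-task flag around its
pagetable allocation and refuses to run when the flag is already set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-task PF_MEMALLOC_*-style flag bit. */
#define TOY_PF_IN_REMAP 0x1u
#define TOY_BLOCK_PAGES 4	/* toy pageblock size, in pages */

enum toy_pool { TOY_MAPPED, TOY_UNMAPPED, TOY_NR_POOLS };

static unsigned int toy_task_flags;		/* stand-in for current->flags */
static int toy_free_pages[TOY_NR_POOLS];	/* one freelist per pool */
static int toy_refusals;			/* nested [un]map attempts refused */

static bool toy_alloc_page(enum toy_pool pool);

/* Move one pageblock into @to; costs one pagetable page from the mapped pool. */
static bool toy_remap_block(enum toy_pool to)
{
	enum toy_pool from = (to == TOY_MAPPED) ? TOY_UNMAPPED : TOY_MAPPED;
	bool got_pagetable;

	/* The guard: refuse to re-enter the [un]map path from within itself. */
	if (toy_task_flags & TOY_PF_IN_REMAP) {
		toy_refusals++;
		return false;
	}

	toy_task_flags |= TOY_PF_IN_REMAP;
	got_pagetable = toy_alloc_page(TOY_MAPPED);
	toy_task_flags &= ~TOY_PF_IN_REMAP;

	if (!got_pagetable)
		return false;
	if (toy_free_pages[from] < TOY_BLOCK_PAGES) {
		toy_free_pages[TOY_MAPPED]++;	/* give the pagetable page back */
		return false;
	}

	toy_free_pages[from] -= TOY_BLOCK_PAGES;
	toy_free_pages[to] += TOY_BLOCK_PAGES;
	return true;
}

/* Allocate one page from @pool, converting a pageblock if the pool is empty. */
static bool toy_alloc_page(enum toy_pool pool)
{
	if (toy_free_pages[pool] == 0 && !toy_remap_block(pool))
		return false;
	assert(toy_free_pages[pool] > 0);
	toy_free_pages[pool]--;
	return true;
}

/* Reset the toy state so that scenarios can be tried independently. */
static void toy_reset(int mapped, int unmapped)
{
	toy_task_flags = 0;
	toy_refusals = 0;
	toy_free_pages[TOY_MAPPED] = mapped;
	toy_free_pages[TOY_UNMAPPED] = unmapped;
}
```

In this model, toy_reset(0, 8) followed by toy_alloc_page(TOY_MAPPED)
is the pathological case: the pagetable allocation lands back in the
[un]map path, the guard refuses it, and the outer allocation fails
instead of recursing forever. That failure is also why the approach
feels unsatisfying: free memory exists, but the allocation still fails.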

Other ideas might include: don't actually fragment the whole physmap,
but at least pre-allocate the pagetables down to pageblock granularity.
Or alternatively, this could point to an issue in the way I injected
[un]mapping into the allocator, and fixing that design flaw would solve
the problem.
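
For a rough feel of what pre-allocating pagetables down to pageblock
granularity might cost, here is a back-of-the-envelope sketch. The
constants are assumptions, not something stated in this thread: 4 KiB
pagetable pages, 512 entries per table, and 2 MiB pageblocks that
coincide with the last-level mapping size we want to reach (as is
common on x86-64). The toy_* names are made up, and upper-level tables
are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64-style geometry; other configurations will differ. */
#define TOY_PAGE_SIZE		4096ULL		/* bytes per pagetable page */
#define TOY_PTRS_PER_TABLE	512ULL		/* entries per pagetable */
#define TOY_PAGEBLOCK_SIZE	(2ULL << 20)	/* 2 MiB pageblock */

/* Tables needed so that every pageblock gets its own pagetable entry. */
static uint64_t toy_tables_needed(uint64_t phys_bytes)
{
	/* Each table covers 512 pageblocks, i.e. 1 GiB of physical memory. */
	uint64_t span = TOY_PTRS_PER_TABLE * TOY_PAGEBLOCK_SIZE;

	/* Guard the round-up below against overflow. */
	assert(phys_bytes <= UINT64_MAX - (span - 1));
	return (phys_bytes + span - 1) / span;
}

/* Bytes of pagetable memory to set aside up front. */
static uint64_t toy_prealloc_bytes(uint64_t phys_bytes)
{
	return toy_tables_needed(phys_bytes) * TOY_PAGE_SIZE;
}
```

Under those assumptions, toy_prealloc_bytes(64ULL << 30) comes to
256 KiB for a 64 GiB machine, i.e. one 4 KiB table per GiB.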

I'll have to think about this some more on Monday but sharing my
thoughts now in case anyone has an idea already...

I've dumped the (untested) branch where I've adapted [0] for the
NO_DIRECT_MAP use case here:

https://github.com/bjackman/linux/tree/demo-guest_memfd-physmap

> [0] https://lore.kernel.org/all/20250924-b4-asi-page-alloc-v1-0-2d861768041f@google.com/T/#t