Linux s390 Architecture development
From: Takahiro Itazuri <itazur@amazon.com>
To: <fvdl@google.com>, <seanjc@google.com>, <ljs@kernel.org>
Cc: <Liam.Howlett@oracle.com>, <ackerleytng@google.com>,
	<agordeev@linux.ibm.com>, <ajones@ventanamicro.com>,
	<akpm@linux-foundation.org>, <alex@ghiti.fr>, <andrii@kernel.org>,
	<aou@eecs.berkeley.edu>, <ast@kernel.org>,
	<baolu.lu@linux.intel.com>, <borntraeger@linux.ibm.com>,
	<bp@alien8.de>, <bpf@vger.kernel.org>, <catalin.marinas@arm.com>,
	<chenhuacai@kernel.org>, <corbet@lwn.net>, <coxu@redhat.com>,
	<daniel@iogearbox.net>, <dave.hansen@linux.intel.com>,
	<david@kernel.org>, <derekmn@amazon.com>, <dev.jain@arm.com>,
	<eddyz87@gmail.com>, <gerald.schaefer@linux.ibm.com>,
	<gor@linux.ibm.com>, <haoluo@google.com>, <hca@linux.ibm.com>,
	<hpa@zytor.com>, <itazur@amazon.co.uk>, <jackabt@amazon.co.uk>,
	<jackmanb@google.com>, <jannh@google.com>, <jgg@ziepe.ca>,
	<jgross@suse.com>, <jhubbard@nvidia.com>,
	<jiayuan.chen@shopee.com>, <jmattson@google.com>,
	<joey.gouly@arm.com>, <john.fastabend@gmail.com>,
	<jolsa@kernel.org>, <jthoughton@google.com>,
	<kalyazin@amazon.co.uk>, <kas@kernel.org>, <kernel@xen0n.name>,
	<kpsingh@kernel.org>, <kvm@vger.kernel.org>,
	<kvmarm@lists.linux.dev>, <lenb@kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-doc@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
	<linux-mm@kvack.org>, <linux-pm@vger.kernel.org>,
	<linux-riscv@lists.infradead.org>, <linux-s390@vger.kernel.org>,
	<loongarch@lists.linux.dev>, <lorenzo.stoakes@oracle.com>,
	<luto@kernel.org>, <maobibo@loongson.cn>, <martin.lau@linux.dev>,
	<maz@kernel.org>, <mhocko@suse.com>, <mingo@redhat.com>,
	<mlevitsk@redhat.com>, <nikita.kalyazin@linux.dev>,
	<oupton@kernel.org>, <palmer@dabbelt.com>,
	<patrick.roy@linux.dev>, <pavel@kernel.org>,
	<pbonzini@redhat.com>, <peterx@redhat.com>,
	<peterz@infradead.org>, <pfalcato@suse.de>, <pjw@kernel.org>,
	<prsampat@amd.com>, <rafael@kernel.org>, <riel@surriel.com>,
	<rppt@kernel.org>, <ryan.roberts@arm.com>, <sdf@fomichev.me>,
	<shijie@os.amperecomputing.com>, <skhan@linuxfoundation.org>,
	<song@kernel.org>, <surenb@google.com>, <suzuki.poulose@arm.com>,
	<svens@linux.ibm.com>, <tabba@google.com>, <tglx@kernel.org>,
	<thuth@redhat.com>, <urezki@gmail.com>, <vannapurve@google.com>,
	<vbabka@kernel.org>, <will@kernel.org>, <willy@infradead.org>,
	<wu.fei9@sanechips.com.cn>, <x86@kernel.org>,
	<yang@os.amperecomputing.com>, <yangyicong@hisilicon.com>,
	<yonghong.song@linux.dev>, <yosry@kernel.org>,
	<yu-cheng.yu@intel.com>, <yuzenghui@huawei.com>,
	<zhengqi.arch@bytedance.com>, <zulinx86@gmai.com>
Subject: Re: [PATCH v12 10/16] KVM: guest_memfd: Add flag to remove from direct map
Date: Fri, 8 May 2026 08:18:10 +0000
Message-ID: <20260508081812.12345-1-itazur@amazon.com>
In-Reply-To: <CAPTztWb67XZvfcMVnbegDNNW0LJa9UsaTGx3M898xJUJrekk0w@mail.gmail.com>

Hi Sean, Frank, Lorenzo,

On Tue, Apr 21, 2026 at 10:08:48AM -0700, Frank van der Linden wrote:
> On Tue, Apr 21, 2026 at 9:31 AM Sean Christopherson <seanjc@google.com> wrote:
> > Making guest_memfd responsible for zapping and restoring the direct map on a per-
> > folio basis feels wrong given the addition of AS_NO_DIRECT_MAP.  I especially don't
> > like that the "rules" for when an AS_NO_DIRECT_MAP folio has a direct map will vary
> > based on the owner, and even within an owner (e.g. guest_memfd) will be ad hoc.
> >
> > E.g. as per the series to add guest_memfd write() support[*]:
> >
> >   When direct map removal is implemented [2]
> >    - write() will not be allowed to access pages that have already
> >      been removed from direct map
> >    - on completion, write() will remove the populated pages from
> >      direct map
> >
> > That's pretty gross ABI, because with KVM_GMEM_FOLIO_NO_DIRECT_MAP, userspace can
> > write() exactly once.  To re-write memory, I assume userspace would need to do a
> > PUNCH_HOLE or truncate.
> >
> > What's preventing us from handling this automagically in e.g. filemap_add_folio()
> > and filemap_remove_folio()?  Then the usage rules are pretty straightforward: the
> > kernel must *always* assume the direct map is invalid for folios from
> > AS_NO_DIRECT_MAP mappings.
> >
> > Then if KVM needs to utilize a kernel mapping, e.g. in kvm_gmem_populate(), KVM
> > could use dedicated variants of kmap_local_xxx() to deal with a local mapping for
> > a folio/page without a direct map.  Or, KVM could simply disallow the specific
> > sequence that would require KVM to do the memcpy (I'm pretty sure we can do that
> > with in-place shared=>private conversion support).
> >
> > I realize that could throw a big wrench into write() performance, but IMO, before
> > merging either series, we need a complete story for exactly how this will all fit
> > together, in a maintainable fashion and with sane ABI.
>
> I agree with this - this approach would also allow for memory that was
> never in the direct map to begin with, or has been taken out already
> (for which I happen to have a use case :-)). guest_memfd and other
> code can then assume that AS_NO_DIRECT_MAP means they have to take
> explicit action to map it if needed. It's a clean, simple ABI.
>
> With the current set of patches, it seems like this couldn't be done
> in a clean manner.

Agreed with both of you.  I'll adopt the filemap-level approach:

- Move the zap/restore hooks from guest_memfd into filemap_add_folio()
  / filemap_remove_folio().
- Tighten AS_NO_DIRECT_MAP semantics so that, for folios in such a
  mapping, the direct map is invalid for the entire time the folio
  resides in the page cache.
- Drop the per-folio KVM_GMEM_FOLIO_NO_DIRECT_MAP bookkeeping in
  folio->private, since the existence of the folio in the mapping is
  itself the state.

As for each guest memory population path:

- userspace memcpy(): population from userspace goes through the
  userspace mapping of guest_memfd, not through the kernel direct map,
  so the filemap-level invariant doesn't affect it.  However, this is
  slow, which is what motivated the write() syscall support.

- write(): meant to speed up the userspace memcpy() case above by doing
  the copy in the kernel.  I believe Brendan's __GFP_UNMAPPED/mermap
  work [1] would give us a low-overhead way to get temporary kernel
  access to an AS_NO_DIRECT_MAP folio.  Landing mermap may take a
  while, but this series does not introduce the write() path, so
  mermap is not a blocker for now.

- kvm_gmem_populate(): this is a TDX/SNP-only path, and NO_DIRECT_MAP
  is not available on those VM types:
  kvm_arch_gmem_supports_no_direct_map() returns false for
  KVM_X86_TDX_VM and KVM_X86_SNP_VM, which are the only VM types that
  use kvm_gmem_populate() today.  So IIUC it doesn't interact with the
  filemap invariant.

So, unless I'm missing a path, adopting the filemap-level approach in
this series should be fine.


I'd also like to consult with you folks in advance on how to proceed
with merging.  In a separate reply on the cover letter thread [2],
Lorenzo and Sean suggested that the mm pieces should go through the
mm tree:

On Tue, Apr 21, 2026 at 04:36:00PM +0000, Sean Christopherson wrote:
> Yeah, when the time comes, the mm pieces definitely need to go through the mm
> tree.  Ideally, I think this would be merged in two separate parts, with all mm
> changes going through the mm tree, and then the KVM changes through the KVM tree
> using a stable topic branch/tag from Andrew.

I see two reasonable paths to get there, and would appreciate your
input on which you prefer:

Path A — validate on KVM side first, then split:
  - Post v13 as a single series on the KVM list, gather feedback and
    make sure the design is acceptable to KVM reviewers.
  - Once v13 looks good ("the time comes"), do the MM/KVM split,
    rebase the MM part onto the appropriate MM branch, and post the
    MM part to linux-mm to build consensus with MM maintainers.

Path B — split early and seek MM consensus in parallel:
  - With the filemap rework already in place, do the MM/KVM split
    now and post the MM part to linux-mm directly.  The KVM part follows
    on top of a stable topic branch/tag from the MM tree.

Which of the two would you rather see?  Happy to go either way.


[1] https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com/
[2] https://lore.kernel.org/all/20260506080753.14517-1-itazur@amazon.com/

Takahiro

