linux-fsdevel.vger.kernel.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: "Gowans, James" <jgowans@amazon.com>,
	"jack@suse.cz" <jack@suse.cz>,
	"muchun.song@linux.dev" <muchun.song@linux.dev>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"brauner@kernel.org" <brauner@kernel.org>, "Graf (AWS),
	Alexander" <graf@amazon.de>,
	"anthony.yznaga@oracle.com" <anthony.yznaga@oracle.com>,
	"steven.sistare@oracle.com" <steven.sistare@oracle.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Durrant, Paul" <pdurrant@amazon.co.uk>,
	"seanjc@google.com" <seanjc@google.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"Saenz Julienne, Nicolas" <nsaenz@amazon.es>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"nh-open-source@amazon.com" <nh-open-source@amazon.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>
Subject: Re: [PATCH 00/10] Introduce guestmemfs: persistent in-memory filesystem
Date: Tue, 6 Aug 2024 15:43:24 +0200
Message-ID: <883a0f0d-7342-479e-aa3c-13deb7e99338@redhat.com>
In-Reply-To: <9802ddc299c72b189487fd56668de65a84f7d94b.camel@amazon.com>

> 1. Secret hiding: with guestmemfs all of the memory is out of the kernel
> direct map as an additional defence mechanism. This means no
> read()/write() syscalls to guestmemfs files, and no IO to it. The only
> way to access it is to mmap the file.

There are people interested in similar things for guest_memfd.
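
To illustrate the mmap-only access pattern from point 1, here is a minimal
userspace sketch. It is only an illustration under assumptions: an ordinary
temporary file stands in for a guestmemfs file (no guestmemfs mount is
assumed), and the helper name mmap_roundtrip is hypothetical. The helper
sizes the file with ftruncate and then touches the data only through a
shared mapping, never through read()/write().

```python
import mmap
import os
import tempfile


def mmap_roundtrip(path, size, payload):
    """Size the file, then store and load `payload` purely via mmap."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Give the file a length so there is something to map.
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:len(payload)] = payload          # store through the mapping
            return bytes(m[:len(payload)])      # load through the mapping
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Stand-in file; on a real system this would be a file on the mount.
    with tempfile.NamedTemporaryFile() as f:
        print(mmap_roundtrip(f.name, 4096, b"guest data"))
```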

> 
> 2. No struct page overhead: the intended use case is for systems whose
> sole job is to be a hypervisor, typically for large (multi-GiB) VMs, so
> the majority of system RAM would be donated to this fs. We definitely
> don't want 4 KiB struct pages here as it would be a significant
> overhead. That's why guestmemfs carves the memory out in early boot and
> sets memblock flags to avoid struct page allocation. I don't know if
> hugetlbfs does anything fancy to avoid allocating PTE-level struct pages
> for its memory?

Sure, it's called HVO (HugeTLB Vmemmap Optimization), and it can optimize
out a significant portion of the vmemmap.
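
As a rough back-of-the-envelope sketch of the overhead in question, the
following computes the vmemmap cost with and without HVO. The numbers are
assumptions, not measurements: a 64-byte struct page and 4 KiB base pages
(typical for x86-64), and HVO retaining a single vmemmap page per hugetlb
page. The function names are made up for this example.

```python
STRUCT_PAGE_BYTES = 64    # assumption: typical struct page size on x86-64
BASE_PAGE_BYTES = 4096    # assumption: 4 KiB base pages

KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40


def vmemmap_pages(huge_page_bytes):
    """Base pages needed to hold the struct pages of one huge page."""
    n_struct_pages = huge_page_bytes // BASE_PAGE_BYTES
    return n_struct_pages * STRUCT_PAGE_BYTES // BASE_PAGE_BYTES


def vmemmap_overhead(total_bytes, huge_page_bytes, hvo):
    """Total vmemmap bytes for `total_bytes` of hugetlb memory."""
    per_huge_page = vmemmap_pages(huge_page_bytes)
    # Assumption: with HVO only one vmemmap page is kept per huge page.
    kept = 1 if hvo else per_huge_page
    return (total_bytes // huge_page_bytes) * kept * BASE_PAGE_BYTES


if __name__ == "__main__":
    for size, name in ((2 * MiB, "2 MiB"), (1 * GiB, "1 GiB")):
        plain = vmemmap_overhead(1 * TiB, size, hvo=False)
        opt = vmemmap_overhead(1 * TiB, size, hvo=True)
        print(f"{name} pages, 1 TiB: {plain // MiB} MiB -> {opt // MiB} MiB")
```

Under these assumptions, 1 TiB of 2 MiB pages costs 16 GiB of vmemmap
without HVO and 2 GiB with it; with 1 GiB pages the optimized figure drops
to 4 MiB.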

> 
> 3. guest_memfd interface: For confidential computing use-cases we need
> to provide a guest_memfd style interface so that these FDs can be used
> as a guest_memfd file in KVM memslots. Would there be interest in
> extending hugetlbfs to also support a guest_memfd style interface?
> 

"Extending hugetlbfs" sounds wrong; hugetlbfs is a blast from the past 
and not something people are particularly keen to extend for such use 
cases. :)

Instead, as Jason said, we're looking into letting guest_memfd own and 
manage large chunks of contiguous memory.

> 4. Metadata designed for persistence: guestmemfs will need to keep
> simple internal metadata data structures (limited allocations, limited
> fragmentation) so that pages can easily and efficiently be marked as
> persistent via KHO. Something like slab allocations would probably be a
> no-go as then we'd need to persist and reconstruct the slab allocator. I
> don't know how hugetlbfs structures its fs metadata but I'm guessing it
> uses the slab and does lots of small allocations so trying to retrofit
> persistence via KHO to it may be challenging.
> 
> 5. Integration with persistent IOMMU mappings: to keep DMA running
> across kexec, iommufd needs to know that the backing memory for an IOAS
> is persistent too. The idea is to do some DMA pinning of persistent
> files, which would require iommufd/guestmemfs integration - would we
> want to add this to hugetlbfs?
> 
> 6. Virtualisation-specific APIs: starting to get a bit esoteric here,
> but use-cases like being able to carve out specific chunks of memory
> from a running VM and turn it into memory for another side car VM, or
> doing post-copy LM via DMA by mapping memory into the IOMMU but taking
> page faults on the CPU. This may require virtualisation-specific ioctls
> on the files which wouldn't be generally applicable to hugetlbfs.
> 
> 7. NUMA control: a requirement is to always have correct NUMA affinity.
> While currently not implemented the idea is to extend the guestmemfs
> allocation to support specifying allocation sizes from each NUMA node at
> early boot, and then having multiple mount points, one per NUMA node (or
> something like that...). Unclear if this is something hugetlbfs would
> want.
> 
> There are probably more potential issues, but those are the ones that
> come to mind... That being said, if hugetlbfs maintainers are interested
> in going in this direction then we can definitely look at enhancing
> hugetlbfs.
> 
> I think there are two types of problems: "Would hugetlbfs want this
> functionality?" - that's the majority. And a few are "This would be hard
> with hugetlbfs!" - persistence probably falls into this category.

I'm rather asking myself whether you should instead extend the
guest_memfd concept with some of what you propose here.

At least "guest_memfd" sounds a lot like the "anonymous fd" based 
variant of guestmemfs ;)

Just like we have both hugetlbfs and memfd with hugetlb pages.

-- 
Cheers,

David / dhildenb

