From: David Hildenbrand <david@redhat.com>
To: "Gowans, James" <jgowans@amazon.com>,
"jack@suse.cz" <jack@suse.cz>,
"muchun.song@linux.dev" <muchun.song@linux.dev>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"rppt@kernel.org" <rppt@kernel.org>,
"brauner@kernel.org" <brauner@kernel.org>,
"Graf (AWS), Alexander" <graf@amazon.de>,
"anthony.yznaga@oracle.com" <anthony.yznaga@oracle.com>,
"steven.sistare@oracle.com" <steven.sistare@oracle.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Durrant, Paul" <pdurrant@amazon.co.uk>,
"seanjc@google.com" <seanjc@google.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Woodhouse, David" <dwmw@amazon.co.uk>,
"Saenz Julienne, Nicolas" <nsaenz@amazon.es>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"nh-open-source@amazon.com" <nh-open-source@amazon.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"jgg@ziepe.ca" <jgg@ziepe.ca>
Subject: Re: [PATCH 00/10] Introduce guestmemfs: persistent in-memory filesystem
Date: Tue, 6 Aug 2024 15:43:24 +0200
Message-ID: <883a0f0d-7342-479e-aa3c-13deb7e99338@redhat.com>
In-Reply-To: <9802ddc299c72b189487fd56668de65a84f7d94b.camel@amazon.com>
> 1. Secret hiding: with guestmemfs all of the memory is out of the kernel
> direct map as an additional defence mechanism. This means no
> read()/write() syscalls to guestmemfs files, and no IO to it. The only
> way to access it is to mmap the file.
There are people interested in similar things for guest_memfd.
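
To make the mmap-only access model from point 1 concrete, here is a minimal sketch in Python. It is an illustration only: the helper name is made up, and the demo runs against an ordinary temporary file as a stand-in, since it assumes nothing about guestmemfs beyond what the cover letter states (files are sized by truncation and accessed through a mapping, not through read()/write()).

```python
import mmap
import os
import tempfile


def write_then_read_via_mmap(path, size, payload):
    """Size a file with ftruncate(), then touch its contents only via mmap.

    Hypothetical helper for illustration; no read()/write() syscalls are
    issued against the file's data.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)            # set the file size
        with mmap.mmap(fd, size) as m:    # shared, read/write mapping by default
            m[:len(payload)] = payload    # store through the mapping
            return bytes(m[:len(payload)])  # load through the mapping
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Stand-in path: a regular temp file, NOT a real guestmemfs mount.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "guest-ram")
        print(write_then_read_via_mmap(path, 2 * 1024 * 1024, b"hello"))
```

On a filesystem that really removes its memory from the direct map, the read()/write() path would simply not be available, so a mapping like the one above would be the only way in.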
>
> 2. No struct page overhead: the intended use case is for systems whose
> sole job is to be a hypervisor, typically for large (multi-GiB) VMs, so
> the majority of system RAM would be donated to this fs. We definitely
> don't want 4 KiB struct pages here as it would be a significant
> overhead. That's why guestmemfs carves the memory out in early boot and
> sets memblock flags to avoid struct page allocation. I don't know if
> hugetlbfs does anything fancy to avoid allocating PTE-level struct pages
> for its memory?
Sure, it's called HVO (HugeTLB Vmemmap Optimization) and can optimize out
a significant portion of the vmemmap.
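
For a rough sense of the numbers, here is a back-of-the-envelope sketch in Python. The constants are assumptions, not taken from this thread: 4 KiB base pages, a 64-byte struct page (typical on x86-64), and HVO retaining one vmemmap page per huge page as current kernel documentation describes (the retained count has differed across kernel versions, so it is a parameter here).

```python
# Assumed constants (not from the thread): typical x86-64 values.
PAGE_SIZE = 4096         # base page size in bytes
STRUCT_PAGE_SIZE = 64    # sizeof(struct page) in bytes

MiB = 1 << 20
GiB = 1 << 30


def vmemmap_bytes(mem_bytes):
    """Bytes of struct page metadata needed to describe mem_bytes of RAM."""
    return (mem_bytes // PAGE_SIZE) * STRUCT_PAGE_SIZE


def hvo_vmemmap_pages(hugepage_bytes, kept=1):
    """Return (vmemmap pages without HVO, vmemmap pages kept with HVO).

    `kept` is an assumption about how many vmemmap pages the
    optimization retains per huge page.
    """
    total = vmemmap_bytes(hugepage_bytes) // PAGE_SIZE
    return total, min(kept, total)


if __name__ == "__main__":
    # Metadata cost of describing 512 GiB with 4 KiB struct pages, in GiB.
    print(vmemmap_bytes(512 * GiB) / GiB)
    # vmemmap pages per 2 MiB and per 1 GiB huge page, before and after.
    print(hvo_vmemmap_pages(2 * MiB))
    print(hvo_vmemmap_pages(1 * GiB))
```

Under these assumptions the unoptimized overhead is 64/4096, about 1.6% of the described memory, and most of it goes away for huge pages once the tail vmemmap pages are deduplicated.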
>
> 3. guest_memfd interface: For confidential computing use-cases we need
> to provide a guest_memfd style interface so that these FDs can be used
> as a guest_memfd file in KVM memslots. Would there be interest in
> extending hugetlbfs to also support a guest_memfd style interface?
>
"Extending hugetlbfs" sounds wrong; hugetlbfs is a blast from the past
and not something people are particularly keen to extend for such use
cases. :)
Instead, as Jason said, we're looking into letting guest_memfd own and
manage large chunks of contiguous memory.
> 4. Metadata designed for persistence: guestmemfs will need to keep
> simple internal metadata data structures (limited allocations, limited
> fragmentation) so that pages can easily and efficiently be marked as
> persistent via KHO. Something like slab allocations would probably be a
> no-go as then we'd need to persist and reconstruct the slab allocator. I
> don't know how hugetlbfs structures its fs metadata but I'm guessing it
> uses the slab and does lots of small allocations so trying to retrofit
> persistence via KHO to it may be challenging.
>
> 5. Integration with persistent IOMMU mappings: to keep DMA running
> across kexec, iommufd needs to know that the backing memory for an IOAS
> is persistent too. The idea is to do some DMA pinning of persistent
> files, which would require iommufd/guestmemfs integration - would we
> want to add this to hugetlbfs?
>
> 6. Virtualisation-specific APIs: starting to get a bit esoteric here,
> but use-cases like being able to carve out specific chunks of memory
> from a running VM and turn it into memory for another side car VM, or
> doing post-copy LM via DMA by mapping memory into the IOMMU but taking
> page faults on the CPU. This may require virtualisation-specific ioctls
> on the files which wouldn't be generally applicable to hugetlbfs.
>
> 7. NUMA control: a requirement is to always have correct NUMA affinity.
> While currently not implemented the idea is to extend the guestmemfs
> allocation to support specifying allocation sizes from each NUMA node at
> early boot, and then having multiple mount points, one per NUMA node (or
> something like that...). Unclear if this is something hugetlbfs would
> want.
>
> There are probably more potential issues, but those are the ones that
> come to mind... That being said, if hugetlbfs maintainers are interested
> in going in this direction then we can definitely look at enhancing
> hugetlbfs.
>
> I think there are two types of problems: "Would hugetlbfs want this
> functionality?" - that's the majority. And a few are "This would be hard
> with hugetlbfs!" - persistence probably falls into this category.
I'm rather asking myself whether you should instead teach/extend the
guest_memfd concept with some of what you propose here.
At least "guest_memfd" sounds a lot like the "anonymous fd" based
variant of guestmemfs ;)
Just like we have both hugetlbfs and memfd with hugetlb pages.
--
Cheers,
David / dhildenb