From: David Hildenbrand <david@redhat.com>
To: Sean Christopherson <seanjc@google.com>,
	Vishal Annapurve <vannapurve@google.com>
Cc: Quentin Perret <qperret@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Fuad Tabba <tabba@google.com>,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
	chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
	brauner@kernel.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com,
	chao.p.peng@linux.intel.com, jarkko@kernel.org,
	amoorthy@google.com, dmatlack@google.com,
	yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
	mic@digikod.net, vbabka@suse.cz, ackerleytng@google.com,
	mail@maciej.szmigiero.name, michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com,
	isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	suzuki.poulose@arm.com, steven.price@arm.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
	will@kernel.org, keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
Date: Tue, 19 Mar 2024 14:19:37 +0100
Message-ID: <0b1e1054-012f-4cc3-9d25-1147fa8cd6f0@redhat.com>
In-Reply-To: <40f82a61-39b0-4dda-ac32-a7b5da2a31e8@redhat.com>

>>> I had started a discussion for this [2] using an RFC series.
>>
>> David is talking about the host side of things, AFAICT you're talking about the
>> guest side...
>>
>>> challenges here remain:
>>> 1) Unifying all the conversions under one layer
>>> 2) Ensuring shared memory allocations are huge-page-aligned at boot
>>> time and at runtime.
>>>
>>> Any kind of unified shared memory allocator (today this role is
>>> played by SWIOTLB) will need to support huge-page-aligned dynamic
>>> increments, which can only be guaranteed by carving out enough memory
>>> for a CMA area at boot time and allocating from that CMA area at
>>> runtime.
>>>      - Since it's hard to come up with a maximum amount of shared memory
>>> needed by a VM, especially with GPUs/TPUs around, it's difficult to
>>> choose the CMA area size at boot time.
>>
>> ...which is very relevant as carving out memory in the guest is nigh impossible,
>> but carving out memory in the host for systems whose sole purpose is to run VMs
>> is very doable.
>>
>>> I think it's arguable that even if a VM converts 10% of its memory to
>>> shared at 4K granularity, we still have fewer page table walks on the
>>> rest of the memory, which is a significant portion, when it is mapped
>>> with 1G/2M pages.
>>
>> Performance is a secondary concern.  If this were _just_ about guest performance,
>> I would unequivocally side with David: the guest gets to keep the pieces if it
>> fragments a 1GiB page.
>>
>> The main problem we're trying to solve is that we want to provision a host such
>> that the host can serve 1GiB pages for non-CoCo VMs, and can also simultaneously
>> run CoCo VMs, with 100% fungibility.  I.e. a host could run 100% non-CoCo VMs,
>> 100% CoCo VMs, or more likely, some sliding mix of the two.  Ideally, CoCo VMs
>> would also get the benefits of 1GiB mappings, but that's not the driving
>> motivation for this discussion.
> 
> Supporting 1 GiB mappings there sounds like unnecessary complexity and
> opening a big can of worms, especially if "it's not the driving motivation".
> 
> If I understand you correctly, the scenario is
> 
> (1) We have free 1 GiB hugetlb pages lying around
> (2) We want to start a CoCo VM
> (3) We don't care about 1 GiB mappings for that CoCo VM, but hugetlb
>       pages are all we have.
> (4) We want to be able to use the 1 GiB hugetlb page in the future.
> 
> With hugetlb, it's possible to reserve a CMA area from which to later
> allocate 1 GiB pages. As long as they are not allocated as huge pages,
> the pages in that area can be used for movable allocations.
> 
> So in the scenario above, free the hugetlb pages back to CMA. Then,
> consume them as 4K pages for the CoCo VM. When wanting to start a
> non-CoCo VM, re-allocate them from CMA.
> 
> One catch with that is that
> (a) CMA pages cannot get longterm-pinned: for obvious reasons, we
>       wouldn't be able to migrate them in order to free up the 1 GiB page.
> (b) guest_memfd pages are not movable and cannot currently end up on CMA
>       memory.
> 
> But maybe that's not actually required in this scenario and we'd like to
> have slightly different semantics: if you were to give the CoCo VM the 1
> GiB pages, they would similarly be unusable until that VM quit and freed
> up the memory!
> 
> So it might be acceptable to get "selected" unmovable allocations (from
> guest_memfd) on a selected (hugetlb) CMA area, which the "owner" will
> free up when it wants to re-allocate that memory. Otherwise, the CMA
> allocation will simply fail.
> 
> If we need improvements in that area to support this case, we can talk.
> Just an idea to avoid HGM and friends just to make it somehow work with
> 1 GiB pages ...
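
To make that reuse flow a bit more concrete, here is roughly what I have
in mind (the sizes are made up, and the "guest_memfd pages on a CMA area"
step is exactly what does not work today, per (a) and (b) above):

  # Boot time: reserve a CMA area that hugetlb can later allocate
  # 1 GiB pages from (hugetlb_cma=, hugepagesz= and default_hugepagesz=
  # are existing kernel command-line parameters).
  hugetlb_cma=16G default_hugepagesz=1G hugepagesz=1G

  # Runtime: allocate 1 GiB pages from that CMA area for a non-CoCo VM ...
  echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

  # ... or hand them back to CMA so the memory can be consumed at 4K
  # granularity -- e.g., by a CoCo VM's guest_memfd, once we'd allow
  # such "selected" unmovable allocations on that CMA area.
  echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages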


Thought about that some more, and some cases can also be tricky (e.g.,
avoiding fragmenting multiple 1 GiB pages ...).

It's all tricky, especially once multiple (guest_)memfds are involved
and we really want to avoid wasting memory. Knowing that large mappings
for CoCo are rather "optional" and that the challenge is in "reusing"
large pages is valuable, though.
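
(On the guest-side sizing problem raised earlier in the thread: the
SWIOTLB carve-out is already a boot-time guess today, e.g. something like

  # Guest kernel command line; the argument is the number of I/O TLB
  # slabs, and the value here is only an example (see
  # Documentation/admin-guide/kernel-parameters.txt).
  swiotlb=262144

and growing that dynamically in huge-page-aligned chunks is precisely
the hard part.)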

-- 
Cheers,

David / dhildenb



