From: David Hildenbrand <david@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Vishal Annapurve <vannapurve@google.com>,
	Quentin Perret <qperret@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Fuad Tabba <tabba@google.com>,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
	chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
	brauner@kernel.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com,
	chao.p.peng@linux.intel.com, jarkko@kernel.org,
	amoorthy@google.com, dmatlack@google.com,
	yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
	mic@digikod.net, vbabka@suse.cz, ackerleytng@google.com,
	mail@maciej.szmigiero.name, michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com,
	isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	suzuki.poulose@arm.com, steven.price@arm.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
	will@kernel.org, keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
Date: Fri, 22 Mar 2024 18:16:14 +0100
Message-ID: <e5f1c475-a8c7-461e-9ee8-75bcc000d9a5@redhat.com>
In-Reply-To: <Zfmpby6i3PfBEcCV@google.com>

On 19.03.24 16:04, Sean Christopherson wrote:
> On Tue, Mar 19, 2024, David Hildenbrand wrote:
>> On 19.03.24 01:10, Sean Christopherson wrote:
>>> Performance is a secondary concern.  If this were _just_ about guest performance,
>>> I would unequivocally side with David: the guest gets to keep the pieces if it
>>> fragments a 1GiB page.
>>>
>>> The main problem we're trying to solve is that we want to provision a host such
>>> that the host can serve 1GiB pages for non-CoCo VMs, and can also simultaneously
>>> run CoCo VMs, with 100% fungibility.  I.e. a host could run 100% non-CoCo VMs,
>>> 100% CoCo VMs, or more likely, some sliding mix of the two.  Ideally, CoCo VMs
>>> would also get the benefits of 1GiB mappings, but that's not the driving
>>> motivation for this discussion.
>>
>> Supporting 1 GiB mappings there sounds like unnecessary complexity and
>> opening a big can of worms, especially if "it's not the driving motivation".
>>
>> If I understand you correctly, the scenario is
>>
>> (1) We have free 1 GiB hugetlb pages lying around
>> (2) We want to start a CoCo VM
>> (3) We don't care about 1 GiB mappings for that CoCo VM,
> 
> We care about 1GiB mappings for CoCo VMs.  My comment about performance being a
> secondary concern was specifically saying that it's the guest's responsibility
> to play nice with huge mappings if the guest cares about its performance.  For
> guests that are well behaved, we most definitely want to provide a configuration
> that performs as close to non-CoCo VMs as we can reasonably make it.

How does the guest know the granularity? I suspect it's just implicit 
knowledge that "PUD granularity might be nice".
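
Purely as an illustration of what "playing nice" could look like (a
sketch, not an actual guest ABI): a Linux TDX/SNP-style guest could
convert shared memory in large, naturally aligned chunks instead of
page by page. PMD granularity below is an assumption on my side --
the buddy allocator can't hand out 1 GiB chunks, so PUD-sized
conversions would need hugetlb/CMA on the guest side as well:

/* Hypothetical guest-side sketch: allocate the shared pool as one
 * naturally aligned high-order chunk and convert it with a single
 * call, so the host's huge mapping splits at one predictable
 * boundary instead of being fragmented by per-page conversions.
 */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/set_memory.h>

static struct page *alloc_shared_pool(void)
{
	const unsigned int order = get_order(PMD_SIZE); /* 2 MiB on x86-64 */
	struct page *pool = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);

	if (!pool)
		return NULL;

	/* One shared<->private conversion covering the whole chunk. */
	if (set_memory_decrypted((unsigned long)page_address(pool),
				 1 << order)) {
		__free_pages(pool, order);
		return NULL;
	}
	return pool;
}

Whether the host actually maps that chunk with a huge mapping is, of
course, still up to the host; the guest is only avoiding the worst
case.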

> 
> And we can do that today, but it requires some amount of host memory to NOT be
> in the HugeTLB pool, and instead be kept in reserved so that it can be used for
> shared memory for CoCo VMs.  That approach has many downsides, as the extra memory
> overhead affects CoCo VM shapes, our ability to use a common pool for non-CoCo
> and CoCo VMs, and so on and so forth.

Right. But avoiding memory waste is not feasible as soon as hugetlb is 
involved and we have two separate memfds for private and shared memory.
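
To spell out the double backing (a minimal userspace sketch, assuming
the guest_memfd uAPI that landed in 6.8, error handling trimmed):
every GPA range of the VM is backed twice, once by guest_memfd for
private memory and once by an ordinary hugetlb memfd for shared
memory. Whichever half is currently unused for a given page is the
waste:

#include <linux/kvm.h>
#include <linux/memfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

static int back_slot_twice(int vm_fd, __u64 gpa, __u64 size)
{
	/* Private side: guest_memfd, not mappable by host userspace. */
	struct kvm_create_guest_memfd gmem = { .size = size };
	int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

	/* Shared side: a regular 1 GiB hugetlb memfd, mmap()ed. */
	int shared_fd = memfd_create("shared", MFD_HUGETLB | MFD_HUGE_1GB);
	ftruncate(shared_fd, size);
	void *shared = mmap(NULL, size, PROT_READ | PROT_WRITE,
			    MAP_SHARED, shared_fd, 0);

	/* One slot, two backings: KVM resolves private vs. shared
	 * per page from the memory attributes. */
	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = (__u64)(unsigned long)shared,
		.guest_memfd = gmem_fd,
		.guest_memfd_offset = 0,
	};
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}

And note that the hugetlb reservation on the shared side stays tied
to this VM even while everything is private, and vice versa; that's
the part that doesn't compose with "100% fungibility".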

> 
>>      but hugetlb pages are all we have.
>> (4) We want to be able to use the 1 GiB hugetlb page in the future.
> 
> ...
> 
>>> The other big advantage that we should lean into is that we can make assumptions
>>> about guest_memfd usage that would never fly for a general purpose backing store,
>>> e.g. creating a dedicated memory pool for guest_memfd is acceptable, if not
>>> desirable, for (almost?) all of the CoCo use cases.
>>>
>>> I don't have any concrete ideas at this time, but my gut feeling is that this
>>> won't be _that_ crazy hard to solve if we commit hard to guest_memfd _not_ being
>>> general purpose, and if we account for conversion scenarios when designing
>>> hugepage support for guest_memfd.
>>
>> I'm hoping guest_memfd won't end up being the wild west of hacky MM ideas ;)
> 
> Quite the opposite, I'm saying we should be very deliberate in how we add hugepage
> support and others features to guest_memfd, so that guest_memfd doesn't become a
> hacky mess.

Good.

> 
> And I'm saying we should stand firm in what guest_memfd _won't_ support, e.g.
> swap/reclaim and probably page migration should get a hard "no".

I thought people wanted to support at least page migration in the 
future? (for example, see the reply from Will)

> 
> In other words, ditch the complexity for features that are well served by existing
> general purpose solutions, so that guest_memfd can take on a bit of complexity to
> serve use cases that are unique to KVM guests, without becoming an unmaintainable
> mess due to cross-products.

And I believed that was true until people started wanting to mmap() this 
thing and brought GUP into the picture ... and then started talking 
about HGM (HugeTLB high-granularity mapping) and all that. *shivers*

-- 
Cheers,

David / dhildenb


