From: "david.dai" <david.dai@montage-tech.com>
To: "David Hildenbrand (david@redhat.com)" <david@redhat.com>
Cc: peter.maydell@linaro.org, vsementsov@virtuozzo.com,
	eajames@linux.ibm.com, qemu-devel@nongnu.org,
	changguo.du@montage-tech.com,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	kuhn.chenqun@huawei.com
Subject: Re: [PATCH] hw/misc: Add a virtual pci device to dynamically attach memory to QEMU
Date: Sat, 9 Oct 2021 17:42:33 +0800	[thread overview]
Message-ID: <20211009094233.GA13867@tianmu-host-sw-01> (raw)
In-Reply-To: <5eba1406-4012-481a-b7ed-0090654668d2@redhat.com>

On Thu, Sep 30, 2021 at 12:33:30PM +0200, David Hildenbrand (david@redhat.com) wrote:
> 
> 
> On 30.09.21 11:40, david.dai wrote:
> > On Wed, Sep 29, 2021 at 11:30:53AM +0200, David Hildenbrand (david@redhat.com) wrote:
> > > 
> > > On 27.09.21 14:28, david.dai wrote:
> > > > On Mon, Sep 27, 2021 at 11:07:43AM +0200, David Hildenbrand (david@redhat.com) wrote:
> > > > > 
> > > > > On 27.09.21 10:27, Stefan Hajnoczi wrote:
> > > > > > On Sun, Sep 26, 2021 at 10:16:14AM +0800, David Dai wrote:
> > > > > > > Add a virtual PCI device to QEMU. The device is used to dynamically attach
> > > > > > > memory to a VM, so a driver in the guest can request host memory on the fly
> > > > > > > without help from virtualization management software such as libvirt. The
> > > > > > > attached memory is
> > > > > 
> > > > > We do have virtio-mem to dynamically attach memory to a VM. It could be
> > > > > extended by a mechanism for the VM to request more/less memory; that's
> > > > > already a planned feature. But yeah, virtio-mem memory is exposed as
> > > > > ordinary system RAM, not via a BAR to be managed mostly by user space.
> > > 
> > > There is a virtio-pmem spec proposal to expose the memory region via a PCI
> > > BAR. We could do something similar for virtio-mem; however, we would have to
> > > wire that new model up differently in QEMU (it would no longer be a "memory
> > > device" like a DIMM then).
> > > 
> > > > > 
> > > > 
> > > > I wish virtio-mem could solve our problem, but it is a dynamic allocation
> > > > mechanism for system RAM in virtualization. In heterogeneous computing
> > > > environments, the attached memory usually comes from a computing device and
> > > > should be managed separately; we don't want the Linux MM to control it.
> > > 
> > > If that heterogeneous memory has a dedicated NUMA node (which usually is
> > > the case IIRC), and you let the Linux kernel manage it (dax/kmem), you can
> > > bind the memory backend of virtio-mem to that special NUMA node. So all
> > > memory managed by that virtio-mem device would come from that heterogeneous
> > > memory.
> > > 
> > 
> > Yes, CXL type 2 and type 3 devices expose memory to the host as a dedicated
> > node; the node is marked as soft-reserved memory, and dax/kmem can take over
> > the node to create a dax device. This dax device can be regarded as the
> > memory backend of virtio-mem.
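> > 
> > For example (untested sketch; I'm assuming the device memory enumerates
> > as dax0.0 and that daxctl from the ndctl package is available), the
> > dax/kmem handoff would be something like:
> > 
> >   daxctl reconfigure-device dax0.0 --mode=system-ram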
> > 
> > I'm not sure whether a dax device can be opened by multiple VMs or host
> > applications.
> 
> virtio-mem currently relies on having a single sparse memory region (anon
> mmap, mmapped file, mmapped huge pages, mmapped shmem) per VM. Although we
> can share memory with other processes, sharing with other VMs is not
> intended. Instead of actually mmapping parts dynamically (which can be quite
> expensive), virtio-mem relies on punching holes into the backend and
> dynamically allocating memory/file blocks/... on access.
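> 
> (As an illustration of the hole-punching idea on a plain sparse file,
> not virtio-mem's actual code, using fallocate(1) from util-linux:
> 
>   truncate -s 768G /dev/shm/vmem-backing
>   fallocate --punch-hole --offset 0 --length 2M /dev/shm/vmem-backing
> 
> virtio-mem does the equivalent with the fallocate() syscall and
> FALLOC_FL_PUNCH_HOLE on its backend.)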
> 
> So the easy way to make it work is:
> 
> a) Exposing the CXL memory to the buddy via dax/kmem, resulting in device
> memory getting managed by the buddy on a separate NUMA node.
>

The Linux kernel buddy allocator? How do we guarantee that other applications
don't allocate memory from it?

>
> b) (optional) allocate huge pages on that separate NUMA node.
> c) Use an ordinary memory-backend-ram or memory-backend-memfd (for huge
> pages), *binding* the memory backend to that special NUMA node.
>
 
"-object memory-backend/device-ram or memory-device-memfd, id=mem0, size=768G"
How to bind backend memory to NUMA node
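
Would something like this be the right direction (untested guess; I'm
assuming the device memory is host NUMA node 1, and that the host-nodes
and policy properties of the memory backend do the binding)?

  -object memory-backend-ram,id=mem0,size=768G,host-nodes=1,policy=bind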

>
> This will dynamically allocate memory from that special NUMA node, resulting
> in the virtio-mem device being completely backed by that device memory and
> able to dynamically resize the memory allocation.
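> 
> (Resizing then happens at runtime by adjusting the device's
> requested-size property; a sketch via the monitor, assuming a virtio-mem
> device with id vmem0:
> 
>   (qemu) qom-set vmem0 requested-size 32G
> )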
> 
> 
> Exposing an actual devdax to the virtio-mem device, shared by multiple VMs,
> isn't really what we want, and it won't work without major design changes.
> Also, I'm not so sure it's a very clean design: exposing memory belonging to
> other VMs to unrelated QEMU processes. This sounds like a serious security
> hole: if you manage to escalate to the QEMU process from inside the VM, you
> can access unrelated VM memory quite happily. You want an abstraction
> in-between that makes sure each VM/QEMU process only sees private memory:
> for example, the buddy via dax/kmem.
> 
Hi David,
Thanks for your suggestion, and sorry for the delayed reply due to my long vacation.
How does the current virtio-mem dynamically attach memory to the guest? Via page fault?

Thanks,
David 


> -- 
> Thanks,
> 
> David / dhildenb
> 
> 



