From: Dragan Stancevic <dragan@stancevic.com>
To: Dave Hansen <dave.hansen@intel.com>, lsf-pc@lists.linux-foundation.org
Cc: nil-migration@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Date: Mon, 1 May 2023 18:49:12 -0500
Message-ID: <9130d889-7cfe-9040-d887-380be67410d2@stancevic.com>
In-Reply-To: <14a601ea-8cf8-bb9c-a87a-63567c5aba5b@intel.com>
Hi Dave-
Sorry, it looks like I missed your email.
On 4/11/23 13:00, Dave Hansen wrote:
> On 4/7/23 14:05, Dragan Stancevic wrote:
>> I'd be interested in doing a small BoF session with some slides and get
>> into a discussion/brainstorming with other people that deal with VM/LM
>> cloud loads. Among other things to discuss would be page migrations over
>> switched CXL memory, shared in-memory ABI to allow VM hand-off between
>> hypervisors, etc...
>
> How would 'struct page' or other kernel metadata be handled?
>
> I assume you'd want a really big CXL memory device with as many hosts
> connected to it as is feasible. But, in order to hand the memory off
> from one host to another, both would need to have metadata for it at
> _some_ point.
To be honest, I have not been thinking of this in terms of a "star"
connection topology, where, say, each host in a rack connects to the
same memory device; I think I'd get bottlenecked on a single device,
and evacuating a few hypervisors simultaneously might get a bit dicey.
I've been thinking of it more in terms of multiple memory devices per
rack, connected to various hypervisors to form a hypervisor traversal
graph[1]. In this graph, a VM would migrate across a single hop, or a
few hops, to reach its destination hypervisor, and for lack of a better
word, this would be your "migration namespace" for migrating the VM
across the rack. The critical connections in the graph are hostfoo04
and hostfoo09; those are the ones you'd use to pop the VM into a
different "migration namespace", for example a different rack or maybe
even a pod.
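To make the traversal idea a bit more concrete, here is a toy sketch of
picking a migration path with a breadth-first search. The host names
and the adjacency are invented for illustration (loosely following the
picture in [1]); an edge just means "these two hypervisors share a CXL
memory device":

/*
 * Illustrative only: find a migration path for a VM across the
 * hypervisor traversal graph with a breadth-first search.
 */
#include <stdio.h>

enum { H01, H02, H03, H04, H09, NHOSTS };

static const char *name[NHOSTS] = {
        "hostfoo01", "hostfoo02", "hostfoo03", "hostfoo04", "hostfoo09",
};

/* adj[i][j] != 0: hosts i and j share a CXL memory device */
static const int adj[NHOSTS][NHOSTS] = {
        /*          01 02 03 04 09 */
        /* 01 */  {  0, 1, 0, 1, 0 },
        /* 02 */  {  1, 0, 1, 0, 0 },
        /* 03 */  {  0, 1, 0, 0, 1 },
        /* 04 */  {  1, 0, 0, 0, 1 },
        /* 09 */  {  0, 0, 1, 1, 0 },
};

static void print_path(const int prev[], int v)
{
        if (prev[v] != -1) {
                print_path(prev, prev[v]);
                printf(" -> ");
        }
        printf("%s", name[v]);
}

int main(void)
{
        int queue[NHOSTS], head = 0, tail = 0;
        int seen[NHOSTS] = { 0 }, prev[NHOSTS];
        int src = H01, dst = H09;

        queue[tail++] = src;
        seen[src] = 1;
        prev[src] = -1;

        while (head < tail) {
                int u = queue[head++];

                if (u == dst) {
                        /* prints: hostfoo01 -> hostfoo04 -> hostfoo09 */
                        print_path(prev, dst);
                        printf("\n");
                        return 0;
                }
                for (int v = 0; v < NHOSTS; v++) {
                        if (adj[u][v] && !seen[v]) {
                                seen[v] = 1;
                                prev[v] = u;
                                queue[tail++] = v;
                        }
                }
        }
        printf("no path: %s is in a different migration namespace\n",
               name[dst]);
        return 1;
}

The shortest chain of shared devices is the migration path; an
unreachable host is, in effect, outside the "migration namespace".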
Of course, this is quite a ways out, since there are no CXL 3.0 devices
yet. As a first step I would like to get to a point where I can emulate
this with QEMU and prototype various approaches, starting with a single
emulated memory device and two hosts.
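To give a flavor of that first step, a rough, untested sketch: each
"host" would be a QEMU guest pointed at the same shared, file-backed
type-3 device. The exact device options have been shifting between QEMU
versions, and whether two instances can safely share one backing file
is precisely the thing the prototype would have to establish:

  qemu-system-x86_64 -M q35,cxl=on -m 4G -smp 4 \
    -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxl.raw,size=256M \
    -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
    -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G

with a second invocation reusing mem-path=/tmp/cxl.raw (and its own LSA
file) to stand in for the second hypervisor reaching the same device.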
> So, do all hosts have metadata for the whole CXL memory device all the
> time? Or, would they create the metadata (hotplug) when a VM is
> migrated in and destroy it (hot unplug) when a VM is migrated out?
To be honest I have not thought about hotplugging, but it might be
something to keep in mind and ponder. If you have additional thoughts
on this, I'd love to hear them.
What I was thinking, and this may or may not be possible, or may be
possible only to a certain extent, is to keep as much of the metadata
as possible on the memory device itself and have the hypervisors
cooperate through some kind of ownership mechanism.
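As a strawman of what that ownership mechanism could look like (names
and layout entirely invented here, and glossing over how atomics would
actually behave across hosts on real CXL 3.0 hardware), each extent of
device memory could carry a small record kept in device memory itself,
handed off between hypervisors with a compare-and-swap:

/*
 * Sketch only: a per-extent ownership record living in the shared
 * device memory.  Hand-off assumes both hypervisors map the record
 * and can perform coherent atomic access to it.
 */
#include <stdint.h>
#include <stdatomic.h>

struct cxl_extent_owner {
        _Atomic uint32_t owner_id;      /* current hypervisor, 0 == free */
        uint32_t         flags;
        uint64_t         extent_dpa;    /* device physical address */
        uint64_t         extent_len;
};

/* Old owner hands the extent to the new owner; returns 0 on success. */
static inline int cxl_extent_handoff(struct cxl_extent_owner *rec,
                                     uint32_t old_id, uint32_t new_id)
{
        uint32_t expected = old_id;

        return atomic_compare_exchange_strong(&rec->owner_id,
                                              &expected, new_id) ? 0 : -1;
}

The appeal of keeping this on the device is that the record travels
with the memory rather than with either host's kernel.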
> That gets back to the granularity question discussed elsewhere in the
> thread. How would the metadata allocation granularity interact with the
> page allocation granularity? How would fragmentation be avoided so that
> hosts don't eat up all their RAM with unused metadata?
Yeah, this is something I am still running through my head. Even if we
have this "ownership cooperation", is it based on pages? What happens
to sub-page allocations: do we move them through the buckets, or do we
attach ownership to sub-page allocations too? In my ideal world, two
hypervisors would cooperate over this memory as transparently as CPUs
in a single system collaborate across NUMA nodes. There's a lot to
think about, many problems to solve, and a lot of work to do. I don't
have all the answers yet, but I value all input & help.
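Just to illustrate the sub-page question, one strawman (again, purely a
sketch with invented names) would be a per-page side record, also kept
in device memory, tracking an owner per small chunk; a whole page could
then only be handed off once every chunk agrees:

/* Sketch: extending the ownership idea below page granularity. */
#include <stdint.h>

#define SUBCHUNK_SHIFT  6                        /* 64-byte chunks */
#define SUBCHUNKS       (4096 >> SUBCHUNK_SHIFT) /* 64 per 4K page */

struct cxl_subpage_owner {
        uint64_t page_dpa;                /* page this record covers */
        uint8_t  chunk_owner[SUBCHUNKS];  /* hypervisor id per chunk,
                                           * 0 == free */
};

/* A page can move wholesale only when every in-use chunk is ours. */
static inline int subpage_handoff_ok(const struct cxl_subpage_owner *s,
                                     uint8_t from)
{
        for (int i = 0; i < SUBCHUNKS; i++)
                if (s->chunk_owner[i] && s->chunk_owner[i] != from)
                        return 0;
        return 1;
}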
[1]. https://nil-migration.org/VM-Graph.png
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla