From: Dragan Stancevic <dragan@stancevic.com>
To: Dave Hansen <dave.hansen@intel.com>, lsf-pc@lists.linux-foundation.org
Cc: nil-migration@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Date: Mon, 1 May 2023 18:49:12 -0500
Message-ID: <9130d889-7cfe-9040-d887-380be67410d2@stancevic.com>
In-Reply-To: <14a601ea-8cf8-bb9c-a87a-63567c5aba5b@intel.com>
Hi Dave-
Sorry, it looks like I missed your email.
On 4/11/23 13:00, Dave Hansen wrote:
> On 4/7/23 14:05, Dragan Stancevic wrote:
>> I'd be interested in doing a small BoF session with some slides and get
>> into a discussion/brainstorming with other people that deal with VM/LM
>> cloud loads. Among other things to discuss would be page migrations over
>> switched CXL memory, shared in-memory ABI to allow VM hand-off between
>> hypervisors, etc...
>
> How would 'struct page' or other kernel metadata be handled?
>
> I assume you'd want a really big CXL memory device with as many hosts
> connected to it as is feasible. But, in order to hand the memory off
> from one host to another, both would need to have metadata for it at
> _some_ point.
To be honest, I have not been thinking of this in terms of a "star"
connection topology, where, say, each host in a rack connects to the
same memory device; I think I'd get bottlenecked on a single device,
and evacuating a few hypervisors simultaneously might get a bit dicey.
I've been thinking of it more in terms of multiple memory devices per
rack, connected to various hypervisors to form a hypervisor traversal
graph[1]. In this graph, a VM would migrate across a single hop, or a
few hops, to reach its destination hypervisor, and for lack of a better
word, this would be your "migration namespace" for migrating the VM
across the rack. The critical connections in the graph are hostfoo04
and hostfoo09; those are the ones you'd use to pop the VM into a
different "migration namespace", for example a different rack or maybe
even a pod.
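To make the traversal idea a bit more concrete, here is a toy sketch of
picking a migration path with a breadth-first search. The host names
and the adjacency are invented for illustration (loosely following the
picture in [1]); an edge just means "these two hypervisors share a CXL
memory device":

/*
 * Illustrative only: find a migration path for a VM across the
 * hypervisor traversal graph with a breadth-first search.
 */
#include <stdio.h>

enum { H01, H02, H03, H04, H09, NHOSTS };

static const char *name[NHOSTS] = {
        "hostfoo01", "hostfoo02", "hostfoo03", "hostfoo04", "hostfoo09",
};

/* adj[i][j] != 0: hosts i and j share a CXL memory device */
static const int adj[NHOSTS][NHOSTS] = {
        /*          01 02 03 04 09 */
        /* 01 */  {  0, 1, 0, 1, 0 },
        /* 02 */  {  1, 0, 1, 0, 0 },
        /* 03 */  {  0, 1, 0, 0, 1 },
        /* 04 */  {  1, 0, 0, 0, 1 },
        /* 09 */  {  0, 0, 1, 1, 0 },
};

static void print_path(const int prev[], int v)
{
        if (prev[v] != -1) {
                print_path(prev, prev[v]);
                printf(" -> ");
        }
        printf("%s", name[v]);
}

int main(void)
{
        int queue[NHOSTS], head = 0, tail = 0;
        int seen[NHOSTS] = { 0 }, prev[NHOSTS];
        int src = H01, dst = H09;

        queue[tail++] = src;
        seen[src] = 1;
        prev[src] = -1;

        while (head < tail) {
                int u = queue[head++];

                if (u == dst) {
                        /* prints: hostfoo01 -> hostfoo04 -> hostfoo09 */
                        print_path(prev, dst);
                        printf("\n");
                        return 0;
                }
                for (int v = 0; v < NHOSTS; v++) {
                        if (adj[u][v] && !seen[v]) {
                                seen[v] = 1;
                                prev[v] = u;
                                queue[tail++] = v;
                        }
                }
        }
        printf("no path: %s is in a different migration namespace\n",
               name[dst]);
        return 1;
}

The shortest chain of shared devices is the migration path; an
unreachable host is, in effect, outside the "migration namespace".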
Of course, this is quite a ways out, since there are no CXL 3.0 devices
yet. As a first step I would like to get to a point where I can emulate
this with QEMU and prototype various approaches, starting with a single
emulated memory device and two hosts.
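To give a flavor of that first step, a rough, untested sketch: each
"host" would be a QEMU guest pointed at the same shared, file-backed
type-3 device. The exact device options have been shifting between QEMU
versions, and whether two instances can safely share one backing file
is precisely the thing the prototype would have to establish:

  qemu-system-x86_64 -M q35,cxl=on -m 4G -smp 4 \
    -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxl.raw,size=256M \
    -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
    -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G

with a second invocation reusing mem-path=/tmp/cxl.raw (and its own LSA
file) to stand in for the second hypervisor reaching the same device.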
> So, do all hosts have metadata for the whole CXL memory device all the
> time? Or, would they create the metadata (hotplug) when a VM is
> migrated in and destroy it (hot unplug) when a VM is migrated out?
To be honest I have not thought about hotplugging, but it might be
something to keep in mind and ponder. If you have additional thoughts
on this, I'd love to hear them.
What I was thinking, and this may or may not be possible, or may be
possible only to a certain extent, is to keep as much of the metadata
as possible on the memory device itself and have the hypervisors
cooperate through some kind of ownership mechanism.
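As a strawman of what that ownership mechanism could look like (names
and layout entirely invented here, and glossing over how atomics would
actually behave across hosts on real CXL 3.0 hardware), each extent of
device memory could carry a small record kept in device memory itself,
handed off between hypervisors with a compare-and-swap:

/*
 * Sketch only: a per-extent ownership record living in the shared
 * device memory.  Hand-off assumes both hypervisors map the record
 * and can perform coherent atomic access to it.
 */
#include <stdint.h>
#include <stdatomic.h>

struct cxl_extent_owner {
        _Atomic uint32_t owner_id;      /* current hypervisor, 0 == free */
        uint32_t         flags;
        uint64_t         extent_dpa;    /* device physical address */
        uint64_t         extent_len;
};

/* Old owner hands the extent to the new owner; returns 0 on success. */
static inline int cxl_extent_handoff(struct cxl_extent_owner *rec,
                                     uint32_t old_id, uint32_t new_id)
{
        uint32_t expected = old_id;

        return atomic_compare_exchange_strong(&rec->owner_id,
                                              &expected, new_id) ? 0 : -1;
}

The appeal of keeping this on the device is that the record travels
with the memory rather than with either host's kernel.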
> That gets back to the granularity question discussed elsewhere in the
> thread. How would the metadata allocation granularity interact with the
> page allocation granularity? How would fragmentation be avoided so that
> hosts don't eat up all their RAM with unused metadata?
Yeah, this is something I am still running through my head. Even if we
have this "ownership cooperation", is it based on pages? What happens
to sub-page allocations: do we move them through the buckets, or do we
attach ownership to sub-page allocations too? In my ideal world, two
hypervisors would cooperate over this memory as transparently as CPUs
in a single system collaborate across NUMA nodes. There's a lot to
think about, many problems to solve, and a lot of work to do. I don't
have all the answers yet, but I value all input & help.
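Just to illustrate the sub-page question, one strawman (again, purely a
sketch with invented names) would be a per-page side record, also kept
in device memory, tracking an owner per small chunk; a whole page could
then only be handed off once every chunk agrees:

/* Sketch: extending the ownership idea below page granularity. */
#include <stdint.h>

#define SUBCHUNK_SHIFT  6                        /* 64-byte chunks */
#define SUBCHUNKS       (4096 >> SUBCHUNK_SHIFT) /* 64 per 4K page */

struct cxl_subpage_owner {
        uint64_t page_dpa;                /* page this record covers */
        uint8_t  chunk_owner[SUBCHUNKS];  /* hypervisor id per chunk,
                                           * 0 == free */
};

/* A page can move wholesale only when every in-use chunk is ours. */
static inline int subpage_handoff_ok(const struct cxl_subpage_owner *s,
                                     uint8_t from)
{
        for (int i = 0; i < SUBCHUNKS; i++)
                if (s->chunk_owner[i] && s->chunk_owner[i] != from)
                        return 0;
        return 1;
}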
[1]. https://nil-migration.org/VM-Graph.png
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla