From: Dragan Stancevic <dragan@stancevic.com>
To: Gregory Price <gregory.price@memverge.com>,
David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>,
lsf-pc@lists.linux-foundation.org, nil-migration@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Date: Thu, 13 Apr 2023 23:16:20 -0500
Message-ID: <30f254de-5bbb-bfb9-7321-62dc70db0ba9@stancevic.com>
In-Reply-To: <ZDbdnRE39Py4X4bT@memverge.com>

Hi Gregory-
On 4/12/23 11:34, Gregory Price wrote:
> On Wed, Apr 12, 2023 at 05:50:55PM +0200, David Hildenbrand wrote:
>>
>> long-term: possibly forever, controlled by user space. In practice, anything
>> longer than ~10 seconds ( best guess :) ). There can be long-term pinnings
>> that are of very short duration, we just don't know what user space is up to
>> and when it will decide to unpin.
>>
>> Assume user space requests to trigger read/write of a user space page to a
>> file: the page is pinned, DMA is started, once DMA completes the page is
>> unpinned. Short-term. User space does not control how long the page remains
>> pinned.
>>
>> In contrast:
>>
>> Example #1: mapping VM guest memory into an IOMMU using vfio for PCI
>> passthrough requires pinning the pages. Until user space decides to unmap
>> the pages from the IOMMU, the pages will remain pinned. -> long-term
>>
>> Example #2: mapping a user space address range into an IOMMU to repeatedly
>> perform RDMA using that address range requires pinning the pages. Until user
>> space decides to unregister that range, the pages remain pinned. ->
>> long-term
>>
>> Example #3: registering a user space address range with io_uring as a fixed
>> buffer, such that io_uring OPS can avoid the page table walks by simply
>> using the pinned pages that were looked up once. As long as the fixed buffer
>> remains registered, the pages stay pinned. -> long-term
>>
>> --
>> Thanks,
>>
>> David / dhildenb
>>
>
> That pretty much precludes live migration from using CXL as a transport
> mechanism: since live migration would be a user-initiated process, you
> would need what amounts to an atomic move between hosts to ensure pages
> are not left pinned.
Do you really need an atomic move in between hosts? I mean, it's not
really a failure if you are in the process of migrating pages onto the
switched CXL memory and one of the pages is pulled out of CXL and back
onto the hypervisor. The running VM's CPUs can do loads and stores from
either side, so it keeps running unaffected. It's just that your
migration is potentially "stalled" or "canceled". You only encounter
issues once all your pages are on CXL and the other hypervisor is
pulling pages out.
> The more I'm reading the more I'm somewhat convinced CXL memory should
> not allow pinning at all.
I think you want to be able to somehow pin the pages on one hypervisor
and unpin them on the other. Or in some other way "pass ownership"
between the hypervisors. Right? Because of the scenario I mention above:
if your source hypervisor takes a page out of CXL, then your destination
hypervisor has a hole in the VM's address space and can't run it.
> I suppose you could implement a new RDMA feature where the remote host's
> CXL memory is temporarily mapped, data is migrated, and then that area
> is unmapped. Basically the exact same RDMA mechanism, but using memory
> instead of network. This would make the operation kernel-controlled,
> if pin/unpin is required.
That would move us from the shared memory sections of the CXL 3.0 spec
into the sections on direct memory placement, I think. Which, in order
of preference, is #2 for me personally, and a "backup" plan if #1,
shared memory, doesn't pan out.
> Lots to talk about.
>
> ~Gregory
>
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla