* CXL 2.0 memory pooling emulation
From: zhiting zhu @ 2023-02-08 22:28 UTC
To: qemu-devel

Hi,

I saw a PoC to implement memory pooling and a fabric manager on QEMU:
https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
Is there any further development on this? Can QEMU emulate memory pooling
in a simple case where two virtual machines are connected to a CXL switch
with some memory devices attached to it?

Best,
Zhiting

* Re: CXL 2.0 memory pooling emulation
From: Jonathan Cameron via @ 2023-02-15 15:18 UTC
To: zhiting zhu; +Cc: qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, 8 Feb 2023 16:28:44 -0600
zhiting zhu <zhitingz@cs.utexas.edu> wrote:

> Hi,
>
> I saw a PoC to implement memory pooling and a fabric manager on QEMU:
> https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> Is there any further development on this? Can QEMU emulate memory pooling
> in a simple case where two virtual machines are connected to a CXL switch
> with some memory devices attached to it?
>
> Best,
> Zhiting

Hi Zhiting,

+CC linux-cxl as it's not as much of a firehose as qemu-devel.
+CC Slava, who has been driving discussion around fabric management.

No progress on that particular approach, though there has been some
discussion of what the FM architecture itself might look like:
https://lore.kernel.org/linux-cxl/7F001EAF-C512-436A-A9DD-E08730C91214@bytedance.com/

There was a sticky problem with doing MCTP over I2C, which is that there
are very few I2C controllers that support the combination of master and
subordinate modes needed for MCTP. The one that was used for that (aspeed)
doesn't have ACPI bindings (and they are non-trivial to add due to clocks
etc., and likely to be controversial on the kernel side given I just want
it for emulation!). So far we don't have DT bindings for CXL (either the
CFMWS - CXL fixed memory windows - or pxb-cxl - the host bridge); I'll be
sending out one of the precursors for that as an RFC soon. So we are in
the fun position that we can either emulate the comms path to the devices,
or we can emulate the host actually using the devices.

I was planning to get back to that eventually, but we have other options
now that CXL 3.0 has been published. CXL 3.0 provides two paths forward
that let us test the equivalent functionality with fewer moving parts:

1) CXL SWCCI, an extra PCI function next to the switch upstream port that
   provides a mailbox that takes FM-API commands. PoC kernel code at:
   https://lore.kernel.org/linux-cxl/20221025104243.20836-1-Jonathan.Cameron@huawei.com/
   The latest branch in gitlab.com/jic23/qemu should have switch CCI
   emulation support (branches are dated). Note we have a lot of stuff
   outstanding, either out for review or backed up behind things that are.
2) Multi Headed Devices. These allow FM-API commands over the normal CXL
   mailbox. I did a very basic PoC to see how this would fit in with the
   kernel side of things, but recently there has been far too much we need
   to enable in the shorter term.

Note though that there is a long way to go before we can do what you
want. The steps I'd expect to see along the way:

1) Emulate a Multi Headed Device.
   Initially connect two heads to different host bridges on a single QEMU
   machine. That lets us test most of the code flows without needing
   to handle tests that involve multiple machines.
   Later, we could add a means to connect between two instances of QEMU.
2) Add DCD support (we'll need the kernel side of that as well).
3) Wire it all up.
4) Do the same for a switch with MLDs behind it so we can poke the fun
   corners.

Note that, in common with memory emulation in general for CXL on QEMU,
the need to do live address decoding will make performance terrible.
There are probably ways to improve that, but whilst we are at the stage
of trying to get as much functional as possible for testing purposes,
I'm not sure anyone will pursue those options. It may not make sense in
the longer term either.

I'm more than happy to offer suggestions / feedback on approaches to this,
and will get back to it myself once some more pressing requirements are
dealt with.

Jonathan

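For orientation, the single-host case Jonathan refers to (one QEMU machine,
a CXL host bridge, a switch, and a type-3 memory device behind it) can
already be brought up. The sketch below follows the QEMU CXL documentation;
exact option names, paths and sizes are assumptions that may vary between
QEMU versions, and this is a single guest rather than the two-VM pooling
case asked about above.

  # Minimal single-host CXL switch topology (sketch; adjust paths/sizes).
  qemu-system-x86_64 -M q35,cxl=on -m 4G -smp 2 -nographic \
    -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxl-mem0.raw,size=256M \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=2 \
    -device cxl-upstream,bus=root_port0,id=us0 \
    -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
    -device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,id=cxl-vmem0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
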
* Re: CXL 2.0 memory pooling emulation
From: Gregory Price @ 2023-02-15 9:10 UTC
To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> On Wed, 8 Feb 2023 16:28:44 -0600
> zhiting zhu <zhitingz@cs.utexas.edu> wrote:
>
> > Hi,
> >
> > I saw a PoC to implement memory pooling and a fabric manager on QEMU:
> > https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> > Is there any further development on this? Can QEMU emulate memory pooling
> > in a simple case where two virtual machines are connected to a CXL switch
> > with some memory devices attached to it?
> >
> > Best,
> > Zhiting
>
> [... snip ...]
>
> Note though that there is a long way to go before we can do what you
> want. The steps I'd expect to see along the way:
>
> 1) Emulate a Multi Headed Device.
>    Initially connect two heads to different host bridges on a single QEMU
>    machine. That lets us test most of the code flows without needing
>    to handle tests that involve multiple machines.
>    Later, we could add a means to connect between two instances of QEMU.

I've been playing with this a bit.

The hackiest way to do this is to connect the same memory backend to two
type-3 devices, with the obvious caveat that the device state will not
be consistent between views.

But we could, for example, just put the relevant shared state into an
optional shared memory area instead of a normally allocated region.

I can imagine this looking something like:

  memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
  cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken

Then you can have multiple QEMU instances hook their relevant devices up
to a backend that points to the same file, and instantiate their shared
state in the region shmget(mytoken).

Additionally, these devices will require a set of what amounts to
vendor-specific mailbox commands - since the spec doesn't really define
what multi-headed devices "should do" to manage exclusivity.

Not sure if this would be upstream-worthy, or if we'd want to fork
hw/mem/cxl_type3.c into something like hw/mem/cxl_type3_multihead.c.

The base type3 device is going to end up overloaded at some point, I
think, and we'll want to look at trying to abstract it.

~Gregory

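To make the idea concrete, two QEMU instances sharing one backing file
would be launched roughly as below. This is only a sketch of the proposal:
shm_token is a hypothetical property from the message above, not an
existing cxl-type3 option, and the rest of the topology follows the usual
QEMU CXL examples.

  # Instance A (instance B is launched the same way, pointing at the same
  # /tmp/mem0 backing file and the same hypothetical shm_token).
  qemu-system-x86_64 -M q35,cxl=on -m 2G -nographic \
    -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=on \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=rp0,chassis=0,slot=0 \
    -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
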
* Re: CXL 2.0 memory pooling emulation
From: Jonathan Cameron via @ 2023-02-16 18:00 UTC
To: Gregory Price; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, 15 Feb 2023 04:10:20 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > On Wed, 8 Feb 2023 16:28:44 -0600
> > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> >
> > > Hi,
> > >
> > > I saw a PoC to implement memory pooling and a fabric manager on QEMU:
> > > https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> > > Is there any further development on this? Can QEMU emulate memory pooling
> > > in a simple case where two virtual machines are connected to a CXL switch
> > > with some memory devices attached to it?
> > >
> > > Best,
> > > Zhiting
> >
> > [... snip ...]
> >
> > Note though that there is a long way to go before we can do what you
> > want. The steps I'd expect to see along the way:
> >
> > 1) Emulate a Multi Headed Device.
> >    Initially connect two heads to different host bridges on a single QEMU
> >    machine. That lets us test most of the code flows without needing
> >    to handle tests that involve multiple machines.
> >    Later, we could add a means to connect between two instances of QEMU.
>
> I've been playing with this a bit.
>
> The hackiest way to do this is to connect the same memory backend to two
> type-3 devices, with the obvious caveat that the device state will not
> be consistent between views.
>
> But we could, for example, just put the relevant shared state into an
> optional shared memory area instead of a normally allocated region.
>
> I can imagine this looking something like:
>
>   memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
>   cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
>
> Then you can have multiple QEMU instances hook their relevant devices up
> to a backend that points to the same file, and instantiate their shared
> state in the region shmget(mytoken).

That's not pretty. For a local instance I was thinking of a primary device
which also has the FM-API tunneled access via its mailbox, and secondary
devices that don't. That would also apply to remote. The secondary device
would then just receive some control commands on what to expose up to its
host. Not sure what the convention for doing that is in QEMU. Maybe a
socket interface like is done for swtpm? With some ordering constraints
on startup.

> Additionally, these devices will require a set of what amounts to
> vendor-specific mailbox commands - since the spec doesn't really define
> what multi-headed devices "should do" to manage exclusivity.

The device shouldn't manage exclusivity. That's a job for the fabric
manager plus DCD presentation of the memory, with the device enforcing
some rules; if it supports some of the capacity-adding types, it might
need a simple allocator. If we need vendor-specific commands then we need
to take that to the relevant body. I'm not sure what they would be,
though.

> Not sure if this would be upstream-worthy, or if we'd want to fork
> hw/mem/cxl_type3.c into something like hw/mem/cxl_type3_multihead.c.
>
> The base type3 device is going to end up overloaded at some point, I
> think, and we'll want to look at trying to abstract it.

Sure. Though we might end up with the normal type3 implementation being
(optionally) the primary device for an MHD (the one with the FM-API
tunneling available on its mailbox). We would need a secondary device,
though, which you instantiate with a link to the primary one or with a
socket (assuming the primary opens a socket as well).

Jonathan

> ~Gregory

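For reference, the swtpm convention Jonathan mentions connects an external
emulator process to QEMU over a UNIX-socket chardev. The TPM options below
are the existing, documented ones and are shown only as the pattern a
primary/secondary MHD link might copy; any CXL device properties for such
a link would be new and are not shown here.

  # The swtpm wiring pattern: external emulator process + QEMU chardev socket.
  mkdir -p /tmp/mytpm0
  swtpm socket --tpmstate dir=/tmp/mytpm0 \
    --ctrl type=unixio,path=/tmp/mytpm0/swtpm-sock &

  qemu-system-x86_64 -M q35 -m 2G -nographic \
    -chardev socket,id=chrtpm,path=/tmp/mytpm0/swtpm-sock \
    -tpmdev emulator,id=tpm0,chardev=chrtpm \
    -device tpm-tis,tpmdev=tpm0
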
* Re: CXL 2.0 memory pooling emulation
From: Gregory Price @ 2023-02-16 20:52 UTC
To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Thu, Feb 16, 2023 at 06:00:57PM +0000, Jonathan Cameron wrote:
> On Wed, 15 Feb 2023 04:10:20 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
>
> > On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> > >
> > > 1) Emulate a Multi Headed Device.
> > >    Initially connect two heads to different host bridges on a single QEMU
> > >    machine. That lets us test most of the code flows without needing
> > >    to handle tests that involve multiple machines.
> > >    Later, we could add a means to connect between two instances of QEMU.
> >
> > The hackiest way to do this is to connect the same memory backend to two
> > type-3 devices, with the obvious caveat that the device state will not
> > be consistent between views.
> >
> > But we could, for example, just put the relevant shared state into an
> > optional shared memory area instead of a normally allocated region.
> >
> > I can imagine this looking something like:
> >
> >   memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> >   cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> >
> > Then you can have multiple QEMU instances hook their relevant devices up
> > to a backend that points to the same file, and instantiate their shared
> > state in the region shmget(mytoken).
>
> That's not pretty. For a local instance I was thinking of a primary device
> which also has the FM-API tunneled access via its mailbox, and secondary
> devices that don't. That would also apply to remote. The secondary device
> would then just receive some control commands on what to expose up to its
> host. Not sure what the convention for doing that is in QEMU. Maybe a
> socket interface like is done for swtpm? With some ordering constraints
> on startup.

I agree, it's certainly "not pretty".

I'd go so far as to call the baby ugly :]. Like I said: "the hackiest way".

My understanding from looking around at some road shows is that some of
these early multi-headed devices are basically just SLDs with multiple
heads. Most of these devices had to be developed well before DCDs, and
therefore the FM-API, were placed in the spec, and we haven't seen or
heard of any of these early devices having any form of switch yet.

I don't see how this type of device is feasible unless it's either
statically provisioned (change firmware settings from the BIOS on reboot)
or implements custom firmware commands providing some form of exclusivity
controls over memory regions.

The former makes it not really a useful pooling device, so I'm sort of
guessing we'll see most of these early devices implement custom commands.

I'm just not sure these early MHDs are going to have any real form of
FM-API, but it would still be nice to emulate them.

~Gregory

* Re: CXL 2.0 memory pooling emulation
From: Jonathan Cameron via @ 2023-02-17 11:14 UTC
To: Gregory Price; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Thu, 16 Feb 2023 15:52:31 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Thu, Feb 16, 2023 at 06:00:57PM +0000, Jonathan Cameron wrote:
> > On Wed, 15 Feb 2023 04:10:20 -0500
> > Gregory Price <gregory.price@memverge.com> wrote:
> >
> > > On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> > > >
> > > > 1) Emulate a Multi Headed Device.
> > > >    Initially connect two heads to different host bridges on a single QEMU
> > > >    machine. That lets us test most of the code flows without needing
> > > >    to handle tests that involve multiple machines.
> > > >    Later, we could add a means to connect between two instances of QEMU.
> > >
> > > The hackiest way to do this is to connect the same memory backend to two
> > > type-3 devices, with the obvious caveat that the device state will not
> > > be consistent between views.
> > >
> > > But we could, for example, just put the relevant shared state into an
> > > optional shared memory area instead of a normally allocated region.
> > >
> > > I can imagine this looking something like:
> > >
> > >   memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> > >   cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> > >
> > > Then you can have multiple QEMU instances hook their relevant devices up
> > > to a backend that points to the same file, and instantiate their shared
> > > state in the region shmget(mytoken).
> >
> > That's not pretty. For a local instance I was thinking of a primary device
> > which also has the FM-API tunneled access via its mailbox, and secondary
> > devices that don't. That would also apply to remote. The secondary device
> > would then just receive some control commands on what to expose up to its
> > host. Not sure what the convention for doing that is in QEMU. Maybe a
> > socket interface like is done for swtpm? With some ordering constraints
> > on startup.
>
> I agree, it's certainly "not pretty".
>
> I'd go so far as to call the baby ugly :]. Like I said: "the hackiest way".
>
> My understanding from looking around at some road shows is that some of
> these early multi-headed devices are basically just SLDs with multiple
> heads. Most of these devices had to be developed well before DCDs, and
> therefore the FM-API, were placed in the spec, and we haven't seen or
> heard of any of these early devices having any form of switch yet.
>
> I don't see how this type of device is feasible unless it's either
> statically provisioned (change firmware settings from the BIOS on reboot)
> or implements custom firmware commands providing some form of exclusivity
> controls over memory regions.
>
> The former makes it not really a useful pooling device, so I'm sort of
> guessing we'll see most of these early devices implement custom commands.
>
> I'm just not sure these early MHDs are going to have any real form of
> FM-API, but it would still be nice to emulate them.

Makes sense. I'd be fine with adding any necessary hooks to allow that
in the QEMU emulation, but probably not upstreaming the custom stuff.

Jonathan

> ~Gregory

* Re: CXL 2.0 memory pooling emulation
From: Gregory Price @ 2023-02-17 11:02 UTC
To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Fri, Feb 17, 2023 at 11:14:18AM +0000, Jonathan Cameron wrote:
> On Thu, 16 Feb 2023 15:52:31 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
>
> > I agree, it's certainly "not pretty".
> >
> > I'd go so far as to call the baby ugly :]. Like I said: "the hackiest way".
> >
> > My understanding from looking around at some road shows is that some of
> > these early multi-headed devices are basically just SLDs with multiple
> > heads. Most of these devices had to be developed well before DCDs, and
> > therefore the FM-API, were placed in the spec, and we haven't seen or
> > heard of any of these early devices having any form of switch yet.
> >
> > I don't see how this type of device is feasible unless it's either
> > statically provisioned (change firmware settings from the BIOS on reboot)
> > or implements custom firmware commands providing some form of exclusivity
> > controls over memory regions.
> >
> > The former makes it not really a useful pooling device, so I'm sort of
> > guessing we'll see most of these early devices implement custom commands.
> >
> > I'm just not sure these early MHDs are going to have any real form of
> > FM-API, but it would still be nice to emulate them.
>
> Makes sense. I'd be fine with adding any necessary hooks to allow that
> in the QEMU emulation, but probably not upstreaming the custom stuff.
>
> Jonathan

I'll have to give it some thought. The "custom stuff" is mostly init
code, mailbox commands, and the fields those mailbox commands twiddle.

I guess we could create a wrapper-device that hooks raw commands? Is
that what raw commands are intended for? Notably, the kernel has to be
compiled with raw command support, which is disabled by default, but
that's fine.

Dunno, spitballing, but I'm a couple of days away from a first pass at
an MHD, though I'll need to spend quite a bit of time cleaning it up
before I can push an RFC.

~Gregory

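For anyone following along, the raw command support Gregory mentions is
the kernel's opt-in path for passing unvalidated CXL mailbox commands
through from userspace; it is intended for debug/testing and using it
taints the kernel. A minimal way to switch it on, assuming you are
building the guest kernel from source:

  # Enable the raw CXL mailbox command interface in the guest kernel .config
  # (disabled by default; intended for debug/testing only).
  echo 'CONFIG_CXL_MEM_RAW_COMMANDS=y' >> .config
  make olddefconfig     # resolve dependencies against the updated .config
  make -j"$(nproc)"
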
* CXL memory pooling emulation inquiry
From: Junjie Fu @ 2025-03-10 8:02 UTC
To: qemu-devel; +Cc: Jonathan.Cameron, linux-cxl, viacheslav.dubeyko, zhitingz

> Note though that there is a long way to go before we can do what you
> want. The steps I'd expect to see along the way:
>
> 1) Emulate a Multi Headed Device.
>    Initially connect two heads to different host bridges on a single QEMU
>    machine. That lets us test most of the code flows without needing
>    to handle tests that involve multiple machines.
>    Later, we could add a means to connect between two instances of QEMU.
> 2) Add DCD support (we'll need the kernel side of that as well).
> 3) Wire it all up.
> 4) Do the same for a switch with MLDs behind it so we can poke the fun
>    corners.

Hi Jonathan,

Given your previous exploration, I would like to ask the following
questions:

1. Does QEMU currently support simulating the above CXL memory pooling
   scenario?
2. If not fully supported yet, are there any available development
   branches or patches that implement this functionality?
3. Are there any guidelines or considerations for configuring and testing
   CXL memory pooling in QEMU?

I sincerely appreciate your time and guidance on this topic!

* Re: CXL memory pooling emulation inquiry
From: Jonathan Cameron via @ 2025-03-12 18:05 UTC
To: Junjie Fu
Cc: qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, Gregory Price, svetly.todorov

On Mon, 10 Mar 2025 16:02:45 +0800
Junjie Fu <fujunjie1@qq.com> wrote:

> > Note though that there is a long way to go before we can do what you
> > want. The steps I'd expect to see along the way:
> >
> > 1) Emulate a Multi Headed Device.
> >    Initially connect two heads to different host bridges on a single QEMU
> >    machine. That lets us test most of the code flows without needing
> >    to handle tests that involve multiple machines.
> >    Later, we could add a means to connect between two instances of QEMU.
> > 2) Add DCD support (we'll need the kernel side of that as well).
> > 3) Wire it all up.
> > 4) Do the same for a switch with MLDs behind it so we can poke the fun
> >    corners.
>
> Hi Jonathan,
>
> Given your previous exploration, I would like to ask the following
> questions:
>
> 1. Does QEMU currently support simulating the above CXL memory pooling
>    scenario?

Not in upstream yet, but Gregory posted emulation support last year:
https://lore.kernel.org/qemu-devel/20241018161252.8896-1-gourry@gourry.net/

I'm carrying the patches on my staging tree:
https://gitlab.com/jic23/qemu/-/commits/cxl-2025-02-20?ref_type=heads

Longer term I remain a little unconvinced that this is the best approach,
because I also want a single management path (so a fake CCI etc.) and that
may need to be exposed to one of the hosts for test purposes. In the
current approach, commands are issued to each host directly to surface
memory.

> 2. If not fully supported yet, are there any available development
>    branches or patches that implement this functionality?
>
> 3. Are there any guidelines or considerations for configuring and testing
>    CXL memory pooling in QEMU?

There is some information in that patch series cover letter.
+CC Gregory and Svetly.

> I sincerely appreciate your time and guidance on this topic!

No problem.

Jonathan

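For anyone who wants to try this, the staging branch Jonathan points at
can be fetched and built directly. The branch name below is taken from the
URL above; since these branches are dated, check the repository for the
current one.

  # Fetch and build Jonathan's CXL staging branch carrying the MHSLD patches.
  git clone --branch cxl-2025-02-20 https://gitlab.com/jic23/qemu.git qemu-cxl
  cd qemu-cxl
  ./configure --target-list=x86_64-softmmu
  make -j"$(nproc)"
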
* Re: CXL memory pooling emulation inquiry
From: Gregory Price @ 2025-03-12 19:33 UTC
To: Jonathan Cameron
Cc: Junjie Fu, qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, svetly.todorov

On Wed, Mar 12, 2025 at 06:05:43PM +0000, Jonathan Cameron via wrote:
>
> Longer term I remain a little unconvinced that this is the best approach,
> because I also want a single management path (so a fake CCI etc.) and that
> may need to be exposed to one of the hosts for test purposes. In the
> current approach, commands are issued to each host directly to surface
> memory.
>

Let's say we implement this:

-----------         -----------
| Host 1  |         | Host 2  |
|    |    |         |         |
|    v    |   Add   |         |
|   CCI   | ------> | Evt Log |
-----------         -----------
                ^
                What mechanism
                do you use here?

And how does it not just replicate QMP logic?

Not arguing against it, I just see what amounts to more code than
required to test the functionality. QMP fits the bill, so split the CCI
interface (for single-host management testing) from the MHSLD interface.

Why not leave the 1-node DCD with an inbound CCI interface for testing,
and leave the QMP interface for development of a reference fabric manager
outside the scope of another host?

TL;DR: :[ distributed systems are hard to test

> > 2. If not fully supported yet, are there any available development
> >    branches or patches that implement this functionality?
> >
> > 3. Are there any guidelines or considerations for configuring and testing
> >    CXL memory pooling in QEMU?
>
> There is some information in that patch series cover letter.

The attached series implements an MHSLD, but implementing the pooling
mechanism (i.e. the fabric manager logic) is left to the imagination of
the reader. You will want to look at Fan Ni's DCD patch set to understand
the QMP add/remove logic for DCD capacity. This patch set just enables
you to manage 2+ QEMU guests sharing a DCD state in shared memory.

So you'll have to send DCD commands to each individual guest QEMU
instance via QMP, but the underlying logic manages the shared state via
locks to emulate real MHSLD behavior.

      QMP|---> Host 1 --------v
[FM]-----|               [Shared State]
      QMP|---> Host 2 --------^

This differs from a real DCD in that a real DCD is a single endpoint for
management, rather than N endpoints (1 per VM).

                  |---> Host 1
[FM] ---> [DCD] --|
                  |---> Host 2

However, this is an implementation detail on the FM side, so I chose to
do it this way to simplify the QEMU MHSLD implementation. There are far
fewer interactions this way - with the downside that having one of the
hosts manage the shared state isn't possible via the current emulation.

It could probably be done, but I'm not sure what value it has, since the
FM implementation difference is a matter of a small amount of Python.

It's been a while since I played with this patch set and I do not have a
reference pooling manager available to me any longer, unfortunately. But
I'm happy to provide some guidance where I can.

~Gregory

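As a concrete illustration of the flow Gregory describes, surfacing
capacity to one guest looks roughly like the following. The command and
argument names follow Fan Ni's DCD series as merged into recent upstream
QEMU, and the device id and socket path are made up for the example;
check qapi/cxl.json in the QEMU version you run before relying on the
exact schema.

  # Assumes the guest was started with something like:
  #   -qmp unix:/tmp/qmp-host1.sock,server=on,wait=off
  # and a DCD device whose id is cxl-dcd0 (both names are examples).
  # Surface one 128 MiB extent of dynamic capacity region 0 to that host.
  cat <<'EOF' | socat - UNIX-CONNECT:/tmp/qmp-host1.sock
  { "execute": "qmp_capabilities" }
  { "execute": "cxl-add-dynamic-capacity",
    "arguments": { "path": "/machine/peripheral/cxl-dcd0",
                   "host-id": 0,
                   "selection-policy": "prescriptive",
                   "region": 0,
                   "tag": "",
                   "extents": [ { "offset": 0, "len": 134217728 } ] } }
  EOF
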
* Re: CXL memory pooling emulation inquiry
From: Fan Ni @ 2025-03-13 16:03 UTC
To: Gregory Price
Cc: Jonathan Cameron, Junjie Fu, qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, svetly.todorov, a.manzanares, fan.ni, anisa.su887, dave

On Wed, Mar 12, 2025 at 03:33:12PM -0400, Gregory Price wrote:
> On Wed, Mar 12, 2025 at 06:05:43PM +0000, Jonathan Cameron wrote:
> >
> > Longer term I remain a little unconvinced that this is the best approach,
> > because I also want a single management path (so a fake CCI etc.) and that
> > may need to be exposed to one of the hosts for test purposes. In the
> > current approach, commands are issued to each host directly to surface
> > memory.
> >
>
> Let's say we implement this:
>
> -----------         -----------
> | Host 1  |         | Host 2  |
> |    |    |         |         |
> |    v    |   Add   |         |
> |   CCI   | ------> | Evt Log |
> -----------         -----------
>                 ^
>                 What mechanism
>                 do you use here?
>
> And how does it not just replicate QMP logic?
>
> Not arguing against it, I just see what amounts to more code than
> required to test the functionality. QMP fits the bill, so split the CCI
> interface (for single-host management testing) from the MHSLD interface.

We have recently discussed the approach internally. Our idea is to do
something similar to what you have done with the MHSLD emulation: use a
shmem device to share information (mailbox???) between the two devices.

> Why not leave the 1-node DCD with an inbound CCI interface for testing,
> and leave the QMP interface for development of a reference fabric manager
> outside the scope of another host?

For this two-host setup, I can see benefits for now: the two hosts can
run different kernels. That is to say, the host serving as the FM only
needs to support, for example, out-of-band communication with the
hardware (MCTP over I2C), and does not need to evolve with whatever we
want to test on the target host (booted with a kernel carrying the
features we care about). That is very important, at least for test
purposes: since MCTP-over-I2C support for x86 is not upstreamed yet, we
do not want to rebase it whenever the kernel is updated.

More specifically, let's say we deploy the libcxlmi test framework on
the FM host; then we can test whatever features need testing on the
target host (DCD etc.). Again, the FM host does not need DCD kernel
support. Compared to the QMP interface, libcxlmi already supports a lot
of commands, and more are being added, so it should be much more
convenient than implementing them via the QMP interface.

Fan

> TL;DR: :[ distributed systems are hard to test
>
> > > 2. If not fully supported yet, are there any available development
> > >    branches or patches that implement this functionality?
> > >
> > > 3. Are there any guidelines or considerations for configuring and testing
> > >    CXL memory pooling in QEMU?
> >
> > There is some information in that patch series cover letter.
>
> The attached series implements an MHSLD, but implementing the pooling
> mechanism (i.e. the fabric manager logic) is left to the imagination of
> the reader. You will want to look at Fan Ni's DCD patch set to understand
> the QMP add/remove logic for DCD capacity. This patch set just enables
> you to manage 2+ QEMU guests sharing a DCD state in shared memory.
>
> So you'll have to send DCD commands to each individual guest QEMU
> instance via QMP, but the underlying logic manages the shared state via
> locks to emulate real MHSLD behavior.
>
>       QMP|---> Host 1 --------v
> [FM]-----|               [Shared State]
>       QMP|---> Host 2 --------^
>
> This differs from a real DCD in that a real DCD is a single endpoint for
> management, rather than N endpoints (1 per VM).
>
>                   |---> Host 1
> [FM] ---> [DCD] --|
>                   |---> Host 2
>
> However, this is an implementation detail on the FM side, so I chose to
> do it this way to simplify the QEMU MHSLD implementation. There are far
> fewer interactions this way - with the downside that having one of the
> hosts manage the shared state isn't possible via the current emulation.
>
> It could probably be done, but I'm not sure what value it has, since the
> FM implementation difference is a matter of a small amount of Python.
>
> It's been a while since I played with this patch set and I do not have a
> reference pooling manager available to me any longer, unfortunately. But
> I'm happy to provide some guidance where I can.
>
> ~Gregory

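As a rough sketch of the FM-host side Fan describes: only the out-of-band
tooling needs to be present there, so preparing it amounts to building
libcxlmi and pointing its example programs at the target device over MCTP.
The repository URL and build commands below are assumptions about the
upstream project, not details taken from this thread.

  # FM-host side only: build the libcxlmi tooling (URL and build steps assumed).
  git clone https://github.com/computexpresslink/libcxlmi.git
  cd libcxlmi
  meson setup build
  meson compile -C build
  # The resulting example/test programs speak FM-API to the target device
  # out-of-band, so this host's kernel does not need DCD support.
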
* Re: CXL memory pooling emulation inquiry
From: Fan Ni @ 2025-04-08 4:47 UTC
To: Gregory Price
Cc: Jonathan Cameron, Junjie Fu, qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, svetly.todorov

On Wed, Mar 12, 2025 at 03:33:12PM -0400, Gregory Price wrote:
> On Wed, Mar 12, 2025 at 06:05:43PM +0000, Jonathan Cameron wrote:
> >
> > Longer term I remain a little unconvinced that this is the best approach,
> > because I also want a single management path (so a fake CCI etc.) and that
> > may need to be exposed to one of the hosts for test purposes. In the
> > current approach, commands are issued to each host directly to surface
> > memory.
> >
>
> Let's say we implement this:
>
> -----------         -----------
> | Host 1  |         | Host 2  |
> |    |    |         |         |
> |    v    |   Add   |         |
> |   CCI   | ------> | Evt Log |
> -----------         -----------
>                 ^
>                 What mechanism
>                 do you use here?
>
> And how does it not just replicate QMP logic?
>
> Not arguing against it, I just see what amounts to more code than
> required to test the functionality. QMP fits the bill, so split the CCI
> interface (for single-host management testing) from the MHSLD interface.
>
> Why not leave the 1-node DCD with an inbound CCI interface for testing,
> and leave the QMP interface for development of a reference fabric manager
> outside the scope of another host?

Hi Gregory,

FYI. I just posted an RFC for FM emulation. The approach used does not
need to replicate QMP logic, but we do use one QMP command to notify
host 2 of an incoming MCTP message:
https://lore.kernel.org/linux-cxl/20250408043051.430340-1-nifan.cxl@gmail.com/

Fan

> TL;DR: :[ distributed systems are hard to test
>
> > > 2. If not fully supported yet, are there any available development
> > >    branches or patches that implement this functionality?
> > >
> > > 3. Are there any guidelines or considerations for configuring and testing
> > >    CXL memory pooling in QEMU?
> >
> > There is some information in that patch series cover letter.
>
> The attached series implements an MHSLD, but implementing the pooling
> mechanism (i.e. the fabric manager logic) is left to the imagination of
> the reader. You will want to look at Fan Ni's DCD patch set to understand
> the QMP add/remove logic for DCD capacity. This patch set just enables
> you to manage 2+ QEMU guests sharing a DCD state in shared memory.
>
> So you'll have to send DCD commands to each individual guest QEMU
> instance via QMP, but the underlying logic manages the shared state via
> locks to emulate real MHSLD behavior.
>
>       QMP|---> Host 1 --------v
> [FM]-----|               [Shared State]
>       QMP|---> Host 2 --------^
>
> This differs from a real DCD in that a real DCD is a single endpoint for
> management, rather than N endpoints (1 per VM).
>
>                   |---> Host 1
> [FM] ---> [DCD] --|
>                   |---> Host 2
>
> However, this is an implementation detail on the FM side, so I chose to
> do it this way to simplify the QEMU MHSLD implementation. There are far
> fewer interactions this way - with the downside that having one of the
> hosts manage the shared state isn't possible via the current emulation.
>
> It could probably be done, but I'm not sure what value it has, since the
> FM implementation difference is a matter of a small amount of Python.
>
> It's been a while since I played with this patch set and I do not have a
> reference pooling manager available to me any longer, unfortunately. But
> I'm happy to provide some guidance where I can.
>
> ~Gregory
