* CXL 2.0 memory pooling emulation
From: zhiting zhu @ 2023-02-08 22:28 UTC
To: qemu-devel

Hi,

I saw a PoC to implement memory pooling and a fabric manager on QEMU:
https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
Is there any further development on this? Can QEMU emulate memory pooling
in a simple case where two virtual machines are connected to a CXL switch
with some memory devices attached to it?

Best,
Zhiting

* Re: CXL 2.0 memory pooling emulation
From: Jonathan Cameron via @ 2023-02-15 15:18 UTC
To: zhiting zhu; +Cc: qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, 8 Feb 2023 16:28:44 -0600
zhiting zhu <zhitingz@cs.utexas.edu> wrote:

> Hi,
>
> I saw a PoC to implement memory pooling and a fabric manager on QEMU:
> https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> Is there any further development on this? Can QEMU emulate memory pooling
> in a simple case where two virtual machines are connected to a CXL switch
> with some memory devices attached to it?
>
> Best,
> Zhiting

Hi Zhiting,

+CC linux-cxl as it's not as much of a firehose as qemu-devel.
+CC Slava, who has been driving discussion around fabric management.

No progress on that particular approach, though there has been some
discussion of what the FM architecture itself might look like:
https://lore.kernel.org/linux-cxl/7F001EAF-C512-436A-A9DD-E08730C91214@bytedance.com/

There was a sticky problem with doing MCTP over I2C, which is that there
are very few I2C controllers that support the combination of master and
subordinate modes needed for MCTP. The one that was used for that (aspeed)
doesn't have ACPI bindings (and they are non-trivial to add due to clocks
etc., and likely to be controversial on the kernel side given I just want
it for emulation!). So far we don't have DT bindings for CXL (either the
CFMWS - CXL fixed memory windows - or pxb-cxl - the host bridge); I'll be
sending out one of the precursors for that as an RFC soon. So we are in
the fun position that we can either emulate the comms path to the devices,
or we can emulate the host actually using the devices.

I was planning to get back to that eventually, but we have other options
now that CXL 3.0 has been published. CXL 3.0 provides two paths forward
that let us test the equivalent functionality with fewer moving parts:

1) CXL SWCCI, an extra PCI function next to the switch upstream port that
   provides a mailbox that takes FM-API commands. PoC kernel code at:
   https://lore.kernel.org/linux-cxl/20221025104243.20836-1-Jonathan.Cameron@huawei.com/
   The latest branch in gitlab.com/jic23/qemu should have switch CCI
   emulation support (branches are dated). Note we have a lot of stuff
   outstanding, either out for review or backed up behind things that are.
2) Multi Headed Devices. These allow FM-API commands over the normal CXL
   mailbox. I did a very basic PoC to see how this would fit in with the
   kernel side of things, but recently there has been far too much we need
   to enable in the shorter term.

Note though that there is a long way to go before we can do what you
want. The steps I'd expect to see along the way:

1) Emulate a Multi Headed Device.
   Initially connect two heads to different host bridges on a single QEMU
   machine. That lets us test most of the code flows without needing
   to handle tests that involve multiple machines.
   Later, we could add a means to connect between two instances of QEMU.
2) Add DCD support (we'll need the kernel side of that as well).
3) Wire it all up.
4) Do the same for a switch with MLDs behind it so we can poke the fun
   corners.

Note that, in common with memory emulation in general for CXL on QEMU,
the need to do live address decoding will make performance terrible.
There are probably ways to improve that, but whilst we are at the stage
of trying to get as much functional as possible for testing purposes,
I'm not sure anyone will pursue those options. It may not make sense in
the longer term either.

I'm more than happy to offer suggestions / feedback on approaches to this,
and will get back to it myself once some more pressing requirements are
dealt with.

Jonathan

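For orientation, the single-host case Jonathan refers to (one QEMU machine,
a CXL host bridge, a switch, and a type-3 memory device behind it) can
already be brought up. The sketch below follows the QEMU CXL documentation;
exact option names, paths and sizes are assumptions that may vary between
QEMU versions, and this is a single guest rather than the two-VM pooling
case asked about above.

  # Minimal single-host CXL switch topology (sketch; adjust paths/sizes).
  qemu-system-x86_64 -M q35,cxl=on -m 4G -smp 2 -nographic \
    -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxl-mem0.raw,size=256M \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=2 \
    -device cxl-upstream,bus=root_port0,id=us0 \
    -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
    -device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,id=cxl-vmem0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
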
* Re: CXL 2.0 memory pooling emulation
From: Gregory Price @ 2023-02-15 9:10 UTC
To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> On Wed, 8 Feb 2023 16:28:44 -0600
> zhiting zhu <zhitingz@cs.utexas.edu> wrote:
>
> > Hi,
> >
> > I saw a PoC to implement memory pooling and a fabric manager on QEMU:
> > https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> > Is there any further development on this? Can QEMU emulate memory pooling
> > in a simple case where two virtual machines are connected to a CXL switch
> > with some memory devices attached to it?
> >
> > Best,
> > Zhiting
>
> [... snip ...]
>
> Note though that there is a long way to go before we can do what you
> want. The steps I'd expect to see along the way:
>
> 1) Emulate a Multi Headed Device.
>    Initially connect two heads to different host bridges on a single QEMU
>    machine. That lets us test most of the code flows without needing
>    to handle tests that involve multiple machines.
>    Later, we could add a means to connect between two instances of QEMU.

I've been playing with this a bit.

The hackiest way to do this is to connect the same memory backend to two
type-3 devices, with the obvious caveat that the device state will not
be consistent between views.

But we could, for example, just put the relevant shared state into an
optional shared memory area instead of a normally allocated region.

I can imagine this looking something like:

  memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
  cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken

Then you can have multiple QEMU instances hook their relevant devices up
to a backend that points to the same file, and instantiate their shared
state in the region shmget(mytoken).

Additionally, these devices will require a set of what amounts to
vendor-specific mailbox commands - since the spec doesn't really define
what multi-headed devices "should do" to manage exclusivity.

Not sure if this would be upstream-worthy, or if we'd want to fork
hw/mem/cxl_type3.c into something like hw/mem/cxl_type3_multihead.c.

The base type3 device is going to end up overloaded at some point, I
think, and we'll want to look at trying to abstract it.

~Gregory

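To make the idea concrete, two QEMU instances sharing one backing file
would be launched roughly as below. This is only a sketch of the proposal:
shm_token is a hypothetical property from the message above, not an
existing cxl-type3 option, and the rest of the topology follows the usual
QEMU CXL examples.

  # Instance A (instance B is launched the same way, pointing at the same
  # /tmp/mem0 backing file and the same hypothetical shm_token).
  qemu-system-x86_64 -M q35,cxl=on -m 2G -nographic \
    -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=on \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=rp0,chassis=0,slot=0 \
    -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
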
* Re: CXL 2.0 memory pooling emulation
From: Jonathan Cameron via @ 2023-02-16 18:00 UTC
To: Gregory Price; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, 15 Feb 2023 04:10:20 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > On Wed, 8 Feb 2023 16:28:44 -0600
> > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> >
> > > Hi,
> > >
> > > I saw a PoC to implement memory pooling and a fabric manager on QEMU:
> > > https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> > > Is there any further development on this? Can QEMU emulate memory pooling
> > > in a simple case where two virtual machines are connected to a CXL switch
> > > with some memory devices attached to it?
> > >
> > > Best,
> > > Zhiting
> >
> > [... snip ...]
> >
> > Note though that there is a long way to go before we can do what you
> > want. The steps I'd expect to see along the way:
> >
> > 1) Emulate a Multi Headed Device.
> >    Initially connect two heads to different host bridges on a single QEMU
> >    machine. That lets us test most of the code flows without needing
> >    to handle tests that involve multiple machines.
> >    Later, we could add a means to connect between two instances of QEMU.
>
> I've been playing with this a bit.
>
> The hackiest way to do this is to connect the same memory backend to two
> type-3 devices, with the obvious caveat that the device state will not
> be consistent between views.
>
> But we could, for example, just put the relevant shared state into an
> optional shared memory area instead of a normally allocated region.
>
> I can imagine this looking something like:
>
>   memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
>   cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
>
> Then you can have multiple QEMU instances hook their relevant devices up
> to a backend that points to the same file, and instantiate their shared
> state in the region shmget(mytoken).

That's not pretty. For a local instance I was thinking of a primary device
which also has the FM-API tunneled access via its mailbox, and secondary
devices that don't. That would also apply to remote. The secondary device
would then just receive some control commands on what to expose up to its
host. Not sure what the convention for doing that is in QEMU. Maybe a
socket interface like is done for swtpm? With some ordering constraints
on startup.

> Additionally, these devices will require a set of what amounts to
> vendor-specific mailbox commands - since the spec doesn't really define
> what multi-headed devices "should do" to manage exclusivity.

The device shouldn't manage exclusivity. That's a job for the fabric
manager plus DCD presentation of the memory, with the device enforcing
some rules; if it supports some of the capacity-adding types, it might
need a simple allocator. If we need vendor-specific commands then we need
to take that to the relevant body. I'm not sure what they would be,
though.

> Not sure if this would be upstream-worthy, or if we'd want to fork
> hw/mem/cxl_type3.c into something like hw/mem/cxl_type3_multihead.c.
>
> The base type3 device is going to end up overloaded at some point, I
> think, and we'll want to look at trying to abstract it.

Sure. Though we might end up with the normal type3 implementation being
(optionally) the primary device for an MHD (the one with the FM-API
tunneling available on its mailbox). We would need a secondary device,
though, which you instantiate with a link to the primary one or with a
socket (assuming the primary opens a socket as well).

Jonathan

> ~Gregory

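For reference, the swtpm convention Jonathan mentions connects an external
emulator process to QEMU over a UNIX-socket chardev. The TPM options below
are the existing, documented ones and are shown only as the pattern a
primary/secondary MHD link might copy; any CXL device properties for such
a link would be new and are not shown here.

  # The swtpm wiring pattern: external emulator process + QEMU chardev socket.
  mkdir -p /tmp/mytpm0
  swtpm socket --tpmstate dir=/tmp/mytpm0 \
    --ctrl type=unixio,path=/tmp/mytpm0/swtpm-sock &

  qemu-system-x86_64 -M q35 -m 2G -nographic \
    -chardev socket,id=chrtpm,path=/tmp/mytpm0/swtpm-sock \
    -tpmdev emulator,id=tpm0,chardev=chrtpm \
    -device tpm-tis,tpmdev=tpm0
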
* Re: CXL 2.0 memory pooling emulation
From: Gregory Price @ 2023-02-16 20:52 UTC
To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Thu, Feb 16, 2023 at 06:00:57PM +0000, Jonathan Cameron wrote:
> On Wed, 15 Feb 2023 04:10:20 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
>
> > On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> > >
> > > 1) Emulate a Multi Headed Device.
> > >    Initially connect two heads to different host bridges on a single QEMU
> > >    machine. That lets us test most of the code flows without needing
> > >    to handle tests that involve multiple machines.
> > >    Later, we could add a means to connect between two instances of QEMU.
> >
> > The hackiest way to do this is to connect the same memory backend to two
> > type-3 devices, with the obvious caveat that the device state will not
> > be consistent between views.
> >
> > But we could, for example, just put the relevant shared state into an
> > optional shared memory area instead of a normally allocated region.
> >
> > I can imagine this looking something like:
> >
> >   memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> >   cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> >
> > Then you can have multiple QEMU instances hook their relevant devices up
> > to a backend that points to the same file, and instantiate their shared
> > state in the region shmget(mytoken).
>
> That's not pretty. For a local instance I was thinking of a primary device
> which also has the FM-API tunneled access via its mailbox, and secondary
> devices that don't. That would also apply to remote. The secondary device
> would then just receive some control commands on what to expose up to its
> host. Not sure what the convention for doing that is in QEMU. Maybe a
> socket interface like is done for swtpm? With some ordering constraints
> on startup.

I agree, it's certainly "not pretty".

I'd go so far as to call the baby ugly :]. Like I said: "the hackiest way".

My understanding from looking around at some road shows is that some of
these early multi-headed devices are basically just SLDs with multiple
heads. Most of these devices had to be developed well before DCDs, and
therefore the FM-API, were placed in the spec, and we haven't seen or
heard of any of these early devices having any form of switch yet.

I don't see how this type of device is feasible unless it's either
statically provisioned (change firmware settings from the BIOS on reboot)
or implements custom firmware commands providing some form of exclusivity
controls over memory regions.

The former makes it not really a useful pooling device, so I'm sort of
guessing we'll see most of these early devices implement custom commands.

I'm just not sure these early MHDs are going to have any real form of
FM-API, but it would still be nice to emulate them.

~Gregory

* Re: CXL 2.0 memory pooling emulation
From: Jonathan Cameron via @ 2023-02-17 11:14 UTC
To: Gregory Price; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Thu, 16 Feb 2023 15:52:31 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Thu, Feb 16, 2023 at 06:00:57PM +0000, Jonathan Cameron wrote:
> > On Wed, 15 Feb 2023 04:10:20 -0500
> > Gregory Price <gregory.price@memverge.com> wrote:
> >
> > > On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> > > >
> > > > 1) Emulate a Multi Headed Device.
> > > >    Initially connect two heads to different host bridges on a single QEMU
> > > >    machine. That lets us test most of the code flows without needing
> > > >    to handle tests that involve multiple machines.
> > > >    Later, we could add a means to connect between two instances of QEMU.
> > >
> > > The hackiest way to do this is to connect the same memory backend to two
> > > type-3 devices, with the obvious caveat that the device state will not
> > > be consistent between views.
> > >
> > > But we could, for example, just put the relevant shared state into an
> > > optional shared memory area instead of a normally allocated region.
> > >
> > > I can imagine this looking something like:
> > >
> > >   memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> > >   cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> > >
> > > Then you can have multiple QEMU instances hook their relevant devices up
> > > to a backend that points to the same file, and instantiate their shared
> > > state in the region shmget(mytoken).
> >
> > That's not pretty. For a local instance I was thinking of a primary device
> > which also has the FM-API tunneled access via its mailbox, and secondary
> > devices that don't. That would also apply to remote. The secondary device
> > would then just receive some control commands on what to expose up to its
> > host. Not sure what the convention for doing that is in QEMU. Maybe a
> > socket interface like is done for swtpm? With some ordering constraints
> > on startup.
>
> I agree, it's certainly "not pretty".
>
> I'd go so far as to call the baby ugly :]. Like I said: "the hackiest way".
>
> My understanding from looking around at some road shows is that some of
> these early multi-headed devices are basically just SLDs with multiple
> heads. Most of these devices had to be developed well before DCDs, and
> therefore the FM-API, were placed in the spec, and we haven't seen or
> heard of any of these early devices having any form of switch yet.
>
> I don't see how this type of device is feasible unless it's either
> statically provisioned (change firmware settings from the BIOS on reboot)
> or implements custom firmware commands providing some form of exclusivity
> controls over memory regions.
>
> The former makes it not really a useful pooling device, so I'm sort of
> guessing we'll see most of these early devices implement custom commands.
>
> I'm just not sure these early MHDs are going to have any real form of
> FM-API, but it would still be nice to emulate them.

Makes sense. I'd be fine with adding any necessary hooks to allow that
in the QEMU emulation, but probably not upstreaming the custom stuff.

Jonathan

> ~Gregory

* Re: CXL 2.0 memory pooling emulation
From: Gregory Price @ 2023-02-17 11:02 UTC
To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Fri, Feb 17, 2023 at 11:14:18AM +0000, Jonathan Cameron wrote:
> On Thu, 16 Feb 2023 15:52:31 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
>
> > I agree, it's certainly "not pretty".
> >
> > I'd go so far as to call the baby ugly :]. Like I said: "the hackiest way".
> >
> > My understanding from looking around at some road shows is that some of
> > these early multi-headed devices are basically just SLDs with multiple
> > heads. Most of these devices had to be developed well before DCDs, and
> > therefore the FM-API, were placed in the spec, and we haven't seen or
> > heard of any of these early devices having any form of switch yet.
> >
> > I don't see how this type of device is feasible unless it's either
> > statically provisioned (change firmware settings from the BIOS on reboot)
> > or implements custom firmware commands providing some form of exclusivity
> > controls over memory regions.
> >
> > The former makes it not really a useful pooling device, so I'm sort of
> > guessing we'll see most of these early devices implement custom commands.
> >
> > I'm just not sure these early MHDs are going to have any real form of
> > FM-API, but it would still be nice to emulate them.
>
> Makes sense. I'd be fine with adding any necessary hooks to allow that
> in the QEMU emulation, but probably not upstreaming the custom stuff.
>
> Jonathan

I'll have to give it some thought. The "custom stuff" is mostly init
code, mailbox commands, and the fields those mailbox commands twiddle.

I guess we could create a wrapper-device that hooks raw commands? Is
that what raw commands are intended for? Notably, the kernel has to be
compiled with raw command support, which is disabled by default, but
that's fine.

Dunno, spitballing, but I'm a couple of days away from a first pass at
an MHD, though I'll need to spend quite a bit of time cleaning it up
before I can push an RFC.

~Gregory

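For anyone following along, the raw command support Gregory mentions is
the kernel's opt-in path for passing unvalidated CXL mailbox commands
through from userspace; it is intended for debug/testing and using it
taints the kernel. A minimal way to switch it on, assuming you are
building the guest kernel from source:

  # Enable the raw CXL mailbox command interface in the guest kernel .config
  # (disabled by default; intended for debug/testing only).
  echo 'CONFIG_CXL_MEM_RAW_COMMANDS=y' >> .config
  make olddefconfig     # resolve dependencies against the updated .config
  make -j"$(nproc)"
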
* CXL memory pooling emulation inquiry
From: Junjie Fu @ 2025-03-10 8:02 UTC
To: qemu-devel; +Cc: Jonathan.Cameron, linux-cxl, viacheslav.dubeyko, zhitingz

> Note though that there is a long way to go before we can do what you
> want. The steps I'd expect to see along the way:
>
> 1) Emulate a Multi Headed Device.
>    Initially connect two heads to different host bridges on a single QEMU
>    machine. That lets us test most of the code flows without needing
>    to handle tests that involve multiple machines.
>    Later, we could add a means to connect between two instances of QEMU.
> 2) Add DCD support (we'll need the kernel side of that as well).
> 3) Wire it all up.
> 4) Do the same for a switch with MLDs behind it so we can poke the fun
>    corners.

Hi Jonathan,

Given your previous exploration, I would like to ask the following
questions:

1. Does QEMU currently support simulating the above CXL memory pooling
   scenario?
2. If not fully supported yet, are there any available development
   branches or patches that implement this functionality?
3. Are there any guidelines or considerations for configuring and testing
   CXL memory pooling in QEMU?

I sincerely appreciate your time and guidance on this topic!

* Re: CXL memory pooling emulation inquiry
From: Jonathan Cameron via @ 2025-03-12 18:05 UTC
To: Junjie Fu
Cc: qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, Gregory Price, svetly.todorov

On Mon, 10 Mar 2025 16:02:45 +0800
Junjie Fu <fujunjie1@qq.com> wrote:

> > Note though that there is a long way to go before we can do what you
> > want. The steps I'd expect to see along the way:
> >
> > 1) Emulate a Multi Headed Device.
> >    Initially connect two heads to different host bridges on a single QEMU
> >    machine. That lets us test most of the code flows without needing
> >    to handle tests that involve multiple machines.
> >    Later, we could add a means to connect between two instances of QEMU.
> > 2) Add DCD support (we'll need the kernel side of that as well).
> > 3) Wire it all up.
> > 4) Do the same for a switch with MLDs behind it so we can poke the fun
> >    corners.
>
> Hi Jonathan,
>
> Given your previous exploration, I would like to ask the following
> questions:
>
> 1. Does QEMU currently support simulating the above CXL memory pooling
>    scenario?

Not in upstream yet, but Gregory posted emulation support last year:
https://lore.kernel.org/qemu-devel/20241018161252.8896-1-gourry@gourry.net/

I'm carrying the patches on my staging tree:
https://gitlab.com/jic23/qemu/-/commits/cxl-2025-02-20?ref_type=heads

Longer term I remain a little unconvinced that this is the best approach,
because I also want a single management path (so a fake CCI etc.) and that
may need to be exposed to one of the hosts for test purposes. In the
current approach, commands are issued to each host directly to surface
memory.

> 2. If not fully supported yet, are there any available development
>    branches or patches that implement this functionality?
>
> 3. Are there any guidelines or considerations for configuring and testing
>    CXL memory pooling in QEMU?

There is some information in that patch series cover letter.
+CC Gregory and Svetly.

> I sincerely appreciate your time and guidance on this topic!

No problem.

Jonathan

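For anyone who wants to try this, the staging branch Jonathan points at
can be fetched and built directly. The branch name below is taken from the
URL above; since these branches are dated, check the repository for the
current one.

  # Fetch and build Jonathan's CXL staging branch carrying the MHSLD patches.
  git clone --branch cxl-2025-02-20 https://gitlab.com/jic23/qemu.git qemu-cxl
  cd qemu-cxl
  ./configure --target-list=x86_64-softmmu
  make -j"$(nproc)"
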
* Re: CXL memory pooling emulation inquiry
From: Gregory Price @ 2025-03-12 19:33 UTC
To: Jonathan Cameron
Cc: Junjie Fu, qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, svetly.todorov

On Wed, Mar 12, 2025 at 06:05:43PM +0000, Jonathan Cameron via wrote:
>
> Longer term I remain a little unconvinced that this is the best approach,
> because I also want a single management path (so a fake CCI etc.) and that
> may need to be exposed to one of the hosts for test purposes. In the
> current approach, commands are issued to each host directly to surface
> memory.
>

Let's say we implement this:

-----------         -----------
| Host 1  |         | Host 2  |
|    |    |         |         |
|    v    |   Add   |         |
|   CCI   | ------> | Evt Log |
-----------         -----------
                ^
                What mechanism
                do you use here?

And how does it not just replicate QMP logic?

Not arguing against it, I just see what amounts to more code than
required to test the functionality. QMP fits the bill, so split the CCI
interface (for single-host management testing) from the MHSLD interface.

Why not leave the 1-node DCD with an inbound CCI interface for testing,
and leave the QMP interface for development of a reference fabric manager
outside the scope of another host?

TL;DR: :[ distributed systems are hard to test

> > 2. If not fully supported yet, are there any available development
> >    branches or patches that implement this functionality?
> >
> > 3. Are there any guidelines or considerations for configuring and testing
> >    CXL memory pooling in QEMU?
>
> There is some information in that patch series cover letter.

The attached series implements an MHSLD, but implementing the pooling
mechanism (i.e. the fabric manager logic) is left to the imagination of
the reader. You will want to look at Fan Ni's DCD patch set to understand
the QMP add/remove logic for DCD capacity. This patch set just enables
you to manage 2+ QEMU guests sharing a DCD state in shared memory.

So you'll have to send DCD commands to each individual guest QEMU
instance via QMP, but the underlying logic manages the shared state via
locks to emulate real MHSLD behavior.

      QMP|---> Host 1 --------v
[FM]-----|               [Shared State]
      QMP|---> Host 2 --------^

This differs from a real DCD in that a real DCD is a single endpoint for
management, rather than N endpoints (1 per VM).

                  |---> Host 1
[FM] ---> [DCD] --|
                  |---> Host 2

However, this is an implementation detail on the FM side, so I chose to
do it this way to simplify the QEMU MHSLD implementation. There are far
fewer interactions this way - with the downside that having one of the
hosts manage the shared state isn't possible via the current emulation.

It could probably be done, but I'm not sure what value it has, since the
FM implementation difference is a matter of a small amount of Python.

It's been a while since I played with this patch set and I do not have a
reference pooling manager available to me any longer, unfortunately. But
I'm happy to provide some guidance where I can.

~Gregory

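As a concrete illustration of the flow Gregory describes, surfacing
capacity to one guest looks roughly like the following. The command and
argument names follow Fan Ni's DCD series as merged into recent upstream
QEMU, and the device id and socket path are made up for the example;
check qapi/cxl.json in the QEMU version you run before relying on the
exact schema.

  # Assumes the guest was started with something like:
  #   -qmp unix:/tmp/qmp-host1.sock,server=on,wait=off
  # and a DCD device whose id is cxl-dcd0 (both names are examples).
  # Surface one 128 MiB extent of dynamic capacity region 0 to that host.
  cat <<'EOF' | socat - UNIX-CONNECT:/tmp/qmp-host1.sock
  { "execute": "qmp_capabilities" }
  { "execute": "cxl-add-dynamic-capacity",
    "arguments": { "path": "/machine/peripheral/cxl-dcd0",
                   "host-id": 0,
                   "selection-policy": "prescriptive",
                   "region": 0,
                   "tag": "",
                   "extents": [ { "offset": 0, "len": 134217728 } ] } }
  EOF
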
* Re: CXL memory pooling emulation inquiry
From: Fan Ni @ 2025-03-13 16:03 UTC
To: Gregory Price
Cc: Jonathan Cameron, Junjie Fu, qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, svetly.todorov, a.manzanares, fan.ni, anisa.su887, dave

On Wed, Mar 12, 2025 at 03:33:12PM -0400, Gregory Price wrote:
> On Wed, Mar 12, 2025 at 06:05:43PM +0000, Jonathan Cameron wrote:
> >
> > Longer term I remain a little unconvinced that this is the best approach,
> > because I also want a single management path (so a fake CCI etc.) and that
> > may need to be exposed to one of the hosts for test purposes. In the
> > current approach, commands are issued to each host directly to surface
> > memory.
> >
>
> Let's say we implement this:
>
> -----------         -----------
> | Host 1  |         | Host 2  |
> |    |    |         |         |
> |    v    |   Add   |         |
> |   CCI   | ------> | Evt Log |
> -----------         -----------
>                 ^
>                 What mechanism
>                 do you use here?
>
> And how does it not just replicate QMP logic?
>
> Not arguing against it, I just see what amounts to more code than
> required to test the functionality. QMP fits the bill, so split the CCI
> interface (for single-host management testing) from the MHSLD interface.

We have recently discussed the approach internally. Our idea is to do
something similar to what you have done with the MHSLD emulation: use a
shmem device to share information (mailbox???) between the two devices.

> Why not leave the 1-node DCD with an inbound CCI interface for testing,
> and leave the QMP interface for development of a reference fabric manager
> outside the scope of another host?

For this two-host setup, I can see benefits for now: the two hosts can
run different kernels. That is to say, the host serving as the FM only
needs to support, for example, out-of-band communication with the
hardware (MCTP over I2C), and does not need to evolve with whatever we
want to test on the target host (booted with a kernel carrying the
features we care about). That is very important, at least for test
purposes: since MCTP-over-I2C support for x86 is not upstreamed yet, we
do not want to rebase it whenever the kernel is updated.

More specifically, let's say we deploy the libcxlmi test framework on
the FM host; then we can test whatever features need testing on the
target host (DCD etc.). Again, the FM host does not need DCD kernel
support. Compared to the QMP interface, libcxlmi already supports a lot
of commands, and more are being added, so it should be much more
convenient than implementing them via the QMP interface.

Fan

> TL;DR: :[ distributed systems are hard to test
>
> > > 2. If not fully supported yet, are there any available development
> > >    branches or patches that implement this functionality?
> > >
> > > 3. Are there any guidelines or considerations for configuring and testing
> > >    CXL memory pooling in QEMU?
> >
> > There is some information in that patch series cover letter.
>
> The attached series implements an MHSLD, but implementing the pooling
> mechanism (i.e. the fabric manager logic) is left to the imagination of
> the reader. You will want to look at Fan Ni's DCD patch set to understand
> the QMP add/remove logic for DCD capacity. This patch set just enables
> you to manage 2+ QEMU guests sharing a DCD state in shared memory.
>
> So you'll have to send DCD commands to each individual guest QEMU
> instance via QMP, but the underlying logic manages the shared state via
> locks to emulate real MHSLD behavior.
>
>       QMP|---> Host 1 --------v
> [FM]-----|               [Shared State]
>       QMP|---> Host 2 --------^
>
> This differs from a real DCD in that a real DCD is a single endpoint for
> management, rather than N endpoints (1 per VM).
>
>                   |---> Host 1
> [FM] ---> [DCD] --|
>                   |---> Host 2
>
> However, this is an implementation detail on the FM side, so I chose to
> do it this way to simplify the QEMU MHSLD implementation. There are far
> fewer interactions this way - with the downside that having one of the
> hosts manage the shared state isn't possible via the current emulation.
>
> It could probably be done, but I'm not sure what value it has, since the
> FM implementation difference is a matter of a small amount of Python.
>
> It's been a while since I played with this patch set and I do not have a
> reference pooling manager available to me any longer, unfortunately. But
> I'm happy to provide some guidance where I can.
>
> ~Gregory

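As a rough sketch of the FM-host side Fan describes: only the out-of-band
tooling needs to be present there, so preparing it amounts to building
libcxlmi and pointing its example programs at the target device over MCTP.
The repository URL and build commands below are assumptions about the
upstream project, not details taken from this thread.

  # FM-host side only: build the libcxlmi tooling (URL and build steps assumed).
  git clone https://github.com/computexpresslink/libcxlmi.git
  cd libcxlmi
  meson setup build
  meson compile -C build
  # The resulting example/test programs speak FM-API to the target device
  # out-of-band, so this host's kernel does not need DCD support.
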
* Re: CXL memory pooling emulation inquiry
From: Fan Ni @ 2025-04-08 4:47 UTC
To: Gregory Price
Cc: Jonathan Cameron, Junjie Fu, qemu-devel, linux-cxl, viacheslav.dubeyko, zhitingz, svetly.todorov

On Wed, Mar 12, 2025 at 03:33:12PM -0400, Gregory Price wrote:
> On Wed, Mar 12, 2025 at 06:05:43PM +0000, Jonathan Cameron wrote:
> >
> > Longer term I remain a little unconvinced that this is the best approach,
> > because I also want a single management path (so a fake CCI etc.) and that
> > may need to be exposed to one of the hosts for test purposes. In the
> > current approach, commands are issued to each host directly to surface
> > memory.
> >
>
> Let's say we implement this:
>
> -----------         -----------
> | Host 1  |         | Host 2  |
> |    |    |         |         |
> |    v    |   Add   |         |
> |   CCI   | ------> | Evt Log |
> -----------         -----------
>                 ^
>                 What mechanism
>                 do you use here?
>
> And how does it not just replicate QMP logic?
>
> Not arguing against it, I just see what amounts to more code than
> required to test the functionality. QMP fits the bill, so split the CCI
> interface (for single-host management testing) from the MHSLD interface.
>
> Why not leave the 1-node DCD with an inbound CCI interface for testing,
> and leave the QMP interface for development of a reference fabric manager
> outside the scope of another host?

Hi Gregory,

FYI. I just posted an RFC for FM emulation. The approach used does not
need to replicate QMP logic, but we do use one QMP command to notify
host 2 of an incoming MCTP message:
https://lore.kernel.org/linux-cxl/20250408043051.430340-1-nifan.cxl@gmail.com/

Fan

> TL;DR: :[ distributed systems are hard to test
>
> > > 2. If not fully supported yet, are there any available development
> > >    branches or patches that implement this functionality?
> > >
> > > 3. Are there any guidelines or considerations for configuring and testing
> > >    CXL memory pooling in QEMU?
> >
> > There is some information in that patch series cover letter.
>
> The attached series implements an MHSLD, but implementing the pooling
> mechanism (i.e. the fabric manager logic) is left to the imagination of
> the reader. You will want to look at Fan Ni's DCD patch set to understand
> the QMP add/remove logic for DCD capacity. This patch set just enables
> you to manage 2+ QEMU guests sharing a DCD state in shared memory.
>
> So you'll have to send DCD commands to each individual guest QEMU
> instance via QMP, but the underlying logic manages the shared state via
> locks to emulate real MHSLD behavior.
>
>       QMP|---> Host 1 --------v
> [FM]-----|               [Shared State]
>       QMP|---> Host 2 --------^
>
> This differs from a real DCD in that a real DCD is a single endpoint for
> management, rather than N endpoints (1 per VM).
>
>                   |---> Host 1
> [FM] ---> [DCD] --|
>                   |---> Host 2
>
> However, this is an implementation detail on the FM side, so I chose to
> do it this way to simplify the QEMU MHSLD implementation. There are far
> fewer interactions this way - with the downside that having one of the
> hosts manage the shared state isn't possible via the current emulation.
>
> It could probably be done, but I'm not sure what value it has, since the
> FM implementation difference is a matter of a small amount of Python.
>
> It's been a while since I played with this patch set and I do not have a
> reference pooling manager available to me any longer, unfortunately. But
> I'm happy to provide some guidance where I can.
>
> ~Gregory
