* [PATCH 0/3] Introduce CXL type-2 device emulation @ 2024-12-12 13:04 Zhi Wang 2024-12-12 13:04 ` [PATCH 1/3] hw/cxl: factor out cxl_host_addr_to_dpa() Zhi Wang ` (3 more replies) 0 siblings, 4 replies; 15+ messages in thread From: Zhi Wang @ 2024-12-12 13:04 UTC (permalink / raw) To: qemu-devel Cc: dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny, fan.ni, alex.williamson, alucerop, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiw, zhiwang Hi folks: Per the discussion with Ira/Jonathan at LPC 2024 and in the CXL Discord channel, we are trying to introduce a CXL type-2 device emulation in QEMU, as there is ongoing work on supporting CXL type-2 devices [1] in the Linux kernel and on CXL type-2 device virtualization [2]. It provides a bare minimum base for folks who would like to: - Contribute to and test the CXL type-2 device support in the Linux kernel and CXL type-2 virtualization without having actual HW. - Introduce more emulated features to prototype the kernel CXL type-2 device features and CXL type-2 virtualization. To test this patchset, please refer to the steps in [3]. Use this patchset with the latest QEMU repo as the QEMU host. It achieves the same output as in the demo video [4]: the VFIO CXL core and the VFIO CXL sample variant driver can be attached to the emulated device in the L1 guest and assigned to the L2 guest. The sample driver in the L2 guest can attach to the passed-through device and create the CXL region. Tested on the CXL type-2 virtualization RFC patches [3] with an extra fix [5].
[1] https://lore.kernel.org/linux-cxl/20241209185429.54054-1-alejandro.lucero-palau@amd.com/T/#t [2] https://www.youtube.com/watch?v=e5OW1pR84Zs [3] https://lore.kernel.org/kvm/20240920223446.1908673-3-zhiw@nvidia.com/T/ [4] https://youtu.be/zlk_ecX9bxs?si=pf9CttcGT5KwUgiH [5] https://lore.kernel.org/linux-cxl/20241212123959.68514-1-zhiw@nvidia.com/T/#u Zhi Wang (3): hw/cxl: factor out cxl_host_addr_to_dpa() hw/cxl: introduce cxl_component_update_dvsec() hw/cxl: introduce CXL type-2 device emulation MAINTAINERS | 1 + docs/system/devices/cxl.rst | 11 ++ hw/cxl/cxl-component-utils.c | 103 ++++++++++- hw/cxl/cxl-host.c | 19 +- hw/mem/Kconfig | 5 + hw/mem/cxl_accel.c | 319 +++++++++++++++++++++++++++++++++ hw/mem/cxl_type3.c | 61 +------ hw/mem/meson.build | 1 + include/hw/cxl/cxl_component.h | 7 + include/hw/cxl/cxl_device.h | 25 +++ include/hw/pci/pci_ids.h | 1 + 11 files changed, 484 insertions(+), 69 deletions(-) create mode 100644 hw/mem/cxl_accel.c -- 2.43.5 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] hw/cxl: factor out cxl_host_addr_to_dpa() 2024-12-12 13:04 [PATCH 0/3] Introduce CXL type-2 device emulation Zhi Wang @ 2024-12-12 13:04 ` Zhi Wang 2025-01-21 15:52 ` Jonathan Cameron via 2024-12-12 13:04 ` [PATCH 2/3] hw/cxl: introduce cxl_component_update_dvsec() Zhi Wang ` (2 subsequent siblings) 3 siblings, 1 reply; 15+ messages in thread From: Zhi Wang @ 2024-12-12 13:04 UTC (permalink / raw) To: qemu-devel Cc: dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny, fan.ni, alex.williamson, alucerop, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiw, zhiwang The emulated CXL type-3 device needs to translate host_addr to a DPA when a guest accesses a CXL region. This is implemented in cxl_type3_dpa(). However, other types of CXL devices require the same routine, e.g. an emulated CXL type-2 device. Factor out the routine from the emulated CXL type-3 device. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> --- hw/cxl/cxl-component-utils.c | 65 ++++++++++++++++++++++++++++++++++ hw/mem/cxl_type3.c | 61 +------------------------------ include/hw/cxl/cxl_component.h | 3 ++ 3 files changed, 69 insertions(+), 60 deletions(-) diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c index cd116c0401..aa5fb20d25 100644 --- a/hw/cxl/cxl-component-utils.c +++ b/hw/cxl/cxl-component-utils.c @@ -531,3 +531,68 @@ uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp) return 0; } } + +bool cxl_host_addr_to_dpa(CXLComponentState *cxl_cstate, hwaddr host_addr, + uint64_t *dpa) +{ + int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO; + uint32_t *cache_mem = cxl_cstate->crb.cache_mem_registers; + unsigned int hdm_count; + uint32_t cap; + uint64_t dpa_base = 0; + int i; + + cap = ldl_le_p(cache_mem + R_CXL_HDM_DECODER_CAPABILITY); + hdm_count = cxl_decoder_count_dec(FIELD_EX32(cap, + CXL_HDM_DECODER_CAPABILITY, + DECODER_COUNT)); + + for (i = 0; i < hdm_count; i++) {
+ uint64_t decoder_base, decoder_size, hpa_offset, skip; + uint32_t hdm_ctrl, low, high; + int ig, iw; + + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO + i * hdm_inc); + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI + i * hdm_inc); + decoder_base = ((uint64_t)high << 32) | (low & 0xf0000000); + + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_LO + i * hdm_inc); + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_HI + i * hdm_inc); + decoder_size = ((uint64_t)high << 32) | (low & 0xf0000000); + + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_LO + + i * hdm_inc); + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_HI + + i * hdm_inc); + skip = ((uint64_t)high << 32) | (low & 0xf0000000); + dpa_base += skip; + + hpa_offset = (uint64_t)host_addr - decoder_base; + + hdm_ctrl = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + i * hdm_inc); + iw = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, IW); + ig = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, IG); + if (!FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { + return false; + } + if (((uint64_t)host_addr < decoder_base) || + (hpa_offset >= decoder_size)) { + int decoded_iw = cxl_interleave_ways_dec(iw, &error_fatal); + + if (decoded_iw == 0) { + return false; + } + + dpa_base += decoder_size / decoded_iw; + continue; + } + + *dpa = dpa_base + + ((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) | + ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) + >> iw)); + + return true; + } + return false; +} diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c index 5cf754b38f..6a56b6de64 100644 --- a/hw/mem/cxl_type3.c +++ b/hw/mem/cxl_type3.c @@ -1038,66 +1038,7 @@ void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa, static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa) { - int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO; - uint32_t *cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers; - unsigned int hdm_count; - uint32_t 
cap; - uint64_t dpa_base = 0; - int i; - - cap = ldl_le_p(cache_mem + R_CXL_HDM_DECODER_CAPABILITY); - hdm_count = cxl_decoder_count_dec(FIELD_EX32(cap, - CXL_HDM_DECODER_CAPABILITY, - DECODER_COUNT)); - - for (i = 0; i < hdm_count; i++) { - uint64_t decoder_base, decoder_size, hpa_offset, skip; - uint32_t hdm_ctrl, low, high; - int ig, iw; - - low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO + i * hdm_inc); - high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI + i * hdm_inc); - decoder_base = ((uint64_t)high << 32) | (low & 0xf0000000); - - low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_LO + i * hdm_inc); - high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_HI + i * hdm_inc); - decoder_size = ((uint64_t)high << 32) | (low & 0xf0000000); - - low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_LO + - i * hdm_inc); - high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_HI + - i * hdm_inc); - skip = ((uint64_t)high << 32) | (low & 0xf0000000); - dpa_base += skip; - - hpa_offset = (uint64_t)host_addr - decoder_base; - - hdm_ctrl = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + i * hdm_inc); - iw = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, IW); - ig = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, IG); - if (!FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { - return false; - } - if (((uint64_t)host_addr < decoder_base) || - (hpa_offset >= decoder_size)) { - int decoded_iw = cxl_interleave_ways_dec(iw, &error_fatal); - - if (decoded_iw == 0) { - return false; - } - - dpa_base += decoder_size / decoded_iw; - continue; - } - - *dpa = dpa_base + - ((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) | - ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) - >> iw)); - - return true; - } - return false; + return cxl_host_addr_to_dpa(&ct3d->cxl_cstate, host_addr, dpa); } static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d, diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h index 945ee6ffd0..abb2e874b2 100644 --- 
a/include/hw/cxl/cxl_component.h +++ b/include/hw/cxl/cxl_component.h @@ -268,6 +268,9 @@ uint8_t cxl_interleave_ways_enc(int iw, Error **errp); int cxl_interleave_ways_dec(uint8_t iw_enc, Error **errp); uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp); +bool cxl_host_addr_to_dpa(CXLComponentState *cxl_cstate, hwaddr host_addr, + uint64_t *dpa); + hwaddr cxl_decode_ig(int ig); CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb); -- 2.43.5 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] hw/cxl: factor out cxl_host_addr_to_dpa() 2024-12-12 13:04 ` [PATCH 1/3] hw/cxl: factor out cxl_host_addr_to_dpa() Zhi Wang @ 2025-01-21 15:52 ` Jonathan Cameron via 0 siblings, 0 replies; 15+ messages in thread From: Jonathan Cameron via @ 2025-01-21 15:52 UTC (permalink / raw) To: Zhi Wang Cc: qemu-devel, dan.j.williams, dave.jiang, ira.weiny, fan.ni, alex.williamson, alucerop, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiwang Hi Zhi, > index 945ee6ffd0..abb2e874b2 100644 > --- a/include/hw/cxl/cxl_component.h > +++ b/include/hw/cxl/cxl_component.h > @@ -268,6 +268,9 @@ uint8_t cxl_interleave_ways_enc(int iw, Error **errp); > int cxl_interleave_ways_dec(uint8_t iw_enc, Error **errp); > uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp); > > +bool cxl_host_addr_to_dpa(CXLComponentState *cxl_cstate, hwaddr host_addr, > + uint64_t *dpa); > + Other than this being a really odd place to put it (in the middle of the interleave granularity functions), this seems like a logical enough change even without the reuse. Also, I'm carrying a patch for 3/6/12 way support, so I've merged this on top of that in the general interest of reducing patches flying around when they are reasonable on their own. Jonathan > hwaddr cxl_decode_ig(int ig); > > CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb); ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] hw/cxl: introduce cxl_component_update_dvsec() 2024-12-12 13:04 [PATCH 0/3] Introduce CXL type-2 device emulation Zhi Wang 2024-12-12 13:04 ` [PATCH 1/3] hw/cxl: factor out cxl_host_addr_to_dpa() Zhi Wang @ 2024-12-12 13:04 ` Zhi Wang 2025-01-21 15:57 ` Jonathan Cameron via 2024-12-12 13:04 ` [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation Zhi Wang 2024-12-12 16:49 ` [PATCH 0/3] Introduce " Alejandro Lucero Palau 3 siblings, 1 reply; 15+ messages in thread From: Zhi Wang @ 2024-12-12 13:04 UTC (permalink / raw) To: qemu-devel Cc: dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny, fan.ni, alex.williamson, alucerop, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiw, zhiwang There are many DVSEC registers in the PCI configuration space that are configurable, e.g. DVS control. They are configured and initialized in cxl_component_create_dvsec(). When the virtual machine reboots, the reset callback of the emulated CXL device resets the device states back to default states. So far, there is no decent approach to reset the values of the CXL DVSEC registers in the PCI configuration space in one go. Without resetting the values of the CXL DVSEC registers, the CXL type-2 driver fails to claim the endpoint: - DVS_CONTROL.MEM_ENABLE is left set to 1 across the system reboot. - Type-2 driver loads. - In the endpoint probe, the kernel CXL core sees that DVS_CONTROL.MEM_ENABLE is set. - The kernel CXL core wrongly thinks the HDM decoder is pre-configured by BIOS/UEFI. - The kernel CXL core uses the garbage in the HDM decoder registers and fails: [ 74.586911] cxl_accel_vfio_pci 0000:0d:00.0: Range register decodes outside platform defined CXL ranges.
[ 74.588585] cxl_mem mem0: endpoint2 failed probe [ 74.589478] cxl_accel_vfio_pci 0000:0d:00.0: Fail to acquire CXL endpoint [ 74.591944] pcieport 0000:0c:00.0: unlocked secondary bus reset via: pciehp_reset_slot+0xa8/0x150 Introduce cxl_component_update_dvsec() for the emulation of CXL devices to reset the CXL DVSEC registers in the PCI configuration space. Signed-off-by: Zhi Wang <zhiw@nvidia.com> --- hw/cxl/cxl-component-utils.c | 36 ++++++++++++++++++++++++++++------ include/hw/cxl/cxl_component.h | 3 +++ 2 files changed, 33 insertions(+), 6 deletions(-) diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c index aa5fb20d25..355103d165 100644 --- a/hw/cxl/cxl-component-utils.c +++ b/hw/cxl/cxl-component-utils.c @@ -365,9 +365,13 @@ void cxl_component_register_init_common(uint32_t *reg_state, * Helper to creates a DVSEC header for a CXL entity. The caller is responsible * for tracking the valid offset. * - * This function will build the DVSEC header on behalf of the caller and then - * copy in the remaining data for the vendor specific bits. - * It will also set up appropriate write masks. + * This function will build the DVSEC header on behalf of the caller. It will + * also set up appropriate write masks. + * + * If required, it will copy in the remaining data for the vendor specific bits. + * Or the caller can also fill the remaining data later after the DVSEC header + * is built via cxl_component_update_dvsec(). 
+ * */ void cxl_component_create_dvsec(CXLComponentState *cxl, enum reg_type cxl_dev_type, uint16_t length, @@ -387,9 +391,12 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, pci_set_long(pdev->config + offset + PCIE_DVSEC_HEADER1_OFFSET, (length << 20) | (rev << 16) | CXL_VENDOR_ID); pci_set_word(pdev->config + offset + PCIE_DVSEC_ID_OFFSET, type); - memcpy(pdev->config + offset + sizeof(DVSECHeader), - body + sizeof(DVSECHeader), - length - sizeof(DVSECHeader)); + + if (body) { + memcpy(pdev->config + offset + sizeof(DVSECHeader), + body + sizeof(DVSECHeader), + length - sizeof(DVSECHeader)); + } /* Configure write masks */ switch (type) { @@ -481,6 +488,23 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, cxl->dvsec_offset += length; } +void cxl_component_update_dvsec(CXLComponentState *cxl, uint16_t length, + uint16_t type, uint8_t *body) +{ + PCIDevice *pdev = cxl->pdev; + struct Range *r; + + assert(type < CXL20_MAX_DVSEC); + + r = &cxl->dvsecs[type]; + + assert(range_size(r) == length); + + memcpy(pdev->config + r->lob + sizeof(DVSECHeader), + body + sizeof(DVSECHeader), + length - sizeof(DVSECHeader)); +} + /* CXL r3.1 Section 8.2.4.20.7 CXL HDM Decoder n Control Register */ uint8_t cxl_interleave_ways_enc(int iw, Error **errp) { diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h index abb2e874b2..30fe4bfa24 100644 --- a/include/hw/cxl/cxl_component.h +++ b/include/hw/cxl/cxl_component.h @@ -261,6 +261,9 @@ void cxl_component_create_dvsec(CXLComponentState *cxl_cstate, enum reg_type cxl_dev_type, uint16_t length, uint16_t type, uint8_t rev, uint8_t *body); +void cxl_component_update_dvsec(CXLComponentState *cxl, uint16_t length, + uint16_t type, uint8_t *body); + int cxl_decoder_count_enc(int count); int cxl_decoder_count_dec(int enc_cnt); -- 2.43.5 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] hw/cxl: introduce cxl_component_update_dvsec() 2024-12-12 13:04 ` [PATCH 2/3] hw/cxl: introduce cxl_component_update_dvsec() Zhi Wang @ 2025-01-21 15:57 ` Jonathan Cameron via 0 siblings, 0 replies; 15+ messages in thread From: Jonathan Cameron via @ 2025-01-21 15:57 UTC (permalink / raw) To: Zhi Wang Cc: qemu-devel, dan.j.williams, dave.jiang, ira.weiny, fan.ni, alex.williamson, alucerop, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiwang On Thu, 12 Dec 2024 05:04:21 -0800 Zhi Wang <zhiw@nvidia.com> wrote: > There are many DVSEC registers in the PCI configuration space that are > configurable, e.g. DVS control. They are configured and initialized in > cxl_component_create_dvsec(). When the virtual machine reboots, the > reset callback of the emulated CXL device resets the > device states back to default states. > > So far, there is no decent approach to reset the values of the CXL DVSEC > registers in the PCI configuration space in one go. Without resetting > the values of the CXL DVSEC registers, the CXL type-2 driver fails to > claim the endpoint: > > - DVS_CONTROL.MEM_ENABLE is left set to 1 across the system reboot. > - Type-2 driver loads. > - In the endpoint probe, the kernel CXL core sees that > DVS_CONTROL.MEM_ENABLE is set. > - The kernel CXL core wrongly thinks the HDM decoder is pre-configured > by BIOS/UEFI. > - The kernel CXL core uses the garbage in the HDM decoder registers and > fails: > > [ 74.586911] cxl_accel_vfio_pci 0000:0d:00.0: Range register decodes > outside platform defined CXL ranges. > [ 74.588585] cxl_mem mem0: endpoint2 failed probe > [ 74.589478] cxl_accel_vfio_pci 0000:0d:00.0: Fail to acquire CXL > endpoint > [ 74.591944] pcieport 0000:0c:00.0: unlocked secondary bus reset via: > pciehp_reset_slot+0xa8/0x150 > > Introduce cxl_component_update_dvsec() for the emulation of CXL devices > to reset the CXL DVSEC registers in the PCI configuration space.
We know there are issues with this reset path for the type 3 devices. I'd be keen to see that fixed up using a mechanism like this then we can build on top of it for type2 support. I'm not convinced that a generic solution like this makes sense rather than a specific reset of registers as appropriate which allows us to carefully preserve the sticky bits. Reset for the type 3 has proved a little tricky to fix in the past and wasn't really a priority. Given we have to do it for type 2 I'd like to fix it up for both. Jonathan > > Signed-off-by: Zhi Wang <zhiw@nvidia.com> > --- > hw/cxl/cxl-component-utils.c | 36 ++++++++++++++++++++++++++++------ > include/hw/cxl/cxl_component.h | 3 +++ > 2 files changed, 33 insertions(+), 6 deletions(-) > > diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c > index aa5fb20d25..355103d165 100644 > --- a/hw/cxl/cxl-component-utils.c > +++ b/hw/cxl/cxl-component-utils.c > @@ -365,9 +365,13 @@ void cxl_component_register_init_common(uint32_t *reg_state, > * Helper to creates a DVSEC header for a CXL entity. The caller is responsible > * for tracking the valid offset. > * > - * This function will build the DVSEC header on behalf of the caller and then > - * copy in the remaining data for the vendor specific bits. > - * It will also set up appropriate write masks. > + * This function will build the DVSEC header on behalf of the caller. It will > + * also set up appropriate write masks. > + * > + * If required, it will copy in the remaining data for the vendor specific bits. > + * Or the caller can also fill the remaining data later after the DVSEC header > + * is built via cxl_component_update_dvsec(). > + * Pet hate. This blank line adds nothing so drop it. 
> */ > void cxl_component_create_dvsec(CXLComponentState *cxl, > enum reg_type cxl_dev_type, uint16_t length, > @@ -387,9 +391,12 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, > pci_set_long(pdev->config + offset + PCIE_DVSEC_HEADER1_OFFSET, > (length << 20) | (rev << 16) | CXL_VENDOR_ID); > pci_set_word(pdev->config + offset + PCIE_DVSEC_ID_OFFSET, type); > - memcpy(pdev->config + offset + sizeof(DVSECHeader), > - body + sizeof(DVSECHeader), > - length - sizeof(DVSECHeader)); > + > + if (body) { > + memcpy(pdev->config + offset + sizeof(DVSECHeader), > + body + sizeof(DVSECHeader), > + length - sizeof(DVSECHeader)); > + } > > /* Configure write masks */ > switch (type) { > @@ -481,6 +488,23 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, > cxl->dvsec_offset += length; > } > > +void cxl_component_update_dvsec(CXLComponentState *cxl, uint16_t length, > + uint16_t type, uint8_t *body) > +{ > + PCIDevice *pdev = cxl->pdev; > + struct Range *r; > + > + assert(type < CXL20_MAX_DVSEC); > + > + r = &cxl->dvsecs[type]; > + > + assert(range_size(r) == length); > + > + memcpy(pdev->config + r->lob + sizeof(DVSECHeader), > + body + sizeof(DVSECHeader), > + length - sizeof(DVSECHeader)); > +} > + > /* CXL r3.1 Section 8.2.4.20.7 CXL HDM Decoder n Control Register */ > uint8_t cxl_interleave_ways_enc(int iw, Error **errp) > { > diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h > index abb2e874b2..30fe4bfa24 100644 > --- a/include/hw/cxl/cxl_component.h > +++ b/include/hw/cxl/cxl_component.h > @@ -261,6 +261,9 @@ void cxl_component_create_dvsec(CXLComponentState *cxl_cstate, > enum reg_type cxl_dev_type, uint16_t length, > uint16_t type, uint8_t rev, uint8_t *body); > > +void cxl_component_update_dvsec(CXLComponentState *cxl, uint16_t length, > + uint16_t type, uint8_t *body); > + > int cxl_decoder_count_enc(int count); > int cxl_decoder_count_dec(int enc_cnt); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation 2024-12-12 13:04 [PATCH 0/3] Introduce CXL type-2 device emulation Zhi Wang 2024-12-12 13:04 ` [PATCH 1/3] hw/cxl: factor out cxl_host_addr_to_dpa() Zhi Wang 2024-12-12 13:04 ` [PATCH 2/3] hw/cxl: introduce cxl_component_update_dvsec() Zhi Wang @ 2024-12-12 13:04 ` Zhi Wang 2024-12-12 17:02 ` Alejandro Lucero Palau 2025-01-21 16:16 ` Jonathan Cameron via 2024-12-12 16:49 ` [PATCH 0/3] Introduce " Alejandro Lucero Palau 3 siblings, 2 replies; 15+ messages in thread From: Zhi Wang @ 2024-12-12 13:04 UTC (permalink / raw) To: qemu-devel Cc: dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny, fan.ni, alex.williamson, alucerop, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiw, zhiwang From: Zhi Wang <zhiwang@kernel.org> Introduce a CXL type-2 device emulation that provides a minimum base for testing kernel CXL core type-2 support and CXL type-2 virtualization. It is also a good base for introducing more emulated features. Currently, it only supports: - Emulating component registers with HDM decoders. - A volatile memory backend and emulation of region access. The emulation is intentionally not tightly coupled with the current CXL type-3 emulation, since many advanced CXL type-3 emulation features are not implemented in a CXL type-2 device.
Co-developed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zhi Wang <zhiwang@kernel.org> --- MAINTAINERS | 1 + docs/system/devices/cxl.rst | 11 ++ hw/cxl/cxl-component-utils.c | 2 + hw/cxl/cxl-host.c | 19 +- hw/mem/Kconfig | 5 + hw/mem/cxl_accel.c | 319 +++++++++++++++++++++++++++++++++ hw/mem/meson.build | 1 + include/hw/cxl/cxl_component.h | 1 + include/hw/cxl/cxl_device.h | 25 +++ include/hw/pci/pci_ids.h | 1 + 10 files changed, 382 insertions(+), 3 deletions(-) create mode 100644 hw/mem/cxl_accel.c diff --git a/MAINTAINERS b/MAINTAINERS index aaf0505a21..72a6a505eb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2914,6 +2914,7 @@ R: Fan Ni <fan.ni@samsung.com> S: Supported F: hw/cxl/ F: hw/mem/cxl_type3.c +F: hw/mem/cxl_accel.c F: include/hw/cxl/ F: qapi/cxl.json diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst index 882b036f5e..13cc2417f2 100644 --- a/docs/system/devices/cxl.rst +++ b/docs/system/devices/cxl.rst @@ -332,6 +332,17 @@ The same volatile setup may optionally include an LSA region:: -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl-lsa0,id=cxl-vmem0 \ -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G +A very simple setup with just one directly attached CXL Type 2 Volatile Memory +Accelerator device:: + + qemu-system-x86_64 -M q35,cxl=on -m 4G,maxmem=8G,slots=8 -smp 4 \ + ... + -object memory-backend-ram,id=vmem0,share=on,size=256M \ + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ + -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ + -device cxl-accel,bus=root_port13,volatile-memdev=vmem0,id=cxl-accel0 \ + -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G + A setup suitable for 4 way interleave. Only one fixed window provided, to enable 2 way interleave across 2 CXL host bridges. 
Each host bridge has 2 CXL Root Ports, with the CXL Type3 device directly attached (no switches).:: diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c index 355103d165..717ef117ac 100644 --- a/hw/cxl/cxl-component-utils.c +++ b/hw/cxl/cxl-component-utils.c @@ -262,6 +262,7 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk, write_msk[R_CXL_HDM_DECODER0_CTRL + i * hdm_inc] = 0x13ff; if (type == CXL2_DEVICE || type == CXL2_TYPE3_DEVICE || + type == CXL3_TYPE2_DEVICE || type == CXL2_LOGICAL_DEVICE) { write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * hdm_inc] = 0xf0000000; @@ -293,6 +294,7 @@ void cxl_component_register_init_common(uint32_t *reg_state, case CXL2_UPSTREAM_PORT: case CXL2_TYPE3_DEVICE: case CXL2_LOGICAL_DEVICE: + case CXL3_TYPE2_DEVICE: /* + HDM */ caps = 3; break; diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c index e9f2543c43..e603a3f2fc 100644 --- a/hw/cxl/cxl-host.c +++ b/hw/cxl/cxl-host.c @@ -201,7 +201,8 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr) return NULL; } - if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3) || + object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { return d; } @@ -256,7 +257,13 @@ static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data, return MEMTX_ERROR; } - return cxl_type3_read(d, addr + fw->base, data, size, attrs); + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { + return cxl_type3_read(d, addr + fw->base, data, size, attrs); + } else if (object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { + return cxl_accel_read(d, addr + fw->base, data, size, attrs); + } + + return MEMTX_ERROR; } static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr, @@ -272,7 +279,13 @@ static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr, return MEMTX_OK; } - return cxl_type3_write(d, addr + fw->base, data, size, attrs); + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { + 
return cxl_type3_write(d, addr + fw->base, data, size, attrs); + } else if (object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { + return cxl_accel_write(d, addr + fw->base, data, size, attrs); + } + + return MEMTX_ERROR; } const MemoryRegionOps cfmws_ops = { diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig index 73c5ae8ad9..1f7d08c17d 100644 --- a/hw/mem/Kconfig +++ b/hw/mem/Kconfig @@ -16,3 +16,8 @@ config CXL_MEM_DEVICE bool default y if CXL select MEM_DEVICE + +config CXL_ACCEL_DEVICE + bool + default y if CXL + select MEM_DEVICE diff --git a/hw/mem/cxl_accel.c b/hw/mem/cxl_accel.c new file mode 100644 index 0000000000..770072126d --- /dev/null +++ b/hw/mem/cxl_accel.c @@ -0,0 +1,319 @@ +/* + * CXL accel (type-2) device + * + * Copyright(C) 2024 NVIDIA Corporation. + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. + * + * SPDX-License-Identifier: GPL-v2-only + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "qemu/error-report.h" +#include "hw/mem/memory-device.h" +#include "hw/mem/pc-dimm.h" +#include "hw/pci/pci.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-properties-system.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "qemu/range.h" +#include "sysemu/hostmem.h" +#include "sysemu/numa.h" +#include "hw/cxl/cxl.h" +#include "hw/pci/msix.h" + +static void update_dvsecs(CXLAccelDev *acceld) +{ + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; + uint8_t *dvsec; + uint32_t range1_size_hi = 0, range1_size_lo = 0, + range1_base_hi = 0, range1_base_lo = 0; + + if (acceld->hostvmem) { + range1_size_hi = acceld->hostvmem->size >> 32; + range1_size_lo = (2 << 5) | (2 << 2) | 0x3 | + (acceld->hostvmem->size & 0xF0000000); + } + + dvsec = (uint8_t *)&(CXLDVSECDevice){ + .cap = 0x1e, + .ctrl = 0x2, + .status2 = 0x2, + .range1_size_hi = range1_size_hi, + .range1_size_lo = range1_size_lo, + .range1_base_hi = range1_base_hi, + .range1_base_lo = range1_base_lo, + 
}; + cxl_component_update_dvsec(cxl_cstate, PCIE_CXL_DEVICE_DVSEC_LENGTH, + PCIE_CXL_DEVICE_DVSEC, dvsec); + + dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){ + .rsvd = 0, + .reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX, + .reg0_base_hi = 0, + }; + cxl_component_update_dvsec(cxl_cstate, REG_LOC_DVSEC_LENGTH, + REG_LOC_DVSEC, dvsec); + + dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){ + .cap = 0x26, /* 68B, IO, Mem, non-MLD */ + .ctrl = 0x02, /* IO always enabled */ + .status = 0x26, /* same as capabilities */ + .rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */ + }; + cxl_component_update_dvsec(cxl_cstate, PCIE_CXL3_FLEXBUS_PORT_DVSEC_LENGTH, + PCIE_FLEXBUS_PORT_DVSEC, dvsec); +} + +static void build_dvsecs(CXLAccelDev *acceld) +{ + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; + + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, + PCIE_CXL_DEVICE_DVSEC_LENGTH, + PCIE_CXL_DEVICE_DVSEC, + PCIE_CXL31_DEVICE_DVSEC_REVID, NULL); + + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, + REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC, + REG_LOC_DVSEC_REVID, NULL); + + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, + PCIE_CXL3_FLEXBUS_PORT_DVSEC_LENGTH, + PCIE_FLEXBUS_PORT_DVSEC, + PCIE_CXL3_FLEXBUS_PORT_DVSEC_REVID, NULL); + update_dvsecs(acceld); +} + +static bool cxl_accel_dpa(CXLAccelDev *acceld, hwaddr host_addr, uint64_t *dpa) +{ + return cxl_host_addr_to_dpa(&acceld->cxl_cstate, host_addr, dpa); +} + +static int cxl_accel_hpa_to_as_and_dpa(CXLAccelDev *acceld, + hwaddr host_addr, + unsigned int size, + AddressSpace **as, + uint64_t *dpa_offset) +{ + MemoryRegion *vmr = NULL; + uint64_t vmr_size = 0; + + if (!acceld->hostvmem) { + return -ENODEV; + } + + vmr = host_memory_backend_get_memory(acceld->hostvmem); + if (!vmr) { + return -ENODEV; + } + + vmr_size = memory_region_size(vmr); + + if (!cxl_accel_dpa(acceld, host_addr, dpa_offset)) { + return -EINVAL; + } + + if (*dpa_offset >= vmr_size) { + return -EINVAL; + } + + *as = 
&acceld->hostvmem_as; + return 0; +} + +MemTxResult cxl_accel_read(PCIDevice *d, hwaddr host_addr, uint64_t *data, + unsigned size, MemTxAttrs attrs) +{ + CXLAccelDev *acceld = CXL_ACCEL(d); + uint64_t dpa_offset = 0; + AddressSpace *as = NULL; + int res; + + res = cxl_accel_hpa_to_as_and_dpa(acceld, host_addr, size, + &as, &dpa_offset); + if (res) { + return MEMTX_ERROR; + } + + return address_space_read(as, dpa_offset, attrs, data, size); +} + +MemTxResult cxl_accel_write(PCIDevice *d, hwaddr host_addr, uint64_t data, + unsigned size, MemTxAttrs attrs) +{ + CXLAccelDev *acceld = CXL_ACCEL(d); + uint64_t dpa_offset = 0; + AddressSpace *as = NULL; + int res; + + res = cxl_accel_hpa_to_as_and_dpa(acceld, host_addr, size, + &as, &dpa_offset); + if (res) { + return MEMTX_ERROR; + } + + return address_space_write(as, dpa_offset, attrs, &data, size); +} + +static void clean_memory(PCIDevice *pci_dev) +{ + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); + + if (acceld->hostvmem) { + address_space_destroy(&acceld->hostvmem_as); + } +} + +static bool setup_memory(PCIDevice *pci_dev, Error **errp) +{ + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); + + if (acceld->hostvmem) { + MemoryRegion *vmr; + char *v_name; + + vmr = host_memory_backend_get_memory(acceld->hostvmem); + if (!vmr) { + error_setg(errp, "volatile memdev must have backing device"); + return false; + } + if (host_memory_backend_is_mapped(acceld->hostvmem)) { + error_setg(errp, "memory backend %s can't be used multiple times.", + object_get_canonical_path_component(OBJECT(acceld->hostvmem))); + return false; + } + memory_region_set_nonvolatile(vmr, false); + memory_region_set_enabled(vmr, true); + host_memory_backend_set_mapped(acceld->hostvmem, true); + v_name = g_strdup("cxl-accel-dpa-vmem-space"); + address_space_init(&acceld->hostvmem_as, vmr, v_name); + g_free(v_name); + } + return true; +} + +static void setup_cxl_regs(PCIDevice *pci_dev) +{ + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); + CXLComponentState 
*cxl_cstate = &acceld->cxl_cstate; + ComponentRegisters *regs = &cxl_cstate->crb; + MemoryRegion *mr = &regs->component_registers; + + cxl_cstate->dvsec_offset = 0x100; + cxl_cstate->pdev = pci_dev; + + build_dvsecs(acceld); + + cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate, + TYPE_CXL_ACCEL); + + pci_register_bar( + pci_dev, CXL_COMPONENT_REG_BAR_IDX, + PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, mr); +} + +#define MSIX_NUM 6 + +static int setup_msix(PCIDevice *pci_dev) +{ + int i, rc; + + /* MSI(-X) Initialization */ + rc = msix_init_exclusive_bar(pci_dev, MSIX_NUM, 4, NULL); + if (rc) { + return rc; + } + + for (i = 0; i < MSIX_NUM; i++) { + msix_vector_use(pci_dev, i); + } + return 0; +} + +static void cxl_accel_realize(PCIDevice *pci_dev, Error **errp) +{ + ERRP_GUARD(); + int rc; + uint8_t *pci_conf = pci_dev->config; + + if (!setup_memory(pci_dev, errp)) { + return; + } + + pci_config_set_prog_interface(pci_conf, 0x10); + pcie_endpoint_cap_init(pci_dev, 0x80); + + setup_cxl_regs(pci_dev); + + /* MSI(-X) Initialization */ + rc = setup_msix(pci_dev); + if (rc) { + clean_memory(pci_dev); + return; + } +} + +static void cxl_accel_exit(PCIDevice *pci_dev) +{ + clean_memory(pci_dev); +} + +static void cxl_accel_reset(DeviceState *dev) +{ + CXLAccelDev *acceld = CXL_ACCEL(dev); + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; + uint32_t *reg_state = cxl_cstate->crb.cache_mem_registers; + uint32_t *write_msk = cxl_cstate->crb.cache_mem_regs_write_mask; + + update_dvsecs(acceld); + cxl_component_register_init_common(reg_state, write_msk, CXL3_TYPE2_DEVICE); +} + +static Property cxl_accel_props[] = { + DEFINE_PROP_LINK("volatile-memdev", CXLAccelDev, hostvmem, + TYPE_MEMORY_BACKEND, HostMemoryBackend *), + DEFINE_PROP_END_OF_LIST(), +}; + +static void cxl_accel_class_init(ObjectClass *oc, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(oc); + PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc); + + pc->realize = cxl_accel_realize; + 
pc->exit = cxl_accel_exit; + + pc->class_id = PCI_CLASS_CXL_QEMU_ACCEL; + pc->vendor_id = PCI_VENDOR_ID_INTEL; + pc->device_id = 0xd94; + pc->revision = 1; + + set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); + dc->desc = "CXL Accelerator Device (Type 2)"; + device_class_set_legacy_reset(dc, cxl_accel_reset); + device_class_set_props(dc, cxl_accel_props); +} + +static const TypeInfo cxl_accel_dev_info = { + .name = TYPE_CXL_ACCEL, + .parent = TYPE_PCI_DEVICE, + .class_size = sizeof(struct CXLAccelClass), + .class_init = cxl_accel_class_init, + .instance_size = sizeof(CXLAccelDev), + .interfaces = (InterfaceInfo[]) { + { INTERFACE_CXL_DEVICE }, + { INTERFACE_PCIE_DEVICE }, + {} + }, +}; + +static void cxl_accel_dev_registers(void) +{ + type_register_static(&cxl_accel_dev_info); +} + +type_init(cxl_accel_dev_registers); diff --git a/hw/mem/meson.build b/hw/mem/meson.build index 1c1c6da24b..36a395dbb6 100644 --- a/hw/mem/meson.build +++ b/hw/mem/meson.build @@ -4,6 +4,7 @@ mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc-dimm.c')) mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c')) mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c')) mem_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_true: files('cxl_type3.c')) +mem_ss.add(when: 'CONFIG_CXL_ACCEL_DEVICE', if_true: files('cxl_accel.c')) system_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_false: files('cxl_type3_stubs.c')) system_ss.add(when: 'CONFIG_MEM_DEVICE', if_false: files('memory-device-stubs.c')) diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h index 30fe4bfa24..0e78db26b8 100644 --- a/include/hw/cxl/cxl_component.h +++ b/include/hw/cxl/cxl_component.h @@ -29,6 +29,7 @@ enum reg_type { CXL2_UPSTREAM_PORT, CXL2_DOWNSTREAM_PORT, CXL3_SWITCH_MAILBOX_CCI, + CXL3_TYPE2_DEVICE, }; /* diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h index 561b375dc8..ac26b264da 100644 --- a/include/hw/cxl/cxl_device.h +++ b/include/hw/cxl/cxl_device.h @@ -630,6 +630,26 
@@ struct CSWMBCCIDev { CXLCCI *cci; }; +struct CXLAccelDev { + /* Private */ + PCIDevice parent_obj; + + /* Properties */ + HostMemoryBackend *hostvmem; + + /* State */ + AddressSpace hostvmem_as; + CXLComponentState cxl_cstate; +}; + +struct CXLAccelClass { + /* Private */ + PCIDeviceClass parent_class; +}; + +#define TYPE_CXL_ACCEL "cxl-accel" +OBJECT_DECLARE_TYPE(CXLAccelDev, CXLAccelClass, CXL_ACCEL) + #define TYPE_CXL_SWITCH_MAILBOX_CCI "cxl-switch-mailbox-cci" OBJECT_DECLARE_TYPE(CSWMBCCIDev, CSWMBCCIClass, CXL_SWITCH_MAILBOX_CCI) @@ -638,6 +658,11 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data, MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data, unsigned size, MemTxAttrs attrs); +MemTxResult cxl_accel_read(PCIDevice *d, hwaddr host_addr, uint64_t *data, + unsigned size, MemTxAttrs attrs); +MemTxResult cxl_accel_write(PCIDevice *d, hwaddr host_addr, uint64_t data, + unsigned size, MemTxAttrs attrs); + uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds); void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num); diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h index f1a53fea8d..08bc469316 100644 --- a/include/hw/pci/pci_ids.h +++ b/include/hw/pci/pci_ids.h @@ -55,6 +55,7 @@ #define PCI_CLASS_MEMORY_RAM 0x0500 #define PCI_CLASS_MEMORY_FLASH 0x0501 #define PCI_CLASS_MEMORY_CXL 0x0502 +#define PCI_CLASS_CXL_QEMU_ACCEL 0x0503 #define PCI_CLASS_MEMORY_OTHER 0x0580 #define PCI_BASE_CLASS_BRIDGE 0x06 -- 2.43.5 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation 2024-12-12 13:04 ` [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation Zhi Wang @ 2024-12-12 17:02 ` Alejandro Lucero Palau 2024-12-12 18:33 ` Zhi Wang 2025-01-21 16:16 ` Jonathan Cameron via 1 sibling, 1 reply; 15+ messages in thread From: Alejandro Lucero Palau @ 2024-12-12 17:02 UTC (permalink / raw) To: Zhi Wang, qemu-devel Cc: dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny, fan.ni, alex.williamson, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiwang On 12/12/24 13:04, Zhi Wang wrote: > From: Zhi Wang <zhiwang@kernel.org> > > Introduce a CXL type-2 device emulation that provides a minimum base for > testing kernel CXL core type-2 support and CXL type-2 virtualization. It > is also a good base for introducing more emulated features. > > Currently, it only supports: > > - Emulating component registers with HDM decoders. > - Volatile memory backend and emulation of region access. > > The emulation is aimed not to be tightly coupled with the current CXL type-3 > emulation since many advanced CXL type-3 emulation features are not > implemented in a CXL type-2 device. 
> > Co-developed-by: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Zhi Wang <zhiwang@kernel.org> > --- > MAINTAINERS | 1 + > docs/system/devices/cxl.rst | 11 ++ > hw/cxl/cxl-component-utils.c | 2 + > hw/cxl/cxl-host.c | 19 +- > hw/mem/Kconfig | 5 + > hw/mem/cxl_accel.c | 319 +++++++++++++++++++++++++++++++++ > hw/mem/meson.build | 1 + > include/hw/cxl/cxl_component.h | 1 + > include/hw/cxl/cxl_device.h | 25 +++ > include/hw/pci/pci_ids.h | 1 + > 10 files changed, 382 insertions(+), 3 deletions(-) > create mode 100644 hw/mem/cxl_accel.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index aaf0505a21..72a6a505eb 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -2914,6 +2914,7 @@ R: Fan Ni <fan.ni@samsung.com> > S: Supported > F: hw/cxl/ > F: hw/mem/cxl_type3.c > +F: hw/mem/cxl_accel.c > F: include/hw/cxl/ > F: qapi/cxl.json > > diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst > index 882b036f5e..13cc2417f2 100644 > --- a/docs/system/devices/cxl.rst > +++ b/docs/system/devices/cxl.rst > @@ -332,6 +332,17 @@ The same volatile setup may optionally include an LSA region:: > -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl-lsa0,id=cxl-vmem0 \ > -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G > > +A very simple setup with just one directly attached CXL Type 2 Volatile Memory > +Accelerator device:: > + > + qemu-system-x86_64 -M q35,cxl=on -m 4G,maxmem=8G,slots=8 -smp 4 \ > + ... > + -object memory-backend-ram,id=vmem0,share=on,size=256M \ > + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ > + -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ > + -device cxl-accel,bus=root_port13,volatile-memdev=vmem0,id=cxl-accel0 \ > + -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G > + > A setup suitable for 4 way interleave. Only one fixed window provided, to enable 2 way > interleave across 2 CXL host bridges. 
Each host bridge has 2 CXL Root Ports, with the CXL Type3 device directly attached (no switches).:: > diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c > index 355103d165..717ef117ac 100644 > --- a/hw/cxl/cxl-component-utils.c > +++ b/hw/cxl/cxl-component-utils.c > @@ -262,6 +262,7 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk, > write_msk[R_CXL_HDM_DECODER0_CTRL + i * hdm_inc] = 0x13ff; You are not changing this write, but I did, based on Type3 or Type2: - write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff; + if (type == CXL2_TYPE2_DEVICE) + /* Bit 12 Target Range Type 0= HDM-D or HDM-DB */ + /* Bit 10 says memory already committed */ + write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x7ff; + else + /* Bit 12 Target Range Type 1= HDM-H aka Host Only Coherent Address Range */ + write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff; It has been a while since I did work on this, but I guess I did so because it was needed. But maybe I'm wrong ... Bit 10 was something I needed for emulating what we had in the real device, but bit 12 looks like something we should set, although maybe it is only informative. 
> if (type == CXL2_DEVICE || > type == CXL2_TYPE3_DEVICE || > + type == CXL3_TYPE2_DEVICE || > type == CXL2_LOGICAL_DEVICE) { > write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * hdm_inc] = > 0xf0000000; > @@ -293,6 +294,7 @@ void cxl_component_register_init_common(uint32_t *reg_state, > case CXL2_UPSTREAM_PORT: > case CXL2_TYPE3_DEVICE: > case CXL2_LOGICAL_DEVICE: > + case CXL3_TYPE2_DEVICE: > /* + HDM */ > caps = 3; > break; > diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c > index e9f2543c43..e603a3f2fc 100644 > --- a/hw/cxl/cxl-host.c > +++ b/hw/cxl/cxl-host.c > @@ -201,7 +201,8 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr) > return NULL; > } > > - if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { > + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3) || > + object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { > return d; > } > > @@ -256,7 +257,13 @@ static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data, > return MEMTX_ERROR; > } > > - return cxl_type3_read(d, addr + fw->base, data, size, attrs); > + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { > + return cxl_type3_read(d, addr + fw->base, data, size, attrs); > + } else if (object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { > + return cxl_accel_read(d, addr + fw->base, data, size, attrs); > + } > + > + return MEMTX_ERROR; > } > > static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr, > @@ -272,7 +279,13 @@ static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr, > return MEMTX_OK; > } > > - return cxl_type3_write(d, addr + fw->base, data, size, attrs); > + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { > + return cxl_type3_write(d, addr + fw->base, data, size, attrs); > + } else if (object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { > + return cxl_accel_write(d, addr + fw->base, data, size, attrs); > + } > + > + return MEMTX_ERROR; > } > > const MemoryRegionOps cfmws_ops = { > diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig > index 
73c5ae8ad9..1f7d08c17d 100644 > --- a/hw/mem/Kconfig > +++ b/hw/mem/Kconfig > @@ -16,3 +16,8 @@ config CXL_MEM_DEVICE > bool > default y if CXL > select MEM_DEVICE > + > +config CXL_ACCEL_DEVICE > + bool > + default y if CXL > + select MEM_DEVICE > diff --git a/hw/mem/cxl_accel.c b/hw/mem/cxl_accel.c > new file mode 100644 > index 0000000000..770072126d > --- /dev/null > +++ b/hw/mem/cxl_accel.c > @@ -0,0 +1,319 @@ > +/* > + * CXL accel (type-2) device > + * > + * Copyright(C) 2024 NVIDIA Corporation. > + * > + * This work is licensed under the terms of the GNU GPL, version 2. See the > + * COPYING file in the top-level directory. > + * > + * SPDX-License-Identifier: GPL-v2-only > + */ > + > +#include "qemu/osdep.h" > +#include "qemu/units.h" > +#include "qemu/error-report.h" > +#include "hw/mem/memory-device.h" > +#include "hw/mem/pc-dimm.h" > +#include "hw/pci/pci.h" > +#include "hw/qdev-properties.h" > +#include "hw/qdev-properties-system.h" > +#include "qemu/log.h" > +#include "qemu/module.h" > +#include "qemu/range.h" > +#include "sysemu/hostmem.h" > +#include "sysemu/numa.h" > +#include "hw/cxl/cxl.h" > +#include "hw/pci/msix.h" > + > +static void update_dvsecs(CXLAccelDev *acceld) > +{ > + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; > + uint8_t *dvsec; > + uint32_t range1_size_hi = 0, range1_size_lo = 0, > + range1_base_hi = 0, range1_base_lo = 0; > + > + if (acceld->hostvmem) { > + range1_size_hi = acceld->hostvmem->size >> 32; > + range1_size_lo = (2 << 5) | (2 << 2) | 0x3 | > + (acceld->hostvmem->size & 0xF0000000); > + } > + > + dvsec = (uint8_t *)&(CXLDVSECDevice){ > + .cap = 0x1e, > + .ctrl = 0x2, > + .status2 = 0x2, > + .range1_size_hi = range1_size_hi, > + .range1_size_lo = range1_size_lo, > + .range1_base_hi = range1_base_hi, > + .range1_base_lo = range1_base_lo, > + }; > + cxl_component_update_dvsec(cxl_cstate, PCIE_CXL_DEVICE_DVSEC_LENGTH, > + PCIE_CXL_DEVICE_DVSEC, dvsec); > + > + dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){ > + .rsvd 
= 0, > + .reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX, > + .reg0_base_hi = 0, > + }; > + cxl_component_update_dvsec(cxl_cstate, REG_LOC_DVSEC_LENGTH, > + REG_LOC_DVSEC, dvsec); > + > + dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){ > + .cap = 0x26, /* 68B, IO, Mem, non-MLD */ > + .ctrl = 0x02, /* IO always enabled */ > + .status = 0x26, /* same as capabilities */ > + .rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */ > + }; > + cxl_component_update_dvsec(cxl_cstate, PCIE_CXL3_FLEXBUS_PORT_DVSEC_LENGTH, > + PCIE_FLEXBUS_PORT_DVSEC, dvsec); > +} > + > +static void build_dvsecs(CXLAccelDev *acceld) > +{ > + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; > + > + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, > + PCIE_CXL_DEVICE_DVSEC_LENGTH, > + PCIE_CXL_DEVICE_DVSEC, > + PCIE_CXL31_DEVICE_DVSEC_REVID, NULL); > + > + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, > + REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC, > + REG_LOC_DVSEC_REVID, NULL); > + > + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, > + PCIE_CXL3_FLEXBUS_PORT_DVSEC_LENGTH, > + PCIE_FLEXBUS_PORT_DVSEC, > + PCIE_CXL3_FLEXBUS_PORT_DVSEC_REVID, NULL); > + update_dvsecs(acceld); > +} > + > +static bool cxl_accel_dpa(CXLAccelDev *acceld, hwaddr host_addr, uint64_t *dpa) > +{ > + return cxl_host_addr_to_dpa(&acceld->cxl_cstate, host_addr, dpa); > +} > + > +static int cxl_accel_hpa_to_as_and_dpa(CXLAccelDev *acceld, > + hwaddr host_addr, > + unsigned int size, > + AddressSpace **as, > + uint64_t *dpa_offset) > +{ > + MemoryRegion *vmr = NULL; > + uint64_t vmr_size = 0; > + > + if (!acceld->hostvmem) { > + return -ENODEV; > + } > + > + vmr = host_memory_backend_get_memory(acceld->hostvmem); > + if (!vmr) { > + return -ENODEV; > + } > + > + vmr_size = memory_region_size(vmr); > + > + if (!cxl_accel_dpa(acceld, host_addr, dpa_offset)) { > + return -EINVAL; > + } > + > + if (*dpa_offset >= vmr_size) { > + return -EINVAL; > + } > + > + *as = &acceld->hostvmem_as; > + return 0; > +} 
> + > +MemTxResult cxl_accel_read(PCIDevice *d, hwaddr host_addr, uint64_t *data, > + unsigned size, MemTxAttrs attrs) > +{ > + CXLAccelDev *acceld = CXL_ACCEL(d); > + uint64_t dpa_offset = 0; > + AddressSpace *as = NULL; > + int res; > + > + res = cxl_accel_hpa_to_as_and_dpa(acceld, host_addr, size, > + &as, &dpa_offset); > + if (res) { > + return MEMTX_ERROR; > + } > + > + return address_space_read(as, dpa_offset, attrs, data, size); > +} > + > +MemTxResult cxl_accel_write(PCIDevice *d, hwaddr host_addr, uint64_t data, > + unsigned size, MemTxAttrs attrs) > +{ > + CXLAccelDev *acceld = CXL_ACCEL(d); > + uint64_t dpa_offset = 0; > + AddressSpace *as = NULL; > + int res; > + > + res = cxl_accel_hpa_to_as_and_dpa(acceld, host_addr, size, > + &as, &dpa_offset); > + if (res) { > + return MEMTX_ERROR; > + } > + > + return address_space_write(as, dpa_offset, attrs, &data, size); > +} > + > +static void clean_memory(PCIDevice *pci_dev) > +{ > + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); > + > + if (acceld->hostvmem) { > + address_space_destroy(&acceld->hostvmem_as); > + } > +} > + > +static bool setup_memory(PCIDevice *pci_dev, Error **errp) > +{ > + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); > + > + if (acceld->hostvmem) { > + MemoryRegion *vmr; > + char *v_name; > + > + vmr = host_memory_backend_get_memory(acceld->hostvmem); > + if (!vmr) { > + error_setg(errp, "volatile memdev must have backing device"); > + return false; > + } > + if (host_memory_backend_is_mapped(acceld->hostvmem)) { > + error_setg(errp, "memory backend %s can't be used multiple times.", > + object_get_canonical_path_component(OBJECT(acceld->hostvmem))); > + return false; > + } > + memory_region_set_nonvolatile(vmr, false); > + memory_region_set_enabled(vmr, true); > + host_memory_backend_set_mapped(acceld->hostvmem, true); > + v_name = g_strdup("cxl-accel-dpa-vmem-space"); > + address_space_init(&acceld->hostvmem_as, vmr, v_name); > + g_free(v_name); > + } > + return true; > +} > + > +static void 
setup_cxl_regs(PCIDevice *pci_dev) > +{ > + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); > + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; > + ComponentRegisters *regs = &cxl_cstate->crb; > + MemoryRegion *mr = &regs->component_registers; > + > + cxl_cstate->dvsec_offset = 0x100; > + cxl_cstate->pdev = pci_dev; > + > + build_dvsecs(acceld); > + > + cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate, > + TYPE_CXL_ACCEL); > + > + pci_register_bar( > + pci_dev, CXL_COMPONENT_REG_BAR_IDX, > + PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, mr); > +} > + > +#define MSIX_NUM 6 > + > +static int setup_msix(PCIDevice *pci_dev) > +{ > + int i, rc; > + > + /* MSI(-X) Initialization */ > + rc = msix_init_exclusive_bar(pci_dev, MSIX_NUM, 4, NULL); > + if (rc) { > + return rc; > + } > + > + for (i = 0; i < MSIX_NUM; i++) { > + msix_vector_use(pci_dev, i); > + } > + return 0; > +} > + > +static void cxl_accel_realize(PCIDevice *pci_dev, Error **errp) > +{ > + ERRP_GUARD(); > + int rc; > + uint8_t *pci_conf = pci_dev->config; > + > + if (!setup_memory(pci_dev, errp)) { > + return; > + } > + > + pci_config_set_prog_interface(pci_conf, 0x10); > + pcie_endpoint_cap_init(pci_dev, 0x80); > + > + setup_cxl_regs(pci_dev); > + > + /* MSI(-X) Initialization */ > + rc = setup_msix(pci_dev); > + if (rc) { > + clean_memory(pci_dev); > + return; > + } > +} > + > +static void cxl_accel_exit(PCIDevice *pci_dev) > +{ > + clean_memory(pci_dev); > +} > + > +static void cxl_accel_reset(DeviceState *dev) > +{ > + CXLAccelDev *acceld = CXL_ACCEL(dev); > + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; > + uint32_t *reg_state = cxl_cstate->crb.cache_mem_registers; > + uint32_t *write_msk = cxl_cstate->crb.cache_mem_regs_write_mask; > + > + update_dvsecs(acceld); > + cxl_component_register_init_common(reg_state, write_msk, CXL3_TYPE2_DEVICE); > +} > + > +static Property cxl_accel_props[] = { > + DEFINE_PROP_LINK("volatile-memdev", CXLAccelDev, hostvmem, > + 
TYPE_MEMORY_BACKEND, HostMemoryBackend *), > + DEFINE_PROP_END_OF_LIST(), > +}; > + > +static void cxl_accel_class_init(ObjectClass *oc, void *data) > +{ > + DeviceClass *dc = DEVICE_CLASS(oc); > + PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc); > + > + pc->realize = cxl_accel_realize; > + pc->exit = cxl_accel_exit; > + > + pc->class_id = PCI_CLASS_CXL_QEMU_ACCEL; > + pc->vendor_id = PCI_VENDOR_ID_INTEL; > + pc->device_id = 0xd94; > + pc->revision = 1; > + > + set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); > + dc->desc = "CXL Accelerator Device (Type 2)"; > + device_class_set_legacy_reset(dc, cxl_accel_reset); > + device_class_set_props(dc, cxl_accel_props); > +} > + > +static const TypeInfo cxl_accel_dev_info = { > + .name = TYPE_CXL_ACCEL, > + .parent = TYPE_PCI_DEVICE, > + .class_size = sizeof(struct CXLAccelClass), > + .class_init = cxl_accel_class_init, > + .instance_size = sizeof(CXLAccelDev), > + .interfaces = (InterfaceInfo[]) { > + { INTERFACE_CXL_DEVICE }, > + { INTERFACE_PCIE_DEVICE }, > + {} > + }, > +}; > + > +static void cxl_accel_dev_registers(void) > +{ > + type_register_static(&cxl_accel_dev_info); > +} > + > +type_init(cxl_accel_dev_registers); > diff --git a/hw/mem/meson.build b/hw/mem/meson.build > index 1c1c6da24b..36a395dbb6 100644 > --- a/hw/mem/meson.build > +++ b/hw/mem/meson.build > @@ -4,6 +4,7 @@ mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc-dimm.c')) > mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c')) > mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c')) > mem_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_true: files('cxl_type3.c')) > +mem_ss.add(when: 'CONFIG_CXL_ACCEL_DEVICE', if_true: files('cxl_accel.c')) > system_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_false: files('cxl_type3_stubs.c')) > > system_ss.add(when: 'CONFIG_MEM_DEVICE', if_false: files('memory-device-stubs.c')) > diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h > index 30fe4bfa24..0e78db26b8 100644 > --- 
a/include/hw/cxl/cxl_component.h > +++ b/include/hw/cxl/cxl_component.h > @@ -29,6 +29,7 @@ enum reg_type { > CXL2_UPSTREAM_PORT, > CXL2_DOWNSTREAM_PORT, > CXL3_SWITCH_MAILBOX_CCI, > + CXL3_TYPE2_DEVICE, > }; > > /* > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h > index 561b375dc8..ac26b264da 100644 > --- a/include/hw/cxl/cxl_device.h > +++ b/include/hw/cxl/cxl_device.h > @@ -630,6 +630,26 @@ struct CSWMBCCIDev { > CXLCCI *cci; > }; > > +struct CXLAccelDev { > + /* Private */ > + PCIDevice parent_obj; > + > + /* Properties */ > + HostMemoryBackend *hostvmem; > + > + /* State */ > + AddressSpace hostvmem_as; > + CXLComponentState cxl_cstate; > +}; > + > +struct CXLAccelClass { > + /* Private */ > + PCIDeviceClass parent_class; > +}; > + > +#define TYPE_CXL_ACCEL "cxl-accel" > +OBJECT_DECLARE_TYPE(CXLAccelDev, CXLAccelClass, CXL_ACCEL) > + > #define TYPE_CXL_SWITCH_MAILBOX_CCI "cxl-switch-mailbox-cci" > OBJECT_DECLARE_TYPE(CSWMBCCIDev, CSWMBCCIClass, CXL_SWITCH_MAILBOX_CCI) > > @@ -638,6 +658,11 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data, > MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data, > unsigned size, MemTxAttrs attrs); > > +MemTxResult cxl_accel_read(PCIDevice *d, hwaddr host_addr, uint64_t *data, > + unsigned size, MemTxAttrs attrs); > +MemTxResult cxl_accel_write(PCIDevice *d, hwaddr host_addr, uint64_t data, > + unsigned size, MemTxAttrs attrs); > + > uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds); > > void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num); > diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h > index f1a53fea8d..08bc469316 100644 > --- a/include/hw/pci/pci_ids.h > +++ b/include/hw/pci/pci_ids.h > @@ -55,6 +55,7 @@ > #define PCI_CLASS_MEMORY_RAM 0x0500 > #define PCI_CLASS_MEMORY_FLASH 0x0501 > #define PCI_CLASS_MEMORY_CXL 0x0502 > +#define PCI_CLASS_CXL_QEMU_ACCEL 0x0503 > #define PCI_CLASS_MEMORY_OTHER 0x0580 > > #define 
PCI_BASE_CLASS_BRIDGE 0x06 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation 2024-12-12 17:02 ` Alejandro Lucero Palau @ 2024-12-12 18:33 ` Zhi Wang 0 siblings, 0 replies; 15+ messages in thread From: Zhi Wang @ 2024-12-12 18:33 UTC (permalink / raw) To: Alejandro Lucero Palau, qemu-devel@nongnu.org Cc: dan.j.williams@intel.com, dave.jiang@intel.com, jonathan.cameron@huawei.com, ira.weiny@intel.com, fan.ni@samsung.com, alex.williamson@redhat.com, clg@redhat.com, Andy Currid, Neo Jia, Surath Mitra, Ankit Agrawal, Aniket Agashe, Kirti Wankhede, Tarun Gupta (SW-GPU), zhiwang@kernel.org On 12/12/2024 19.02, Alejandro Lucero Palau wrote: > > On 12/12/24 13:04, Zhi Wang wrote: >> From: Zhi Wang <zhiwang@kernel.org> >> >> Introduce a CXL type-2 device emulation that provides a minimum base for >> testing kernel CXL core type-2 support and CXL type-2 virtualization. It >> is also a good base for introducing more emulated features. >> >> Currently, it only supports: >> >> - Emulating component registers with HDM decoders. >> - Volatile memory backend and emulation of region access. >> >> The emulation is aimed not to be tightly coupled with the current CXL type-3 >> emulation since many advanced CXL type-3 emulation features are not >> implemented in a CXL type-2 device. 
>> >> Co-developed-by: Ira Weiny <ira.weiny@intel.com> >> Signed-off-by: Zhi Wang <zhiwang@kernel.org> >> --- >> MAINTAINERS | 1 + >> docs/system/devices/cxl.rst | 11 ++ >> hw/cxl/cxl-component-utils.c | 2 + >> hw/cxl/cxl-host.c | 19 +- >> hw/mem/Kconfig | 5 + >> hw/mem/cxl_accel.c | 319 +++++++++++++++++++++++++++++++++ >> hw/mem/meson.build | 1 + >> include/hw/cxl/cxl_component.h | 1 + >> include/hw/cxl/cxl_device.h | 25 +++ >> include/hw/pci/pci_ids.h | 1 + >> 10 files changed, 382 insertions(+), 3 deletions(-) >> create mode 100644 hw/mem/cxl_accel.c >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index aaf0505a21..72a6a505eb 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -2914,6 +2914,7 @@ R: Fan Ni <fan.ni@samsung.com> >> S: Supported >> F: hw/cxl/ >> F: hw/mem/cxl_type3.c >> +F: hw/mem/cxl_accel.c >> F: include/hw/cxl/ >> F: qapi/cxl.json >> diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst >> index 882b036f5e..13cc2417f2 100644 >> --- a/docs/system/devices/cxl.rst >> +++ b/docs/system/devices/cxl.rst >> @@ -332,6 +332,17 @@ The same volatile setup may optionally include an >> LSA region:: >> -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl- >> lsa0,id=cxl-vmem0 \ >> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G >> +A very simple setup with just one directly attached CXL Type 2 >> Volatile Memory >> +Accelerator device:: >> + >> + qemu-system-x86_64 -M q35,cxl=on -m 4G,maxmem=8G,slots=8 -smp 4 \ >> + ... >> + -object memory-backend-ram,id=vmem0,share=on,size=256M \ >> + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ >> + -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ >> + -device cxl-accel,bus=root_port13,volatile-memdev=vmem0,id=cxl- >> accel0 \ >> + -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G >> + >> A setup suitable for 4 way interleave. Only one fixed window >> provided, to enable 2 way >> interleave across 2 CXL host bridges. 
Each host bridge has 2 CXL >> Root Ports, with >> the CXL Type3 device directly attached (no switches).:: >> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c >> index 355103d165..717ef117ac 100644 >> --- a/hw/cxl/cxl-component-utils.c >> +++ b/hw/cxl/cxl-component-utils.c >> @@ -262,6 +262,7 @@ static void hdm_init_common(uint32_t *reg_state, >> uint32_t *write_msk, >> write_msk[R_CXL_HDM_DECODER0_CTRL + i * hdm_inc] = 0x13ff; > > > You are not changing this write, but I did, based on Type3 or Type2: > > > - write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff; > + if (type == CXL2_TYPE2_DEVICE) > + /* Bit 12 Target Range Type 0= HDM-D or HDM-DB */ > + /* Bit 10 says memory already committed */ > + write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x7ff; > + else > + /* Bit 12 Target Range Type 1= HDM-H aka Host Only > Coherent Address Range */ > + write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff; > > > It has been a while since I did work on this, but I guess I did so > because it was needed. But maybe I'm wrong ... > > Bit 10 was something I needed for emulating what we had in the real > device, but bit 12 looks like something we should set, although maybe it is > only informative. Interesting. We can think about how to customize the mask via params. It might be helpful if you could share a list of the things you wish to customize. I understand they are hacks for validation, since Bit 12 is only RWL for HBs and USPs of switches in the spec? 
> > >> if (type == CXL2_DEVICE || >> type == CXL2_TYPE3_DEVICE || >> + type == CXL3_TYPE2_DEVICE || >> type == CXL2_LOGICAL_DEVICE) { >> write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * >> hdm_inc] = >> 0xf0000000; >> @@ -293,6 +294,7 @@ void cxl_component_register_init_common(uint32_t >> *reg_state, >> case CXL2_UPSTREAM_PORT: >> case CXL2_TYPE3_DEVICE: >> case CXL2_LOGICAL_DEVICE: >> + case CXL3_TYPE2_DEVICE: >> /* + HDM */ >> caps = 3; >> break; >> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c >> index e9f2543c43..e603a3f2fc 100644 >> --- a/hw/cxl/cxl-host.c >> +++ b/hw/cxl/cxl-host.c >> @@ -201,7 +201,8 @@ static PCIDevice >> *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr) >> return NULL; >> } >> - if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { >> + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3) || >> + object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { >> return d; >> } >> @@ -256,7 +257,13 @@ static MemTxResult cxl_read_cfmws(void *opaque, >> hwaddr addr, uint64_t *data, >> return MEMTX_ERROR; >> } >> - return cxl_type3_read(d, addr + fw->base, data, size, attrs); >> + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { >> + return cxl_type3_read(d, addr + fw->base, data, size, attrs); >> + } else if (object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { >> + return cxl_accel_read(d, addr + fw->base, data, size, attrs); >> + } >> + >> + return MEMTX_ERROR; >> } >> static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr, >> @@ -272,7 +279,13 @@ static MemTxResult cxl_write_cfmws(void *opaque, >> hwaddr addr, >> return MEMTX_OK; >> } >> - return cxl_type3_write(d, addr + fw->base, data, size, attrs); >> + if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) { >> + return cxl_type3_write(d, addr + fw->base, data, size, attrs); >> + } else if (object_dynamic_cast(OBJECT(d), TYPE_CXL_ACCEL)) { >> + return cxl_accel_write(d, addr + fw->base, data, size, attrs); >> + } >> + >> + return MEMTX_ERROR; >> } >> const MemoryRegionOps cfmws_ops = 
{ >> diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig >> index 73c5ae8ad9..1f7d08c17d 100644 >> --- a/hw/mem/Kconfig >> +++ b/hw/mem/Kconfig >> @@ -16,3 +16,8 @@ config CXL_MEM_DEVICE >> bool >> default y if CXL >> select MEM_DEVICE >> + >> +config CXL_ACCEL_DEVICE >> + bool >> + default y if CXL >> + select MEM_DEVICE >> diff --git a/hw/mem/cxl_accel.c b/hw/mem/cxl_accel.c >> new file mode 100644 >> index 0000000000..770072126d >> --- /dev/null >> +++ b/hw/mem/cxl_accel.c >> @@ -0,0 +1,319 @@ >> +/* >> + * CXL accel (type-2) device >> + * >> + * Copyright(C) 2024 NVIDIA Corporation. >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2. >> See the >> + * COPYING file in the top-level directory. >> + * >> + * SPDX-License-Identifier: GPL-v2-only >> + */ >> + >> +#include "qemu/osdep.h" >> +#include "qemu/units.h" >> +#include "qemu/error-report.h" >> +#include "hw/mem/memory-device.h" >> +#include "hw/mem/pc-dimm.h" >> +#include "hw/pci/pci.h" >> +#include "hw/qdev-properties.h" >> +#include "hw/qdev-properties-system.h" >> +#include "qemu/log.h" >> +#include "qemu/module.h" >> +#include "qemu/range.h" >> +#include "sysemu/hostmem.h" >> +#include "sysemu/numa.h" >> +#include "hw/cxl/cxl.h" >> +#include "hw/pci/msix.h" >> + >> +static void update_dvsecs(CXLAccelDev *acceld) >> +{ >> + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; >> + uint8_t *dvsec; >> + uint32_t range1_size_hi = 0, range1_size_lo = 0, >> + range1_base_hi = 0, range1_base_lo = 0; >> + >> + if (acceld->hostvmem) { >> + range1_size_hi = acceld->hostvmem->size >> 32; >> + range1_size_lo = (2 << 5) | (2 << 2) | 0x3 | >> + (acceld->hostvmem->size & 0xF0000000); >> + } >> + >> + dvsec = (uint8_t *)&(CXLDVSECDevice){ >> + .cap = 0x1e, >> + .ctrl = 0x2, >> + .status2 = 0x2, >> + .range1_size_hi = range1_size_hi, >> + .range1_size_lo = range1_size_lo, >> + .range1_base_hi = range1_base_hi, >> + .range1_base_lo = range1_base_lo, >> + }; >> + cxl_component_update_dvsec(cxl_cstate, 
PCIE_CXL_DEVICE_DVSEC_LENGTH, >> + PCIE_CXL_DEVICE_DVSEC, dvsec); >> + >> + dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){ >> + .rsvd = 0, >> + .reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX, >> + .reg0_base_hi = 0, >> + }; >> + cxl_component_update_dvsec(cxl_cstate, REG_LOC_DVSEC_LENGTH, >> + REG_LOC_DVSEC, dvsec); >> + >> + dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){ >> + .cap = 0x26, /* 68B, IO, Mem, non-MLD */ >> + .ctrl = 0x02, /* IO always enabled */ >> + .status = 0x26, /* same as capabilities */ >> + .rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */ >> + }; >> + cxl_component_update_dvsec(cxl_cstate, >> PCIE_CXL3_FLEXBUS_PORT_DVSEC_LENGTH, >> + PCIE_FLEXBUS_PORT_DVSEC, dvsec); >> +} >> + >> +static void build_dvsecs(CXLAccelDev *acceld) >> +{ >> + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; >> + >> + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, >> + PCIE_CXL_DEVICE_DVSEC_LENGTH, >> + PCIE_CXL_DEVICE_DVSEC, >> + PCIE_CXL31_DEVICE_DVSEC_REVID, NULL); >> + >> + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, >> + REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC, >> + REG_LOC_DVSEC_REVID, NULL); >> + >> + cxl_component_create_dvsec(cxl_cstate, CXL3_TYPE2_DEVICE, >> + PCIE_CXL3_FLEXBUS_PORT_DVSEC_LENGTH, >> + PCIE_FLEXBUS_PORT_DVSEC, >> + PCIE_CXL3_FLEXBUS_PORT_DVSEC_REVID, >> NULL); >> + update_dvsecs(acceld); >> +} >> + >> +static bool cxl_accel_dpa(CXLAccelDev *acceld, hwaddr host_addr, >> uint64_t *dpa) >> +{ >> + return cxl_host_addr_to_dpa(&acceld->cxl_cstate, host_addr, dpa); >> +} >> + >> +static int cxl_accel_hpa_to_as_and_dpa(CXLAccelDev *acceld, >> + hwaddr host_addr, >> + unsigned int size, >> + AddressSpace **as, >> + uint64_t *dpa_offset) >> +{ >> + MemoryRegion *vmr = NULL; >> + uint64_t vmr_size = 0; >> + >> + if (!acceld->hostvmem) { >> + return -ENODEV; >> + } >> + >> + vmr = host_memory_backend_get_memory(acceld->hostvmem); >> + if (!vmr) { >> + return -ENODEV; >> + } >> + >> + vmr_size = memory_region_size(vmr); >> 
+ >> + if (!cxl_accel_dpa(acceld, host_addr, dpa_offset)) { >> + return -EINVAL; >> + } >> + >> + if (*dpa_offset >= vmr_size) { >> + return -EINVAL; >> + } >> + >> + *as = &acceld->hostvmem_as; >> + return 0; >> +} >> + >> +MemTxResult cxl_accel_read(PCIDevice *d, hwaddr host_addr, uint64_t >> *data, >> + unsigned size, MemTxAttrs attrs) >> +{ >> + CXLAccelDev *acceld = CXL_ACCEL(d); >> + uint64_t dpa_offset = 0; >> + AddressSpace *as = NULL; >> + int res; >> + >> + res = cxl_accel_hpa_to_as_and_dpa(acceld, host_addr, size, >> + &as, &dpa_offset); >> + if (res) { >> + return MEMTX_ERROR; >> + } >> + >> + return address_space_read(as, dpa_offset, attrs, data, size); >> +} >> + >> +MemTxResult cxl_accel_write(PCIDevice *d, hwaddr host_addr, uint64_t >> data, >> + unsigned size, MemTxAttrs attrs) >> +{ >> + CXLAccelDev *acceld = CXL_ACCEL(d); >> + uint64_t dpa_offset = 0; >> + AddressSpace *as = NULL; >> + int res; >> + >> + res = cxl_accel_hpa_to_as_and_dpa(acceld, host_addr, size, >> + &as, &dpa_offset); >> + if (res) { >> + return MEMTX_ERROR; >> + } >> + >> + return address_space_write(as, dpa_offset, attrs, &data, size); >> +} >> + >> +static void clean_memory(PCIDevice *pci_dev) >> +{ >> + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); >> + >> + if (acceld->hostvmem) { >> + address_space_destroy(&acceld->hostvmem_as); >> + } >> +} >> + >> +static bool setup_memory(PCIDevice *pci_dev, Error **errp) >> +{ >> + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); >> + >> + if (acceld->hostvmem) { >> + MemoryRegion *vmr; >> + char *v_name; >> + >> + vmr = host_memory_backend_get_memory(acceld->hostvmem); >> + if (!vmr) { >> + error_setg(errp, "volatile memdev must have backing >> device"); >> + return false; >> + } >> + if (host_memory_backend_is_mapped(acceld->hostvmem)) { >> + error_setg(errp, "memory backend %s can't be used >> multiple times.", >> + object_get_canonical_path_component(OBJECT(acceld- >> >hostvmem))); >> + return false; >> + } >> + 
memory_region_set_nonvolatile(vmr, false); >> + memory_region_set_enabled(vmr, true); >> + host_memory_backend_set_mapped(acceld->hostvmem, true); >> + v_name = g_strdup("cxl-accel-dpa-vmem-space"); >> + address_space_init(&acceld->hostvmem_as, vmr, v_name); >> + g_free(v_name); >> + } >> + return true; >> +} >> + >> +static void setup_cxl_regs(PCIDevice *pci_dev) >> +{ >> + CXLAccelDev *acceld = CXL_ACCEL(pci_dev); >> + CXLComponentState *cxl_cstate = &acceld->cxl_cstate; >> + ComponentRegisters *regs = &cxl_cstate->crb; >> + MemoryRegion *mr = ®s->component_registers; >> + >> + cxl_cstate->dvsec_offset = 0x100; >> + cxl_cstate->pdev = pci_dev; >> + >> + build_dvsecs(acceld); >> + >> + cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate, >> + TYPE_CXL_ACCEL); >> + >> + pci_register_bar( >> + pci_dev, CXL_COMPONENT_REG_BAR_IDX, >> + PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, >> mr); >> +} >> + >> +#define MSIX_NUM 6 >> + >> +static int setup_msix(PCIDevice *pci_dev) >> +{ >> + int i, rc; >> + >> + /* MSI(-X) Initialization */ >> + rc = msix_init_exclusive_bar(pci_dev, MSIX_NUM, 4, NULL); >> + if (rc) { >> + return rc; >> + } >> + >> + for (i = 0; i < MSIX_NUM; i++) { >> + msix_vector_use(pci_dev, i); >> + } >> + return 0; >> +} >> + >> +static void cxl_accel_realize(PCIDevice *pci_dev, Error **errp) >> +{ >> + ERRP_GUARD(); >> + int rc; >> + uint8_t *pci_conf = pci_dev->config; >> + >> + if (!setup_memory(pci_dev, errp)) { >> + return; >> + } >> + >> + pci_config_set_prog_interface(pci_conf, 0x10); >> + pcie_endpoint_cap_init(pci_dev, 0x80); >> + >> + setup_cxl_regs(pci_dev); >> + >> + /* MSI(-X) Initialization */ >> + rc = setup_msix(pci_dev); >> + if (rc) { >> + clean_memory(pci_dev); >> + return; >> + } >> +} >> + >> +static void cxl_accel_exit(PCIDevice *pci_dev) >> +{ >> + clean_memory(pci_dev); >> +} >> + >> +static void cxl_accel_reset(DeviceState *dev) >> +{ >> + CXLAccelDev *acceld = CXL_ACCEL(dev); >> + CXLComponentState 
*cxl_cstate = &acceld->cxl_cstate; >> + uint32_t *reg_state = cxl_cstate->crb.cache_mem_registers; >> + uint32_t *write_msk = cxl_cstate->crb.cache_mem_regs_write_mask; >> + >> + update_dvsecs(acceld); >> + cxl_component_register_init_common(reg_state, write_msk, >> CXL3_TYPE2_DEVICE); >> +} >> + >> +static Property cxl_accel_props[] = { >> + DEFINE_PROP_LINK("volatile-memdev", CXLAccelDev, hostvmem, >> + TYPE_MEMORY_BACKEND, HostMemoryBackend *), >> + DEFINE_PROP_END_OF_LIST(), >> +}; >> + >> +static void cxl_accel_class_init(ObjectClass *oc, void *data) >> +{ >> + DeviceClass *dc = DEVICE_CLASS(oc); >> + PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc); >> + >> + pc->realize = cxl_accel_realize; >> + pc->exit = cxl_accel_exit; >> + >> + pc->class_id = PCI_CLASS_CXL_QEMU_ACCEL; >> + pc->vendor_id = PCI_VENDOR_ID_INTEL; >> + pc->device_id = 0xd94; >> + pc->revision = 1; >> + >> + set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); >> + dc->desc = "CXL Accelerator Device (Type 2)"; >> + device_class_set_legacy_reset(dc, cxl_accel_reset); >> + device_class_set_props(dc, cxl_accel_props); >> +} >> + >> +static const TypeInfo cxl_accel_dev_info = { >> + .name = TYPE_CXL_ACCEL, >> + .parent = TYPE_PCI_DEVICE, >> + .class_size = sizeof(struct CXLAccelClass), >> + .class_init = cxl_accel_class_init, >> + .instance_size = sizeof(CXLAccelDev), >> + .interfaces = (InterfaceInfo[]) { >> + { INTERFACE_CXL_DEVICE }, >> + { INTERFACE_PCIE_DEVICE }, >> + {} >> + }, >> +}; >> + >> +static void cxl_accel_dev_registers(void) >> +{ >> + type_register_static(&cxl_accel_dev_info); >> +} >> + >> +type_init(cxl_accel_dev_registers); >> diff --git a/hw/mem/meson.build b/hw/mem/meson.build >> index 1c1c6da24b..36a395dbb6 100644 >> --- a/hw/mem/meson.build >> +++ b/hw/mem/meson.build >> @@ -4,6 +4,7 @@ mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc- >> dimm.c')) >> mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c')) >> mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c')) 
>> mem_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_true: >> files('cxl_type3.c')) >> +mem_ss.add(when: 'CONFIG_CXL_ACCEL_DEVICE', if_true: >> files('cxl_accel.c')) >> system_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_false: >> files('cxl_type3_stubs.c')) >> system_ss.add(when: 'CONFIG_MEM_DEVICE', if_false: files('memory- >> device-stubs.c')) >> diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/ >> cxl_component.h >> index 30fe4bfa24..0e78db26b8 100644 >> --- a/include/hw/cxl/cxl_component.h >> +++ b/include/hw/cxl/cxl_component.h >> @@ -29,6 +29,7 @@ enum reg_type { >> CXL2_UPSTREAM_PORT, >> CXL2_DOWNSTREAM_PORT, >> CXL3_SWITCH_MAILBOX_CCI, >> + CXL3_TYPE2_DEVICE, >> }; >> /* >> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h >> index 561b375dc8..ac26b264da 100644 >> --- a/include/hw/cxl/cxl_device.h >> +++ b/include/hw/cxl/cxl_device.h >> @@ -630,6 +630,26 @@ struct CSWMBCCIDev { >> CXLCCI *cci; >> }; >> +struct CXLAccelDev { >> + /* Private */ >> + PCIDevice parent_obj; >> + >> + /* Properties */ >> + HostMemoryBackend *hostvmem; >> + >> + /* State */ >> + AddressSpace hostvmem_as; >> + CXLComponentState cxl_cstate; >> +}; >> + >> +struct CXLAccelClass { >> + /* Private */ >> + PCIDeviceClass parent_class; >> +}; >> + >> +#define TYPE_CXL_ACCEL "cxl-accel" >> +OBJECT_DECLARE_TYPE(CXLAccelDev, CXLAccelClass, CXL_ACCEL) >> + >> #define TYPE_CXL_SWITCH_MAILBOX_CCI "cxl-switch-mailbox-cci" >> OBJECT_DECLARE_TYPE(CSWMBCCIDev, CSWMBCCIClass, CXL_SWITCH_MAILBOX_CCI) >> @@ -638,6 +658,11 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr >> host_addr, uint64_t *data, >> MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t >> data, >> unsigned size, MemTxAttrs attrs); >> +MemTxResult cxl_accel_read(PCIDevice *d, hwaddr host_addr, uint64_t >> *data, >> + unsigned size, MemTxAttrs attrs); >> +MemTxResult cxl_accel_write(PCIDevice *d, hwaddr host_addr, uint64_t >> data, >> + unsigned size, MemTxAttrs attrs); >> + >> uint64_t 
cxl_device_get_timestamp(CXLDeviceState *cxlds); >> void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num); >> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h >> index f1a53fea8d..08bc469316 100644 >> --- a/include/hw/pci/pci_ids.h >> +++ b/include/hw/pci/pci_ids.h >> @@ -55,6 +55,7 @@ >> #define PCI_CLASS_MEMORY_RAM 0x0500 >> #define PCI_CLASS_MEMORY_FLASH 0x0501 >> #define PCI_CLASS_MEMORY_CXL 0x0502 >> +#define PCI_CLASS_CXL_QEMU_ACCEL 0x0503 >> #define PCI_CLASS_MEMORY_OTHER 0x0580 >> #define PCI_BASE_CLASS_BRIDGE 0x06 ^ permalink raw reply [flat|nested] 15+ messages in thread
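For readers who want to try the new device, a command-line sketch follows. The `cxl-accel` device name and `volatile-memdev` property come from this patch; the surrounding topology options (`pxb-cxl`, `cxl-rp`, `cxl-fmw`) are illustrative, mirror the existing type-3 examples in docs/system/devices/cxl.rst, and should be adapted to your setup.

```shell
# Hedged example: only cxl-accel and volatile-memdev are defined by this
# series; the rest of the topology follows the type-3 docs and may need
# adjusting for your machine and memory layout.
qemu-system-x86_64 -machine q35,cxl=on -m 4G -nographic \
    -object memory-backend-ram,id=vmem0,share=on,size=256M \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
    -device cxl-accel,bus=root_port13,volatile-memdev=vmem0,id=cxl-accel0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
```

The memory backend is linked to the device through the `volatile-memdev` property introduced by `cxl_accel_props[]`, just as `memdev` links a backend to the existing type-3 device.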
* Re: [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation 2024-12-12 13:04 ` [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation Zhi Wang 2024-12-12 17:02 ` Alejandro Lucero Palau @ 2025-01-21 16:16 ` Jonathan Cameron via 2025-01-31 10:45 ` Zhi Wang 1 sibling, 1 reply; 15+ messages in thread From: Jonathan Cameron via @ 2025-01-21 16:16 UTC (permalink / raw) To: Zhi Wang Cc: qemu-devel, dan.j.williams, dave.jiang, ira.weiny, fan.ni, alex.williamson, alucerop, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiwang

On Thu, 12 Dec 2024 05:04:22 -0800 Zhi Wang <zhiw@nvidia.com> wrote:

> From: Zhi Wang <zhiwang@kernel.org>
>
> Introduce a CXL type-2 device emulation that provides a minimum base for
> testing kernel CXL core type-2 support and CXL type-2 virtualization. It
> is also a good base for introducing more emulated features.
>
> Currently, it only supports:
>
> - Emulating component registers with HDM decoders.
> - Volatile memory backend and emulation of region access.
>
> The emulation aims not to be tightly coupled with the current CXL type-3
> emulation, since many advanced CXL type-3 emulation features are not
> implemented in a CXL type-2 device.
>
> Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Zhi Wang <zhiwang@kernel.org>

Hi Zhi,

A few passing comments.

Jonathan

> diff --git a/hw/mem/cxl_accel.c b/hw/mem/cxl_accel.c
> new file mode 100644
> index 0000000000..770072126d
> --- /dev/null
> +++ b/hw/mem/cxl_accel.c
> @@ -0,0 +1,319 @@

> +
> +static void update_dvsecs(CXLAccelDev *acceld)

Just to make them easier to search for and to avoid clashes, it would be good to prefix all functions with cxlacc or something like that.

> +{

/...

> +static Property cxl_accel_props[] = {
> +    DEFINE_PROP_LINK("volatile-memdev", CXLAccelDev, hostvmem,
> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),

Does backing a type 2 device with a memdev provide any advantages?
I'd have thought a device specific memory allocation would make more sense, like we'd do for a memory BAR on a PCI device. That might complicate the cxl-host handling though so perhaps this is a good way to go for now. > + DEFINE_PROP_END_OF_LIST(), When you get time, rebase as these have gone away recently. I aim to get a fresher staging tree out shortly. > +}; > + > +static void cxl_accel_class_init(ObjectClass *oc, void *data) > +{ > + DeviceClass *dc = DEVICE_CLASS(oc); > + PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc); > + > + pc->realize = cxl_accel_realize; > + pc->exit = cxl_accel_exit; > + > + pc->class_id = PCI_CLASS_CXL_QEMU_ACCEL; > + pc->vendor_id = PCI_VENDOR_ID_INTEL; > + pc->device_id = 0xd94; If you are posting these I hope you have those IDs reserved (which seems unlikely ;) We need to be absolutely sure we never hit an existing ID which generally means you need to find whoever controls those allocations in your company and get them to give you an ID for this. > + pc->revision = 1; > + > + set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); > + dc->desc = "CXL Accelerator Device (Type 2)"; > + device_class_set_legacy_reset(dc, cxl_accel_reset); > + device_class_set_props(dc, cxl_accel_props); > +} > void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num); > diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h > index f1a53fea8d..08bc469316 100644 > --- a/include/hw/pci/pci_ids.h > +++ b/include/hw/pci/pci_ids.h > @@ -55,6 +55,7 @@ > #define PCI_CLASS_MEMORY_RAM 0x0500 > #define PCI_CLASS_MEMORY_FLASH 0x0501 > #define PCI_CLASS_MEMORY_CXL 0x0502 > +#define PCI_CLASS_CXL_QEMU_ACCEL 0x0503 Either this is a real device class (which seems unlikely given the name) or you need to choose something else. 
PCI maintains a big list of class codes, and currently 0x0502 is the highest one defined in base class 05h (memory controllers):

https://members.pcisig.com/wg/PCI-SIG/document/20113
(behind a pcisig login)

> #define PCI_CLASS_MEMORY_OTHER 0x0580
>
> #define PCI_BASE_CLASS_BRIDGE 0x06

^ permalink raw reply [flat|nested] 15+ messages in thread
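The cfmws read/write dispatch added earlier in this series resolves a host physical address to a device DPA via cxl_host_addr_to_dpa() before handing it to cxl_accel_read()/cxl_accel_write(). As background, the HDM decoder math can be sketched roughly as below. This is an illustration only, not the QEMU implementation: it assumes a single committed decoder, power-of-two interleave ways and granularity, and no XOR interleave.

```python
def hdm_decode(hpa, base, size, ways, gran, pos):
    """Translate a host physical address into a device DPA offset.

    base/size: the decoder's HPA window.  ways: interleave ways
    (power of 2).  gran: interleave granularity in bytes (power of 2).
    pos: this device's position in the interleave set.
    Returns None if the address misses the window or the chunk belongs
    to another interleave target.
    """
    if not (base <= hpa < base + size):
        return None
    off = hpa - base
    if (off // gran) % ways != pos:
        return None  # chunk routed to a different device
    # Drop the interleave-way bits: count only the chunks that landed
    # on this device, and keep the byte offset within the chunk.
    return (off // (gran * ways)) * gran + (off % gran)
```

With `ways=1` this degenerates to a plain offset into the volatile memdev, which matches the single-device topology the test steps use.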
* Re: [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation 2025-01-21 16:16 ` Jonathan Cameron via @ 2025-01-31 10:45 ` Zhi Wang 2025-01-31 11:52 ` Jonathan Cameron via 0 siblings, 1 reply; 15+ messages in thread From: Zhi Wang @ 2025-01-31 10:45 UTC (permalink / raw) To: Jonathan Cameron Cc: qemu-devel@nongnu.org, dan.j.williams@intel.com, dave.jiang@intel.com, ira.weiny@intel.com, fan.ni@samsung.com, alex.williamson@redhat.com, alucerop@amd.com, clg@redhat.com, Andy Currid, Neo Jia, Surath Mitra, Ankit Agrawal, Aniket Agashe, Kirti Wankhede, Tarun Gupta (SW-GPU), zhiwang@kernel.org

On 21/01/2025 18.16, Jonathan Cameron wrote:
> On Thu, 12 Dec 2024 05:04:22 -0800
> Zhi Wang <zhiw@nvidia.com> wrote:
>
>> From: Zhi Wang <zhiwang@kernel.org>
>>
>> Introduce a CXL type-2 device emulation that provides a minimum base for
>> testing kernel CXL core type-2 support and CXL type-2 virtualization. It
>> is also a good base for introducing more emulated features.
>>
>> Currently, it only supports:
>>
>> - Emulating component registers with HDM decoders.
>> - Volatile memory backend and emulation of region access.
>>
>> The emulation aims not to be tightly coupled with the current CXL type-3
>> emulation, since many advanced CXL type-3 emulation features are not
>> implemented in a CXL type-2 device.
>>
>> Co-developed-by: Ira Weiny <ira.weiny@intel.com>
>> Signed-off-by: Zhi Wang <zhiwang@kernel.org>
>
> Hi Zhi,
>
> A few passing comments.
>
> Jonathan
>
>> diff --git a/hw/mem/cxl_accel.c b/hw/mem/cxl_accel.c
>> new file mode 100644
>> index 0000000000..770072126d
>> --- /dev/null
>> +++ b/hw/mem/cxl_accel.c
>> @@ -0,0 +1,319 @@
>
>> +
>> +static void update_dvsecs(CXLAccelDev *acceld)
>
> Just to make them easier to search for and to avoid clashes, good to prefix
> all functions with cxlacc or something like that.
>
>> +{
>
> /...
> > >> +static Property cxl_accel_props[] = {
> > >> +    DEFINE_PROP_LINK("volatile-memdev", CXLAccelDev, hostvmem,
> > >> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> >
> > Does backing a type 2 device with a memdev provide any advantages?
> > I'd have thought a device-specific memory allocation would make more
> > sense, like we'd do for a memory BAR on a PCI device. That might
> > complicate the cxl-host handling though, so perhaps this is a good
> > way to go for now.

I was thinking the same. Since my current goal is an emulated device that people can use to test the CXL T2 core in the kernel, while keeping things as minimal as possible in v1, this was the simplest option I could offer. I am open to suggestions. :)

>
> >> +    DEFINE_PROP_END_OF_LIST(),
>
> When you get time, rebase as these have gone away recently.
> I aim to get a fresher staging tree out shortly.
>

Will do.

> >> +};
> >> +
> >> +static void cxl_accel_class_init(ObjectClass *oc, void *data)
> >> +{
> >> +    DeviceClass *dc = DEVICE_CLASS(oc);
> >> +    PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
> >> +
> >> +    pc->realize = cxl_accel_realize;
> >> +    pc->exit = cxl_accel_exit;
> >> +
> >> +    pc->class_id = PCI_CLASS_CXL_QEMU_ACCEL;
> >> +    pc->vendor_id = PCI_VENDOR_ID_INTEL;
> >> +    pc->device_id = 0xd94;
>

The IDs are mostly from Ira's original T2 emulated device patches. I will take a look to see if there is a better option for this.

> If you are posting these I hope you have those IDs reserved
> (which seems unlikely ;)
> We need to be absolutely sure we never hit an existing ID, which generally
> means you need to find whoever controls those allocations in your company
> and get them to give you an ID for this.
> >> + pc->revision = 1; >> + >> + set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); >> + dc->desc = "CXL Accelerator Device (Type 2)"; >> + device_class_set_legacy_reset(dc, cxl_accel_reset); >> + device_class_set_props(dc, cxl_accel_props); >> +} > >> void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num); >> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h >> index f1a53fea8d..08bc469316 100644 >> --- a/include/hw/pci/pci_ids.h >> +++ b/include/hw/pci/pci_ids.h >> @@ -55,6 +55,7 @@ >> #define PCI_CLASS_MEMORY_RAM 0x0500 >> #define PCI_CLASS_MEMORY_FLASH 0x0501 >> #define PCI_CLASS_MEMORY_CXL 0x0502 >> +#define PCI_CLASS_CXL_QEMU_ACCEL 0x0503 > > Either this is a real device class (which seems unlikely given the name) > or you need to choose something else. PCI maintains a big list of > class codes and currently 0x0502 is the highest one define in baseclass 05h > (memory controllers) > > https://members.pcisig.com/wg/PCI-SIG/document/20113 > (behind a pcisig login) > >> #define PCI_CLASS_MEMORY_OTHER 0x0580 >> >> #define PCI_BASE_CLASS_BRIDGE 0x06 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation 2025-01-31 10:45 ` Zhi Wang @ 2025-01-31 11:52 ` Jonathan Cameron via 0 siblings, 0 replies; 15+ messages in thread From: Jonathan Cameron via @ 2025-01-31 11:52 UTC (permalink / raw) To: Zhi Wang Cc: qemu-devel@nongnu.org, dan.j.williams@intel.com, dave.jiang@intel.com, ira.weiny@intel.com, fan.ni@samsung.com, alex.williamson@redhat.com, alucerop@amd.com, clg@redhat.com, Andy Currid, Neo Jia, Surath Mitra, Ankit Agrawal, Aniket Agashe, Kirti Wankhede, Tarun Gupta (SW-GPU), zhiwang@kernel.org

> >> +static void cxl_accel_class_init(ObjectClass *oc, void *data)
> >> +{
> >> +    DeviceClass *dc = DEVICE_CLASS(oc);
> >> +    PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
> >> +
> >> +    pc->realize = cxl_accel_realize;
> >> +    pc->exit = cxl_accel_exit;
> >> +
> >> +    pc->class_id = PCI_CLASS_CXL_QEMU_ACCEL;
> >> +    pc->vendor_id = PCI_VENDOR_ID_INTEL;
> >> +    pc->device_id = 0xd94;
> >
>
> The IDs are mostly from Ira's original T2 emulated device patches.
> I will take a look to see if there is a better option for this.

I pinged Ira and you on the CXL discord. It may be fine to use this and save you figuring out who holds the magic list at NVidia and persuading them to let you have one ;)

>
> > If you are posting these I hope you have those IDs reserved
> > (which seems unlikely ;)
> > We need to be absolutely sure we never hit an existing ID which generally
> > means you need to find whoever controls those allocations in your company
> > and get them to give you an ID for this.
> > > >> + pc->revision = 1; > >> + > >> + set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); > >> + dc->desc = "CXL Accelerator Device (Type 2)"; > >> + device_class_set_legacy_reset(dc, cxl_accel_reset); > >> + device_class_set_props(dc, cxl_accel_props); > >> +} > > > >> void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num); > >> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h > >> index f1a53fea8d..08bc469316 100644 > >> --- a/include/hw/pci/pci_ids.h > >> +++ b/include/hw/pci/pci_ids.h > >> @@ -55,6 +55,7 @@ > >> #define PCI_CLASS_MEMORY_RAM 0x0500 > >> #define PCI_CLASS_MEMORY_FLASH 0x0501 > >> #define PCI_CLASS_MEMORY_CXL 0x0502 > >> +#define PCI_CLASS_CXL_QEMU_ACCEL 0x0503 > > > > Either this is a real device class (which seems unlikely given the name) > > or you need to choose something else. PCI maintains a big list of > > class codes and currently 0x0502 is the highest one define in baseclass 05h > > (memory controllers) > > > > https://members.pcisig.com/wg/PCI-SIG/document/20113 > > (behind a pcisig login) > > > >> #define PCI_CLASS_MEMORY_OTHER 0x0580 > >> > >> #define PCI_BASE_CLASS_BRIDGE 0x06 > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
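A side note for readers decoding the DVSEC constants used in update_dvsecs() earlier in this thread: the `(2 << 5) | (2 << 2) | 0x3` value packs several fields of the device DVSEC "Range 1 Size Low" register. A hedged Python sketch of that layout follows; the field offsets match what QEMU's type-3 code uses for this register, but verify them against your CXL spec revision.

```python
def decode_range_size_lo(val):
    """Split the CXL device DVSEC 'Range 1 Size Low' register into fields.

    Assumed layout: bit 0 Memory_Info_Valid, bit 1 Memory_Active,
    bits 4:2 Media_Type, bits 7:5 Memory_Class; bits 31:28 carry
    size[31:28], since ranges are 256MB-aligned.
    """
    return {
        "mem_info_valid": bool(val & 0x1),
        "mem_active": bool(val & 0x2),
        "media_type": (val >> 2) & 0x7,
        "mem_class": (val >> 5) & 0x7,
        "size_low": val & 0xF0000000,
    }
```

Applied to the constant above, this reads back media type 2 and memory class 2 with the valid and active bits set, plus the low size bits OR'd in from the memdev size.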
* Re: [PATCH 0/3] Introduce CXL type-2 device emulation 2024-12-12 13:04 [PATCH 0/3] Introduce CXL type-2 device emulation Zhi Wang ` (2 preceding siblings ...) 2024-12-12 13:04 ` [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation Zhi Wang @ 2024-12-12 16:49 ` Alejandro Lucero Palau 2024-12-12 18:10 ` Zhi Wang 3 siblings, 1 reply; 15+ messages in thread From: Alejandro Lucero Palau @ 2024-12-12 16:49 UTC (permalink / raw) To: Zhi Wang, qemu-devel Cc: dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny, fan.ni, alex.williamson, clg, acurrid, cjia, smitra, ankita, aniketa, kwankhede, targupta, zhiwang

On 12/12/24 13:04, Zhi Wang wrote:
> Hi folks:
>
> Per the discussion with Ira/Jonathan at LPC 2024 and in the CXL
> discord channel, we are trying to introduce a CXL type-2 device emulation
> in QEMU, as there is ongoing work on supporting CXL type-2 devices [1]
> in the Linux kernel and on CXL type-2 device virtualization [2].
>
> It provides a bare minimum base for folks who would like to:
>
> - Contribute to and test the CXL type-2 device support in the Linux kernel
> and CXL type-2 virtualization without having actual HW.
> - Introduce more emulated features to prototype the kernel CXL type-2
> device features and CXL type-2 virtualization.
>
> To test this patchset, please refer to the steps in [3]. Use this patchset
> with the latest QEMU repo as the QEMU host. It achieves the same output
> as in the demo video [4]: the VFIO CXL core and VFIO CXL sample variant
> driver can be attached to the emulated device in the L1 guest and assigned
> to the L2 guest. The sample driver in the L2 guest can attach to the
> passed-through device and create the CXL region.
>
> Tested on the CXL type-2 virtualization RFC patches [3] with an extra
> fix [5].
> > [1] https://lore.kernel.org/linux-cxl/20241209185429.54054-1-alejandro.lucero-palau@amd.com/T/#t > [2] https://www.youtube.com/watch?v=e5OW1pR84Zs > [3] https://lore.kernel.org/kvm/20240920223446.1908673-3-zhiw@nvidia.com/T/ > [4] https://youtu.be/zlk_ecX9bxs?si=pf9CttcGT5KwUgiH > [5] https://lore.kernel.org/linux-cxl/20241212123959.68514-1-zhiw@nvidia.com/T/#u > > Zhi Wang (3): > hw/cxl: factor out cxl_host_addr_to_dpa() > hw/cxl: introduce cxl_component_update_dvsec() > hw/cxl: introduce CXL type-2 device emulation > > MAINTAINERS | 1 + > docs/system/devices/cxl.rst | 11 ++ > hw/cxl/cxl-component-utils.c | 103 ++++++++++- > hw/cxl/cxl-host.c | 19 +- > hw/mem/Kconfig | 5 + > hw/mem/cxl_accel.c | 319 +++++++++++++++++++++++++++++++++ > hw/mem/cxl_type3.c | 61 +------ > hw/mem/meson.build | 1 + > include/hw/cxl/cxl_component.h | 7 + > include/hw/cxl/cxl_device.h | 25 +++ > include/hw/pci/pci_ids.h | 1 + > 11 files changed, 484 insertions(+), 69 deletions(-) > create mode 100644 hw/mem/cxl_accel.c > Hi Zhi, Thank you for this patchset. I have a similar work done for helping in the Type2 support work, but it is all quick-and-dirty changes. My main concern here is with the optional features for Type2: how to create an easy way for configuring Type2 devices using some qemu cxl param. I'm afraid I did not work on that so no suggestions at all! Thank you ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Introduce CXL type-2 device emulation 2024-12-12 16:49 ` [PATCH 0/3] Introduce " Alejandro Lucero Palau @ 2024-12-12 18:10 ` Zhi Wang 2025-01-21 15:34 ` Jonathan Cameron via 0 siblings, 1 reply; 15+ messages in thread From: Zhi Wang @ 2024-12-12 18:10 UTC (permalink / raw) To: Alejandro Lucero Palau, qemu-devel@nongnu.org Cc: dan.j.williams@intel.com, dave.jiang@intel.com, jonathan.cameron@huawei.com, ira.weiny@intel.com, fan.ni@samsung.com, alex.williamson@redhat.com, clg@redhat.com, Andy Currid, Neo Jia, Surath Mitra, Ankit Agrawal, Aniket Agashe, Kirti Wankhede, Tarun Gupta (SW-GPU), zhiwang@kernel.org

On 12/12/2024 18.49, Alejandro Lucero Palau wrote:
>
> On 12/12/24 13:04, Zhi Wang wrote:
>> Hi folks:
>>
>> Per the discussion with Ira/Jonathan at LPC 2024 and in the CXL
>> discord channel, we are trying to introduce a CXL type-2 device emulation
>> in QEMU, as there is ongoing work on supporting CXL type-2 devices [1]
>> in the Linux kernel and on CXL type-2 device virtualization [2].
>>
>> It provides a bare minimum base for folks who would like to:
>>
>> - Contribute to and test the CXL type-2 device support in the Linux kernel
>> and CXL type-2 virtualization without having actual HW.
>> - Introduce more emulated features to prototype the kernel CXL type-2
>> device features and CXL type-2 virtualization.
>>
>> To test this patchset, please refer to the steps in [3]. Use this patchset
>> with the latest QEMU repo as the QEMU host. It achieves the same output
>> as in the demo video [4]: the VFIO CXL core and VFIO CXL sample variant
>> driver can be attached to the emulated device in the L1 guest and assigned
>> to the L2 guest. The sample driver in the L2 guest can attach to the
>> passed-through device and create the CXL region.
>>
>> Tested on the CXL type-2 virtualization RFC patches [3] with an extra
>> fix [5].
>>
>> [1] https://lore.kernel.org/linux-cxl/20241209185429.54054-1-alejandro.lucero-palau@amd.com/T/#t
>> [2] https://www.youtube.com/watch?v=e5OW1pR84Zs
>> [3] https://lore.kernel.org/kvm/20240920223446.1908673-3-zhiw@nvidia.com/T/
>> [4] https://youtu.be/zlk_ecX9bxs?si=pf9CttcGT5KwUgiH
>> [5] https://lore.kernel.org/linux-cxl/20241212123959.68514-1-zhiw@nvidia.com/T/#u
>>
>> Zhi Wang (3):
>>   hw/cxl: factor out cxl_host_addr_to_dpa()
>>   hw/cxl: introduce cxl_component_update_dvsec()
>>   hw/cxl: introduce CXL type-2 device emulation
>>
>>  MAINTAINERS                    |   1 +
>>  docs/system/devices/cxl.rst    |  11 ++
>>  hw/cxl/cxl-component-utils.c   | 103 ++++++++++-
>>  hw/cxl/cxl-host.c              |  19 +-
>>  hw/mem/Kconfig                 |   5 +
>>  hw/mem/cxl_accel.c             | 319 +++++++++++++++++++++++++++++++++
>>  hw/mem/cxl_type3.c             |  61 +------
>>  hw/mem/meson.build             |   1 +
>>  include/hw/cxl/cxl_component.h |   7 +
>>  include/hw/cxl/cxl_device.h    |  25 +++
>>  include/hw/pci/pci_ids.h       |   1 +
>>  11 files changed, 484 insertions(+), 69 deletions(-)
>>  create mode 100644 hw/mem/cxl_accel.c
>>
>
> Hi Zhi,
>
> Thank you for this patchset.
>
> I have similar work done to help with the Type2 support work, but
> it is all quick-and-dirty changes.
>
> My main concern here is with the optional features for Type2: how to
> create an easy way of configuring Type2 devices using some qemu cxl
> param. I'm afraid I did not work on that, so no suggestions at all!
>

Hi Alejandro:

No worries. The work is to provide a minimum base for CXL folks and CXL type-2 folks to start with, e.g. by introducing more emulated features. As the type-3 emulation has become quite complicated, I was thinking a clean start might help. For re-factoring, I was mostly thinking of a step-by-step approach: e.g. when both device emulations reach a point where they share common routines, we re-factor them or draw a glue layer.
Also, the patchset is good enough for people to test our work. If folks are OK with this minimum emulation, I think the next meaningful step would be aligning on which features we want to plug into this, so that we can share the effort. The items on my list are:

- Locked HDM decoder
- CDAT and DOE

I remember you were talking about the configuration params; I think they could be very helpful for prototyping different features in the kernel as well. Feel free to reach out for discussions.

Z.

>
> Thank you
>

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Introduce CXL type-2 device emulation 2024-12-12 18:10 ` Zhi Wang @ 2025-01-21 15:34 ` Jonathan Cameron via 2025-01-31 10:37 ` Zhi Wang 0 siblings, 1 reply; 15+ messages in thread From: Jonathan Cameron via @ 2025-01-21 15:34 UTC (permalink / raw) To: Zhi Wang Cc: Alejandro Lucero Palau, qemu-devel@nongnu.org, dan.j.williams@intel.com, dave.jiang@intel.com, ira.weiny@intel.com, fan.ni@samsung.com, alex.williamson@redhat.com, clg@redhat.com, Andy Currid, Neo Jia, Surath Mitra, Ankit Agrawal, Aniket Agashe, Kirti Wankhede, Tarun Gupta (SW-GPU), zhiwang@kernel.org

On Thu, 12 Dec 2024 18:10:10 +0000 Zhi Wang <zhiw@nvidia.com> wrote:

> On 12/12/2024 18.49, Alejandro Lucero Palau wrote:
> >
> > On 12/12/24 13:04, Zhi Wang wrote:
> >> Hi folks:
> >>
> >> Per the discussion with Ira/Jonathan at LPC 2024 and in the CXL
> >> discord channel, we are trying to introduce a CXL type-2 device emulation
> >> in QEMU, as there is ongoing work on supporting CXL type-2 devices [1]
> >> in the Linux kernel and on CXL type-2 device virtualization [2].
> >>
> >> It provides a bare minimum base for folks who would like to:
> >>
> >> - Contribute to and test the CXL type-2 device support in the Linux kernel
> >> and CXL type-2 virtualization without having actual HW.
> >> - Introduce more emulated features to prototype the kernel CXL type-2
> >> device features and CXL type-2 virtualization.
> >>
> >> To test this patchset, please refer to the steps in [3]. Use this patchset
> >> with the latest QEMU repo as the QEMU host. It achieves the same output
> >> as in the demo video [4]: the VFIO CXL core and VFIO CXL sample variant
> >> driver can be attached to the emulated device in the L1 guest and assigned
> >> to the L2 guest. The sample driver in the L2 guest can attach to the
> >> passed-through device and create the CXL region.
> >>
> >> Tested on the CXL type-2 virtualization RFC patches [3] with an extra
> >> fix [5].
> >> > >> [1] https://nam11.safelinks.protection.outlook.com/? > >> url=https%3A%2F%2Flore.kernel.org%2Flinux- > >> cxl%2F20241209185429.54054-1-alejandro.lucero- > >> palau%40amd.com%2FT%2F%23t&data=05%7C02%7Czhiw%40nvidia.com%7C3a61139bf3554f4f38f408dd1accf1b9%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638696189761390919%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=6WziKnwMlZJQ4yxT2jLn7W1So0OfqYss78fOosuLiwA%3D&reserved=0 > >> [2] https://nam11.safelinks.protection.outlook.com/? > >> url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3De5OW1pR84Zs&data=05%7C02%7Czhiw%40nvidia.com%7C3a61139bf3554f4f38f408dd1accf1b9%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638696189761413039%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=hTF%2F1I%2B4fYPQeCz7NhM0uvWd%2FrWfIzaKdcteD5%2BrcZ0%3D&reserved=0 > >> [3] https://nam11.safelinks.protection.outlook.com/? > >> url=https%3A%2F%2Flore.kernel.org%2Fkvm%2F20240920223446.1908673-3- > >> zhiw%40nvidia.com%2FT%2F&data=05%7C02%7Czhiw%40nvidia.com%7C3a61139bf3554f4f38f408dd1accf1b9%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638696189761425646%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=Wq3mr0mXZCbG3cXRKlibq%2BksTuwL8RGqiUS9jBFDfDY%3D&reserved=0 > >> [4] https://nam11.safelinks.protection.outlook.com/? > >> url=https%3A%2F%2Fyoutu.be%2Fzlk_ecX9bxs%3Fsi%3Dpf9CttcGT5KwUgiH&data=05%7C02%7Czhiw%40nvidia.com%7C3a61139bf3554f4f38f408dd1accf1b9%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638696189761437780%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=SReTnBUC1bIhBwC%2BvASCXX%2F0ltIYcfWAkHXMmi%2FTRRg%3D&reserved=0 > >> [5] https://nam11.safelinks.protection.outlook.com/? 
> >> url=https%3A%2F%2Flore.kernel.org%2Flinux- > >> cxl%2F20241212123959.68514-1- > >> zhiw%40nvidia.com%2FT%2F%23u&data=05%7C02%7Czhiw%40nvidia.com%7C3a61139bf3554f4f38f408dd1accf1b9%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638696189761449589%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=pmZ8JNctUlcLFwQLivNMkHj7fMt2PR24e%2BuHY%2Bk7bNA%3D&reserved=0 > >> > >> Zhi Wang (3): > >> hw/cxl: factor out cxl_host_addr_to_dpa() > >> hw/cxl: introduce cxl_component_update_dvsec() > >> hw/cxl: introduce CXL type-2 device emulation > >> > >> MAINTAINERS | 1 + > >> docs/system/devices/cxl.rst | 11 ++ > >> hw/cxl/cxl-component-utils.c | 103 ++++++++++- > >> hw/cxl/cxl-host.c | 19 +- > >> hw/mem/Kconfig | 5 + > >> hw/mem/cxl_accel.c | 319 +++++++++++++++++++++++++++++++++ > >> hw/mem/cxl_type3.c | 61 +------ > >> hw/mem/meson.build | 1 + > >> include/hw/cxl/cxl_component.h | 7 + > >> include/hw/cxl/cxl_device.h | 25 +++ > >> include/hw/pci/pci_ids.h | 1 + > >> 11 files changed, 484 insertions(+), 69 deletions(-) > >> create mode 100644 hw/mem/cxl_accel.c > >> > > > > Hi Zhi, > > > > > > Thank you for this patchset. > > > > > > I have a similar work done for helping in the Type2 support work, but > > it is all quick-and-dirty changes. > > > > > > My main concern here is with the optional features for Type2: how to > > create an easy way for configuring Type2 devices using some qemu cxl > > param. I'm afraid I did not work on that so no suggestions at all! > > > > Hi Alejandro: > > No worries. The work is to provide a minimum base for CXL folks and CXL > type-2 folks to start with, e.g. introducing more emulated features. As > the type-3 emulation has been quite complicated and I was thinking maybe > having a clean start would help. For re-factoring, I was mostly thinking > of a step by step style: E.g. 
when both device emulations reach
> a point of having common routines, we re-factor them or
> draw a glue layer.
>
> Also, the patchset is good enough for people to test our work.
>
> If folks are OK with this minimal emulation, I think the next
> meaningful step for us would be aligning on a plan for which features
> we want to plug into this, so that we can share the effort.
>
> The items on my list are:
>
> - Locked HDM decoder
> - CDAT and DOE
>
> I remember you mentioned the configuration params; I think they can be
> very helpful for prototyping different features in the kernel as well.
> Feel free to reach out for discussions.
>

Rather than try to support every combination under the sun, I'd suggest
a couple of representative choices. Anyone developing the kernel can
come and tweak if they need other combinations of features.

Typical test cases: everything on, everything off, and a mix or
two of features on.

Trying to make something really configurable via parameters will end
up with nonsense combinations and will just reveal bugs in the qemu
emulation rather than exercise what we actually want to test.

If you want to go really general though, feel free to pitch it and we'll
see how bad it is.

Jonathan

> Z.
>
> >
> > Thank you
> >
>

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Introduce CXL type-2 device emulation 2025-01-21 15:34 ` Jonathan Cameron via @ 2025-01-31 10:37 ` Zhi Wang 0 siblings, 0 replies; 15+ messages in thread From: Zhi Wang @ 2025-01-31 10:37 UTC (permalink / raw) To: Jonathan Cameron Cc: Alejandro Lucero Palau, qemu-devel@nongnu.org, dan.j.williams@intel.com, dave.jiang@intel.com, ira.weiny@intel.com, fan.ni@samsung.com, alex.williamson@redhat.com, clg@redhat.com, Andy Currid, Neo Jia, Surath Mitra, Ankit Agrawal, Aniket Agashe, Kirti Wankhede, Tarun Gupta (SW-GPU), zhiwang@kernel.org On 21/01/2025 17.34, Jonathan Cameron wrote: > On Thu, 12 Dec 2024 18:10:10 +0000 > Zhi Wang <zhiw@nvidia.com> wrote: > >> On 12/12/2024 18.49, Alejandro Lucero Palau wrote: >>> >>> On 12/12/24 13:04, Zhi Wang wrote: >>>> Hi folks: >>>> >>>> Per the discussion with Ira/Jonathan in the LPC 2024 and in the CXL >>>> discord channel, we are trying to introduce a CXL type-2 device emulation >>>> in QEMU, as there are work currently on supporting CXL type-2 device [1] >>>> in Linux kernel and CXL type-2 device virtualization [2]. >>>> >>>> It provides a bare minimum base for folks who would like to: >>>> >>>> - Contribute and test the CXL type-2 device support in the linux kernel >>>> and CXL type-2 virtualization without having an actual HW. >>>> - Introduce more emulated features to prototype the kernel CXL type-2 >>>> device features and CXL type-2 virtualization. >>>> >>>> To test this patchset, please refer to steps in [3]. Use this patcheset >>>> with the latest QEMU repo to be the QEMU host. It achieves the same >>>> output >>>> as in the demo video [4]: The VFIO CXL core and VFIO CXL sample variant >>>> driver can be attached to the emulated device in the L1 guest and >>>> assigned >>>> to the L2 guest. The sample driver in the L2 guest can attach to the >>>> pass-thrued device and create the CXL region. >>>> >>>> Tested on the CXL type-2 virtualization RFC patches [3] with an extra >>>> fix [5]. 
>>>>
>>>> [1] https://lore.kernel.org/linux-cxl/20241209185429.54054-1-alejandro.lucero-palau@amd.com/T/#t
>>>> [2] https://www.youtube.com/watch?v=e5OW1pR84Zs
>>>> [3] https://lore.kernel.org/kvm/20240920223446.1908673-3-zhiw@nvidia.com/T/
>>>> [4] https://youtu.be/zlk_ecX9bxs?si=pf9CttcGT5KwUgiH
>>>> [5] https://lore.kernel.org/linux-cxl/20241212123959.68514-1-zhiw@nvidia.com/T/#u
>>>>
>>>> Zhi Wang (3):
>>>>    hw/cxl: factor out cxl_host_addr_to_dpa()
>>>>    hw/cxl: introduce cxl_component_update_dvsec()
>>>>    hw/cxl: introduce CXL type-2 device emulation
>>>>
>>>>   MAINTAINERS                    |   1 +
>>>>   docs/system/devices/cxl.rst    |  11 ++
>>>>   hw/cxl/cxl-component-utils.c   | 103 ++++++++++-
>>>>   hw/cxl/cxl-host.c              |  19 +-
>>>>   hw/mem/Kconfig                 |   5 +
>>>>   hw/mem/cxl_accel.c             | 319 +++++++++++++++++++++++++++++++++
>>>>   hw/mem/cxl_type3.c             |  61 +------
>>>>   hw/mem/meson.build             |   1 +
>>>>   include/hw/cxl/cxl_component.h |   7 +
>>>>   include/hw/cxl/cxl_device.h    |  25 +++
>>>>   include/hw/pci/pci_ids.h       |   1 +
>>>>   11 files changed, 484 insertions(+), 69 deletions(-)
>>>>   create mode 100644 hw/mem/cxl_accel.c
>>>>
>>>
>>> Hi Zhi,
>>>
>>> Thank you for this patchset.
>>>
>>> I have similar work done to help with the Type2 support work, but
>>> it is all quick-and-dirty changes.
>>>
>>> My main concern here is with the optional features for Type2: how to
>>> create an easy way of configuring Type2 devices using some qemu cxl
>>> param. I'm afraid I did not work on that, so no suggestions at all!
>>>
>>
>> Hi Alejandro:
>>
>> No worries. The work provides a minimum base for CXL folks and CXL
>> type-2 folks to start with, e.g. for introducing more emulated features.
>> As the type-3 emulation has become quite complicated, I was thinking a
>> clean start would help. For re-factoring, I was mostly thinking
>> of a step-by-step style: e.g.
when both device emulations reach
>> a point of having common routines, we re-factor them or
>> draw a glue layer.
>>
>> Also, the patchset is good enough for people to test our work.
>>
>> If folks are OK with this minimal emulation, I think the next
>> meaningful step for us would be aligning on a plan for which features
>> we want to plug into this, so that we can share the effort.
>>
>> The items on my list are:
>>
>> - Locked HDM decoder
>> - CDAT and DOE
>>
>> I remember you mentioned the configuration params; I think they can be
>> very helpful for prototyping different features in the kernel as well.
>> Feel free to reach out for discussions.
>>
> Rather than try to support every combination under the sun, I'd suggest
> a couple of representative choices. Anyone developing the kernel can
> come and tweak if they need other combinations of features.
>
> Typical test cases: everything on, everything off, and a mix or
> two of features on.
>
> Trying to make something really configurable via parameters will end
> up with nonsense combinations and will just reveal bugs in the qemu
> emulation rather than exercise what we actually want to test.
>
> If you want to go really general though, feel free to pitch it and we'll
> see how bad it is.
>

Agree. I am leaning towards making the switches use-case/device-feature
oriented; that would be better aligned with the purpose of testing and
the requirements in reality. If there really is some special case that
needs a quirk or something, we can justify it case by case.

Z.

> Jonathan
>
>> Z.
>>
>>>
>>> Thank you
>>>
>>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-01-31 11:53 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-12 13:04 [PATCH 0/3] Introduce CXL type-2 device emulation Zhi Wang 2024-12-12 13:04 ` [PATCH 1/3] hw/cxl: factor out cxl_host_addr_to_dpa() Zhi Wang 2025-01-21 15:52 ` Jonathan Cameron via 2024-12-12 13:04 ` [PATCH 2/3] hw/cxl: introduce cxl_component_update_dvsec() Zhi Wang 2025-01-21 15:57 ` Jonathan Cameron via 2024-12-12 13:04 ` [PATCH 3/3] hw/cxl: introduce CXL type-2 device emulation Zhi Wang 2024-12-12 17:02 ` Alejandro Lucero Palau 2024-12-12 18:33 ` Zhi Wang 2025-01-21 16:16 ` Jonathan Cameron via 2025-01-31 10:45 ` Zhi Wang 2025-01-31 11:52 ` Jonathan Cameron via 2024-12-12 16:49 ` [PATCH 0/3] Introduce " Alejandro Lucero Palau 2024-12-12 18:10 ` Zhi Wang 2025-01-21 15:34 ` Jonathan Cameron via 2025-01-31 10:37 ` Zhi Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).