From: fan <nifan.cxl@gmail.com>
To: ira.weiny@intel.com
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Navneet Singh <navneet.singh@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, Chris Mason <clm@fb.com>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>
Subject: Re: [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD)
Date: Mon, 25 Mar 2024 12:24:02 -0700
Message-ID: <ZgHPUggTfSCIx8cI@debian>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
On Sun, Mar 24, 2024 at 04:18:03PM -0700, ira.weiny@intel.com wrote:
> A git tree of this series can be found here:
>
> https://github.com/weiny2/linux-kernel/tree/dcd-2024-03-24
>
> Pre-requisite:
> ==============
>
> The locking introduced by Vishal for DAX regions:
> https://lore.kernel.org/all/20240124-vv-dax_abi-v7-1-20d16cb8d23d@intel.com/T/#u
>
> Background
> ==========
>
> A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> device that allows the memory capacity to change dynamically, without
> the need for resetting the device, reconfiguring HDM decoders, or
> reconfiguring software DAX regions.
>
> One of the biggest use cases for Dynamic Capacity is to allow hosts to
> share memory dynamically within a data center without increasing the
> per-host attached memory.
>
> The general flow for the addition or removal of memory is to have an
> orchestrator coordinate the use of the memory. Generally there are 5
> actors in such a system: the Orchestrator, the Fabric Manager (FM), the
> Device the host sees, the Host Kernel, and a Host User.
>
> Typical workflows are shown below.
>
> Orchestrator      FM        Device    Host Kernel     Host User
>
>     |             |           |            |              |
>     |-------------- Create region ----------------------->|
>     |             |           |            |              |
>     |             |           |            |<-- Create ---|
>     |             |           |            |    Region    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create --->|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |             |           |            |              |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |-- Remove -->|- Release->|- Release ->|              |   |
>     |  Capacity   |  Extent   |   Extent   |              |   |
>     |             |           |            |              |   |
>     |             |           |    (Release Ignored)      |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |             |- Release->|- Release ->|              |
>     |             |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Destroy ---|
>     |             |           |            |    Region    |
>     |             |           |            |              |
>
> Previous RFCs of this series[0] resulted in significant architectural
> comments. Previous versions allowed memory capacity to be accepted by
> the host regardless of the existence of a software region being mapped.
>
> With this new patch set the order of the create region and DAX device
> creation must be synchronized with the Orchestrator adding/removing
> capacity. The host kernel will reject an add extent event if the region
> is not created yet. It will also ignore a release if the DAX device is
> created and referencing an extent.
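
Restating the two rules above as code may help when reading the rest of
the thread. This is only a sketch: the enum and function names are made
up for illustration and are not taken from the series, and the real
driver makes these decisions while handling device events rather than
from plain flags.

```c
#include <stdbool.h>

/* Hypothetical outcomes; illustrative names, not kernel identifiers. */
enum dc_response {
	DC_ACCEPT_EXTENT,	/* add: extent realized in the region */
	DC_REJECT_EXTENT,	/* add: no region yet, or realization failed */
	DC_RELEASE_EXTENT,	/* release: no DAX device references the extent */
	DC_IGNORE_RELEASE,	/* release: still referenced, FM must retry */
};

/* Add capacity is accepted only when a region exists and realization works. */
static enum dc_response dc_on_add(bool region_exists, bool realize_ok)
{
	if (!region_exists || !realize_ok)
		return DC_REJECT_EXTENT;
	return DC_ACCEPT_EXTENT;
}

/* Release capacity is honored only when no DAX device uses the extent. */
static enum dc_response dc_on_release(unsigned int dax_dev_refs)
{
	return dax_dev_refs ? DC_IGNORE_RELEASE : DC_RELEASE_EXTENT;
}
```

In the third cycle of the diagram, the first Remove Capacity arrives
while the DAX device still exists, so it would map to DC_IGNORE_RELEASE
here, and the FM has to issue the release again afterwards.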
>
> Neither of these synchronization requirements is anticipated to be an
> issue with real applications.
>
> In order to allow for capacity to be added and removed a new concept of
> a sparse DAX region is introduced. A sparse DAX region may have 0 or
> more bytes of available space. The total space depends on the number
> and size of the extents which have been added.
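
As a rough illustration of "0 or more bytes", the space in a sparse
region can be thought of as the sum of the accepted extent lengths. The
struct and function below are hypothetical and only assume that each
extent carries an offset and a length; the series itself tracks extents
as sub-devices of the region.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: byte offset within the region plus length. */
struct dc_extent {
	uint64_t offset;
	uint64_t len;
};

/* Total space in a sparse region: sum of extent lengths, 0 when empty. */
static uint64_t sparse_region_size(const struct dc_extent *ext, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += ext[i].len;
	return total;
}
```

A freshly created region with no extents yet would report 0 here, which
is the case the Orchestrator has to wait out before adding capacity.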
>
> Initially it is anticipated that users of the memory will carefully
> coordinate the surfacing of additional capacity with the creation of DAX
> devices which use that capacity. Therefore, the allocation of the
> memory to DAX devices does not allow for specific associations between
> DAX device and extent. This keeps allocations very similar to existing
> DAX region behavior.
>
> Great care was taken to simplify extent tracking. Specifically, in
> comparison to previous versions of the patch set, all extent tracking
> xarrays have been eliminated from the code. In addition, most of the
> extra software objects and associated reference counts have been
> eliminated.
>
> In this version, extents are tracked purely as sub-devices of the
> region. This ensures that the region destruction cleans up all extent
> allocations properly. Device managed callbacks are wired to ensure any
> additional data required for DAX device references are handled
> correctly.
>
> Due to these major changes I'm setting this new series to V1.
>
> In summary the major functionality of this series includes:
>
> - Getting the dynamic capacity (DC) configuration information from cxl
>   devices
>
> - Configuring the DC regions reported by hardware
>
> - Enhancing the CXL and DAX regions for dynamic capacity support
>     a. Maintain a logical separation between hardware extents and
>        software managed region extents. This provides an
>        abstraction between the layers and should allow for
>        interleaving in the future
>
> - Get hardware extent lists for endpoint decoders upon
>   region creation.
>
> - Adjust extent/region memory available on the following events.
>     a. Add capacity events
>     b. Release capacity events
>
> - Host response for add capacity
>     a. Do not accept the extent if the region does not exist
>        or an error occurs realizing the extent
>     b. If the region does exist, realize a DAX region extent
>        with 1:1 mapping (no interleave yet)
>
> - Host response for remove capacity
>     a. If no DAX devices reference the extent, release the extent
>     b. If a reference does exist, ignore the request.
>        (Require FM to issue release again.)
>
> - Modify DAX device creation/resize to account for extents within a
>   sparse DAX region
>
> - Trace Dynamic Capacity events for debugging
>
> - Add cxl-test infrastructure to allow for faster unit testing
>   (See new ndctl branch for cxl-dcd.sh test[1])
>
> Fan Ni's latest v5 of Qemu DCD was used for testing.[2]
>
> Remaining work:
>
> 1) Integrate the QoS work from Dave Jiang
> 2) Interleave support
>
> Possible additional work depending on requirements:
>
> 1) Allow mapping to specific extents (perhaps based on
> label/tag)
> 2) Release extents when DAX devices are released if a release
> was previously seen from the device
> 3) Accept a new extent which extends (but overlaps) an existing
> extent(s)
>
> [0] RFC v2: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-0-f740c47e7916@intel.com
> [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-03-22
> [2] https://lore.kernel.org/all/20240304194331.1586191-1-nifan.cxl@gmail.com/
>
> ---
> Changes for v1:
> - iweiny: Largely new series
> - iweiny: Remove review tags due to the series being a major rework
> - iweiny: Fix authorship for Navneet patches
> - iweiny: Remove extent xarrays
> - iweiny: Remove kreferences, replace with 1 use count protected under dax_rwsem
> - iweiny: Mark all sysfs entries for the 6.10 June 2024 kernel
> - iweiny: Remove gotos
> - iweiny: Fix 0day issues
> - Jonathan Cameron: address comments
> - Navneet Singh: address comments
> - Dan Williams: address comments
> - Dave Jiang: address comments
> - Fan Ni: address comments
> - Jørgen Hansen: address comments
> - Link to RFC v2: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-0-f740c47e7916@intel.com
>
Hi Ira,
I have not had a chance to check the code yet, but I noticed one thing
when testing with my DCD emulation code.
Currently, if we do a partial release, it seems the whole extent is
removed. Is that intentional?
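
To make the question concrete, here is a small sketch of what honoring a
partial release would leave behind, as opposed to dropping the whole
extent. It is not taken from the series or from the emulation code; the
struct and function names are hypothetical, and ranges are assumed to be
inclusive byte ranges.

```c
#include <stdint.h>

/* Hypothetical inclusive byte range [start, end]. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Split 'ext' around a release request 'rel' that lies inside it.
 * Writes the surviving pieces (at most two) to 'out' and returns how
 * many remain; 0 means the whole extent was released. Returns -1 if
 * 'rel' is malformed or not fully contained in 'ext'.
 */
static int partial_release(struct byte_range ext, struct byte_range rel,
			   struct byte_range out[2])
{
	int n = 0;

	if (rel.start > rel.end || rel.start < ext.start || rel.end > ext.end)
		return -1;
	/* Piece below the released range, if any. */
	if (rel.start > ext.start)
		out[n++] = (struct byte_range){ ext.start, rel.start - 1 };
	/* Piece above the released range, if any. */
	if (rel.end < ext.end)
		out[n++] = (struct byte_range){ rel.end + 1, ext.end };
	return n;
}
```

With an extent covering 0x0-0xFFFF and a release of 0x4000-0x7FFF, this
would keep 0x0-0x3FFF and 0x8000-0xFFFF, whereas what I observe looks
like the 0-pieces-left case regardless of the requested range.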
Fan
> ---
> Ira Weiny (12):
> cxl/core: Simplify cxl_dpa_set_mode()
> cxl/events: Factor out event msgnum configuration
> cxl/pci: Delay event buffer allocation
> cxl/pci: Factor out interrupt policy check
> range: Add range_overlaps()
> dax/bus: Factor out dev dax resize logic
> dax: Document dax dev range tuple
> dax/region: Prevent range mapping allocation on sparse regions
> dax/region: Support DAX device creation on sparse DAX regions
> tools/testing/cxl: Make event logs dynamic
> tools/testing/cxl: Add DC Regions to mock mem data
> tools/testing/cxl: Add Dynamic Capacity events
>
> Navneet Singh (14):
> cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
> cxl/core: Separate region mode from decoder mode
> cxl/mem: Read dynamic capacity configuration from the device
> cxl/region: Add dynamic capacity decoder and region modes
> cxl/port: Add Dynamic Capacity mode support to endpoint decoders
> cxl/port: Add dynamic capacity size support to endpoint decoders
> cxl/mem: Expose device dynamic capacity capabilities
> cxl/region: Add Dynamic Capacity CXL region support
> cxl/mem: Configure dynamic capacity interrupts
> cxl/region: Read existing extents on region creation
> cxl/extent: Realize extent devices
> dax/region: Create extent resources on DAX region driver load
> cxl/mem: Handle DCD add & release capacity events.
> cxl/mem: Trace Dynamic capacity Event Record
>
> Documentation/ABI/testing/sysfs-bus-cxl | 60 ++-
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/core.h | 10 +
> drivers/cxl/core/extent.c | 145 +++++
> drivers/cxl/core/hdm.c | 254 +++++++--
> drivers/cxl/core/mbox.c | 591 ++++++++++++++++++++-
> drivers/cxl/core/memdev.c | 76 +++
> drivers/cxl/core/port.c | 19 +
> drivers/cxl/core/region.c | 334 +++++++++++-
> drivers/cxl/core/trace.h | 65 +++
> drivers/cxl/cxl.h | 127 ++++-
> drivers/cxl/cxlmem.h | 114 ++++
> drivers/cxl/mem.c | 45 ++
> drivers/cxl/pci.c | 122 +++--
> drivers/dax/bus.c | 353 +++++++++---
> drivers/dax/bus.h | 4 +-
> drivers/dax/cxl.c | 127 ++++-
> drivers/dax/dax-private.h | 40 +-
> drivers/dax/hmem/hmem.c | 2 +-
> drivers/dax/pmem.c | 2 +-
> fs/btrfs/ordered-data.c | 10 +-
> include/linux/cxl-event.h | 31 ++
> include/linux/range.h | 7 +
> tools/testing/cxl/Kbuild | 1 +
> tools/testing/cxl/test/mem.c | 914 ++++++++++++++++++++++++++++----
> 25 files changed, 3152 insertions(+), 302 deletions(-)
> ---
> base-commit: dff54316795991e88a453a095a9322718a34034a
> change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
>
> Best regards,
> --
> Ira Weiny <ira.weiny@intel.com>
>