From: ira.weiny@intel.com
To: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Navneet Singh <navneet.singh@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org,
Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>
Subject: [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD)
Date: Sun, 24 Mar 2024 16:18:03 -0700
Message-ID: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
A git tree of this series can be found here:
https://github.com/weiny2/linux-kernel/tree/dcd-2024-03-24
Pre-requisite:
==============
The locking introduced by Vishal for DAX regions:
https://lore.kernel.org/all/20240124-vv-dax_abi-v7-1-20d16cb8d23d@intel.com/T/#u
Background
==========
A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
device that allows the memory capacity to change dynamically, without
the need for resetting the device, reconfiguring HDM decoders, or
reconfiguring software DAX regions.
One of the biggest use cases for Dynamic Capacity is to allow hosts to
share memory dynamically within a data center without increasing the
per-host attached memory.
The general flow for the addition or removal of memory is to have an
orchestrator coordinate the use of the memory. Generally there are five
actors in such a system: the Orchestrator, the Fabric Manager (FM), the
Device the host sees, the Host Kernel, and a Host User.
Typical workflows are shown below.
  Orchestrator       FM        Device     Host Kernel     Host User
        |             |           |            |              |
        |-------------- Create region ----------------------->|
        |             |           |            |              |
        |             |           |            |<-- Create ---|
        |             |           |            |    Region    |
        |<------------- Signal done --------------------------|
        |             |           |            |              |
        |-- Add ----->|-- Add --->|--- Add --->|              |
        |  Capacity   |  Extent   |   Extent   |              |
        |             |           |            |              |
        |             |<- Accept -|<- Accept --|              |
        |             |   Extent  |   Extent   |              |
        |             |           |            |<- Create --->|
        |             |           |            |   DAX dev    |-- Use memory
        |             |           |            |              |   |
        |             |           |            |              |   |
        |             |           |            |<- Release ---| <-+
        |             |           |            |   DAX dev    |
        |             |           |            |              |
        |<------------- Signal done --------------------------|
        |             |           |            |              |
        |-- Remove -->|- Release->|- Release ->|              |
        |  Capacity   |  Extent   |   Extent   |              |
        |             |           |            |              |
        |             |<- Release-|<- Release -|              |
        |             |   Extent  |   Extent   |              |
        |             |           |            |              |
        |-- Add ----->|-- Add --->|--- Add --->|              |
        |  Capacity   |  Extent   |   Extent   |              |
        |             |           |            |              |
        |             |<- Accept -|<- Accept --|              |
        |             |   Extent  |   Extent   |              |
        |             |           |            |<- Create ----|
        |             |           |            |   DAX dev    |-- Use memory
        |             |           |            |              |   |
        |             |           |            |<- Release ---| <-+
        |             |           |            |   DAX dev    |
        |<------------- Signal done --------------------------|
        |             |           |            |              |
        |-- Remove -->|- Release->|- Release ->|              |
        |  Capacity   |  Extent   |   Extent   |              |
        |             |           |            |              |
        |             |<- Release-|<- Release -|              |
        |             |   Extent  |   Extent   |              |
        |             |           |            |              |
        |-- Add ----->|-- Add --->|--- Add --->|              |
        |  Capacity   |  Extent   |   Extent   |              |
        |             |           |            |<- Create ----|
        |             |           |            |   DAX dev    |-- Use memory
        |             |           |            |              |   |
        |-- Remove -->|- Release->|- Release ->|              |   |
        |  Capacity   |  Extent   |   Extent   |              |   |
        |             |           |            |              |   |
        |             |           |    (Release Ignored)      |   |
        |             |           |            |              |   |
        |             |           |            |<- Release ---| <-+
        |             |           |            |   DAX dev    |
        |<------------- Signal done --------------------------|
        |             |           |            |              |
        |             |- Release->|- Release ->|              |
        |             |  Extent   |   Extent   |              |
        |             |           |            |              |
        |             |<- Release-|<- Release -|              |
        |             |   Extent  |   Extent   |              |
        |             |           |            |<- Destroy ---|
        |             |           |            |    Region    |
        |             |           |            |              |
Previous RFCs of this series[0] drew significant architectural
comments. Previous versions allowed memory capacity to be accepted by
the host regardless of whether a software region had been mapped.
With this new patch set, region creation and DAX device creation must
be synchronized with the Orchestrator adding and removing capacity.
The host kernel will reject an add extent event if the region has not
yet been created. It will also ignore a release if a DAX device has
been created and references the extent.
Neither of these synchronizations is anticipated to be an issue with
real applications.
To allow capacity to be added and removed, a new concept of a sparse
DAX region is introduced. A sparse DAX region may have zero or more
bytes of available space. The total space depends on the number and
size of the extents that have been added.
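As a rough model of the sparse-region idea, the sketch below treats a
region's usable size as the sum of the lengths of the extents currently
added to it. The struct sketch_extent type and sparse_region_size()
helper are hypothetical and are not the driver's data structures; the
ranges assume an inclusive end, as the kernel's struct range uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: [start, end] with an inclusive end, like struct range. */
struct sketch_extent {
	uint64_t start;
	uint64_t end;
};

static uint64_t sketch_extent_len(const struct sketch_extent *e)
{
	return e->end - e->start + 1;
}

/*
 * A sparse region starts out with zero usable bytes; its size grows and
 * shrinks only as extents are added and released.
 */
static uint64_t sparse_region_size(const struct sketch_extent *extents,
				   size_t nr_extents)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr_extents; i++) {
		assert(extents[i].end >= extents[i].start);
		total += sketch_extent_len(&extents[i]);
	}
	return total;
}
```

With no extents the region reports 0 bytes; adding a 1 MiB and a 2 MiB
extent makes 3 MiB available, even though the two extents are not
contiguous.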
Initially, it is anticipated that users of the memory will carefully
coordinate the surfacing of additional capacity with the creation of
DAX devices that use that capacity. Therefore, the allocation of memory
to DAX devices does not allow for a specific association between a DAX
device and an extent. This keeps allocations very similar to existing
DAX region behavior.
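To illustrate what "no specific association between DAX device and
extent" means, the following hypothetical sketch satisfies a DAX device
request from whichever extents have free space, in order, possibly
spanning several extents. The first-fit policy, struct alloc_extent,
and sketch_alloc_dax_dev() are assumptions made for illustration; this
is not the dax bus allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent with a bump allocator: "used" bytes taken from start. */
struct alloc_extent {
	uint64_t start;
	uint64_t len;
	uint64_t used;
};

/* One piece of a DAX device: a sub-range carved out of some extent. */
struct dev_mapping {
	uint64_t start;
	uint64_t len;
};

/*
 * Satisfy a DAX device request of @size bytes from whichever extents have
 * free space, in order (first fit). The caller cannot pick an extent.
 * @maps must have room for @nr entries, since each extent contributes at
 * most one mapping. Returns the number of mappings written, or 0 if the
 * region cannot satisfy the request (nothing is consumed in that case).
 */
static size_t sketch_alloc_dax_dev(struct alloc_extent *extents, size_t nr,
				   uint64_t size, struct dev_mapping *maps)
{
	uint64_t avail = 0;
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		avail += extents[i].len - extents[i].used;
	if (size == 0 || size > avail)
		return 0;

	for (size_t i = 0; i < nr && size > 0; i++) {
		struct alloc_extent *e = &extents[i];
		uint64_t free_bytes = e->len - e->used;
		uint64_t take = free_bytes < size ? free_bytes : size;

		if (take == 0)
			continue;
		maps[n].start = e->start + e->used;
		maps[n].len = take;
		n++;
		e->used += take;
		size -= take;
	}
	assert(size == 0 && n <= nr);
	return n;
}
```

A 0x1800-byte request against a 0x1000-byte extent and a 0x2000-byte
extent yields two mappings, so one DAX device can span extents without
the user naming either of them.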
Great care was taken to simplify extent tracking. Specifically, in
comparison to previous versions of the patch set, all extent tracking
xarrays have been eliminated from the code. In addition, most of the
extra software objects and associated reference counts have been
eliminated.
In this version, extents are tracked purely as sub-devices of the
region. This ensures that region destruction properly cleans up all
extent allocations. Device-managed callbacks are wired up to ensure
that any additional data required for DAX device references is handled
correctly.
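A user-space analogue of "extents as sub-devices of the region" is a
parent that owns a list of children and frees all of them when it is
torn down. In the kernel this ownership is expressed through the driver
core rather than a hand-rolled list; the types and functions below are
hypothetical and only illustrate why no extent can outlive its region.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: each extent is a child object owned by its region. */
struct child_extent {
	uint64_t start;
	uint64_t len;
	struct child_extent *next;
};

struct sketch_region {
	struct child_extent *extents; /* children, newest first */
	unsigned int nr_extents;
};

/* Add an extent as a child of @r; returns 0 on success, -1 on failure. */
static int region_add_extent(struct sketch_region *r, uint64_t start,
			     uint64_t len)
{
	struct child_extent *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->start = start;
	e->len = len;
	e->next = r->extents;
	r->extents = e;
	r->nr_extents++;
	return 0;
}

/*
 * Tearing down the region walks and frees every child, so no extent can
 * outlive the region that owns it. Returns the number of extents released.
 */
static unsigned int region_destroy(struct sketch_region *r)
{
	unsigned int released = 0;

	while (r->extents) {
		struct child_extent *e = r->extents;

		r->extents = e->next;
		free(e);
		released++;
	}
	assert(released == r->nr_extents);
	r->nr_extents = 0;
	return released;
}
```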
Due to these major changes I'm setting this new series to V1.
In summary, the major functionality of this series includes:
- Getting the dynamic capacity (DC) configuration information from CXL
  devices
- Configuring the DC regions reported by hardware
- Enhancing the CXL and DAX regions for dynamic capacity support
    a. Maintain a logical separation between hardware extents and
       software-managed region extents. This provides an
       abstraction between the layers and should allow for
       interleaving in the future
- Getting hardware extent lists for endpoint decoders upon
  region creation
- Adjusting the memory available in extents/regions on the following
  events:
    a. Add capacity events
    b. Release capacity events
- Host response for add capacity
    a. Do not accept the extent if the region does not exist
       or an error occurs realizing the extent
    b. If the region does exist, realize a DAX region extent
       with a 1:1 mapping (no interleave yet)
- Host response for remove capacity
    a. If no DAX devices reference the extent, release the extent
    b. If a reference does exist, ignore the request
       (the FM must issue the release again)
- Modifying DAX device creation/resize to account for extents within a
  sparse DAX region
- Tracing Dynamic Capacity events for debugging
- Adding cxl-test infrastructure to allow for faster unit testing
  (see the new ndctl branch for the cxl-dcd.sh test[1])
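The add and release rules above reduce to two small decisions. The
sketch below is a hypothetical model of that policy only: the enum and
function names are invented for illustration and do not correspond to
functions in the series, and realize_ok stands in for whether realizing
the extent succeeded.

```c
#include <stdbool.h>

/* Hypothetical outcomes of handling a Dynamic Capacity event on the host. */
enum dc_response {
	DC_REJECT_ADD,     /* do not accept the offered extent */
	DC_ACCEPT_ADD,     /* realize a region extent (1:1, no interleave yet) */
	DC_RELEASE,        /* give the extent back to the device */
	DC_IGNORE_RELEASE, /* extent in use; the FM must issue the release again */
};

/* Add capacity: accept only if the region exists and realization succeeds. */
static enum dc_response handle_add_capacity(bool region_exists,
					    bool realize_ok)
{
	if (!region_exists || !realize_ok)
		return DC_REJECT_ADD;
	return DC_ACCEPT_ADD;
}

/* Release capacity: honor it only if no DAX device references the extent. */
static enum dc_response handle_release_capacity(unsigned int dax_dev_refs)
{
	return dax_dev_refs ? DC_IGNORE_RELEASE : DC_RELEASE;
}
```

Ignoring, rather than failing, a release for an extent in use matches
the "(Release Ignored)" step in the workflow diagram: the host keeps the
memory until the DAX device is released and the FM retries.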
Fan Ni's latest v5 of the QEMU DCD series was used for testing[2].
Remaining work:
1) Integrate the QoS work from Dave Jiang
2) Add interleave support
Possible additional work depending on requirements:
1) Allow mapping to specific extents (perhaps based on
   label/tag)
2) Release extents when DAX devices are released if a release
   was previously seen from the device
3) Accept a new extent which extends (but overlaps) one or more
   existing extents
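Detecting whether a new extent overlaps an existing one comes down to an
interval-intersection test. The sketch assumes inclusive-end ranges like
the kernel's struct range; it is an illustration and is not claimed to
be the exact range_overlaps() helper added by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive-end range, mirroring the layout of the kernel's struct range. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges intersect iff each starts no later than the other ends. */
static bool sketch_range_overlaps(const struct sketch_range *r1,
				  const struct sketch_range *r2)
{
	assert(r1->start <= r1->end && r2->start <= r2->end);
	return r1->start <= r2->end && r1->end >= r2->start;
}
```

Because the end is inclusive, [0x0, 0xfff] and [0x1000, 0x1fff] are
adjacent but do not overlap, while [0x800, 0x17ff] overlaps both.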
[0] RFC v2: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-0-f740c47e7916@intel.com
[1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-03-22
[2] https://lore.kernel.org/all/20240304194331.1586191-1-nifan.cxl@gmail.com/
---
Changes for v1:
- iweiny: Largely new series
- iweiny: Remove review tags due to the series being a major rework
- iweiny: Fix authorship for Navneet patches
- iweiny: Remove extent xarrays
- iweiny: Remove krefs, replace with a single use count protected under dax_rwsem
- iweiny: Mark all sysfs entries for the 6.10 June 2024 kernel
- iweiny: Remove gotos
- iweiny: Fix 0day issues
- Jonathan Cameron: address comments
- Navneet Singh: address comments
- Dan Williams: address comments
- Dave Jiang: address comments
- Fan Ni: address comments
- Jørgen Hansen: address comments
- Link to RFC v2: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-0-f740c47e7916@intel.com
---
Ira Weiny (12):
cxl/core: Simplify cxl_dpa_set_mode()
cxl/events: Factor out event msgnum configuration
cxl/pci: Delay event buffer allocation
cxl/pci: Factor out interrupt policy check
range: Add range_overlaps()
dax/bus: Factor out dev dax resize logic
dax: Document dax dev range tuple
dax/region: Prevent range mapping allocation on sparse regions
dax/region: Support DAX device creation on sparse DAX regions
tools/testing/cxl: Make event logs dynamic
tools/testing/cxl: Add DC Regions to mock mem data
tools/testing/cxl: Add Dynamic Capacity events
Navneet Singh (14):
cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
cxl/core: Separate region mode from decoder mode
cxl/mem: Read dynamic capacity configuration from the device
cxl/region: Add dynamic capacity decoder and region modes
cxl/port: Add Dynamic Capacity mode support to endpoint decoders
cxl/port: Add dynamic capacity size support to endpoint decoders
cxl/mem: Expose device dynamic capacity capabilities
cxl/region: Add Dynamic Capacity CXL region support
cxl/mem: Configure dynamic capacity interrupts
cxl/region: Read existing extents on region creation
cxl/extent: Realize extent devices
dax/region: Create extent resources on DAX region driver load
cxl/mem: Handle DCD add & release capacity events.
cxl/mem: Trace Dynamic capacity Event Record
Documentation/ABI/testing/sysfs-bus-cxl | 60 ++-
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/core.h | 10 +
drivers/cxl/core/extent.c | 145 +++++
drivers/cxl/core/hdm.c | 254 +++++++--
drivers/cxl/core/mbox.c | 591 ++++++++++++++++++++-
drivers/cxl/core/memdev.c | 76 +++
drivers/cxl/core/port.c | 19 +
drivers/cxl/core/region.c | 334 +++++++++++-
drivers/cxl/core/trace.h | 65 +++
drivers/cxl/cxl.h | 127 ++++-
drivers/cxl/cxlmem.h | 114 ++++
drivers/cxl/mem.c | 45 ++
drivers/cxl/pci.c | 122 +++--
drivers/dax/bus.c | 353 +++++++++---
drivers/dax/bus.h | 4 +-
drivers/dax/cxl.c | 127 ++++-
drivers/dax/dax-private.h | 40 +-
drivers/dax/hmem/hmem.c | 2 +-
drivers/dax/pmem.c | 2 +-
fs/btrfs/ordered-data.c | 10 +-
include/linux/cxl-event.h | 31 ++
include/linux/range.h | 7 +
tools/testing/cxl/Kbuild | 1 +
tools/testing/cxl/test/mem.c | 914 ++++++++++++++++++++++++++++----
25 files changed, 3152 insertions(+), 302 deletions(-)
---
base-commit: dff54316795991e88a453a095a9322718a34034a
change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
Best regards,
--
Ira Weiny <ira.weiny@intel.com>