* [PATCH] Documentation/driver-api/cxl: device hotplug section
@ 2025-12-18 14:46 Gregory Price
2025-12-18 15:26 ` Jonathan Cameron
0 siblings, 1 reply; 4+ messages in thread
From: Gregory Price @ 2025-12-18 14:46 UTC (permalink / raw)
To: linux-cxl
Cc: linux-doc, linux-kernel, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
corbet, gourry, kernel-team, alejandro.lucero-palau
Describe cxl memory device hotplug implications, in particular how the
platform CEDT CFMWS must be described to support successful hot-add of
memory devices.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
Documentation/driver-api/cxl/index.rst | 1 +
.../cxl/platform/device-hotplug.rst | 77 +++++++++++++++++++
2 files changed, 78 insertions(+)
create mode 100644 Documentation/driver-api/cxl/platform/device-hotplug.rst
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index c1106a68b67c..5a734988a5af 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -30,6 +30,7 @@ that have impacts on each other. The docs here break up configurations steps.
platform/acpi
platform/cdat
platform/example-configs
+ platform/device-hotplug
.. toctree::
:maxdepth: 2
diff --git a/Documentation/driver-api/cxl/platform/device-hotplug.rst b/Documentation/driver-api/cxl/platform/device-hotplug.rst
new file mode 100644
index 000000000000..9af8988bd47a
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/device-hotplug.rst
@@ -0,0 +1,77 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+CXL Device Hotplug
+==================
+
+Device hotplug refers to *physical* hotplug of a device (addition or removal
+of a physical device from the machine).
+
+Hot-Remove
+==========
+Hot removal of a device typically requires careful removal of software
+constructs (memory regions, associated drivers) which manage these devices.
+
+Hard-removing a CXL.mem device without carefully tearing down driver stacks
+is likely to cause the system to machine-check (or at least SIGBUS if memory
+access is limited to user space).
+
+Memory Device Hot-Add
+=====================
+Hot-adding a memory device requires that the memory associated with that
+device fits in a pre-defined (*static*) CXL Fixed Memory Window in the
+:doc:`CEDT<acpi/cedt>`.
+
+There are two basic hot-add scenarios which may occur.
+
+Device Present at Boot
+----------------------
+A device present at boot likely had its capacity reported in the
+:doc:`CEDT<acpi/cedt>`. If a device is removed and a new device hotplugged,
+the capacity of the new device will be limited to the original CFMWS capacity.
+
+Adding a device larger than the original device will cause memory region
+creation to fail if the region size is greater than the CFMWS size.
+
+The CFMWS is *static* and cannot be adjusted. Platforms which may expect
+different sized devices to be hotplugged must allocate sufficient CFMWS space
+*at boot time* to cover all future expected devices.
+
+No CXL Device Present at Boot
+-----------------------------
+When no CXL device is present on boot, most platforms omit the CFMWS in the
+:doc:`CEDT<acpi/cedt>`. When this occurs, hot-add is not possible.
+
+For a platform to support hot-add of a memory device, it must allocate a
+CEDT CFMWS region with sufficient memory capacity to cover all future
+potentially added capacity.
+
+Switches in the fabric should report the max possible memory capacity
+expected to be hot-added so that platform software may construct the
+appropriately sized CFMWS.
+
+Interleave Sets
+===============
+
+Host Bridge Interleave
+----------------------
+Host-bridge interleaved memory regions are defined *statically* in the
+:doc:`CEDT<acpi/cedt>`. To apply cross-host-bridge interleave, a CFMWS entry
+describing that interleave must have been provided *at boot*. Hotplugged
+devices cannot add host-bridge interleave capabilities at hotplug time.
+
+See the :doc:`Flexible CEDT Configuration<example-configurations/flexible>`
+example to see how a platform can provide this kind of flexibility regarding
+hotplugged memory devices.
+
+Platform vendors should work with switch vendors to work out how this
+HPA space reservation should work when one or more interleave options are
+intended to be presented to a host.
+
+HDM Interleave
+--------------
+Decoder-applied interleave can flexibly handle hotplugged devices, as decoders
+can be re-programmed after hotplug.
+
+To add or remove a device to/from an existing HDM-applied interleaved region,
+that region must be torn down and re-created.
--
2.52.0
* Re: [PATCH] Documentation/driver-api/cxl: device hotplug section
2025-12-18 14:46 [PATCH] Documentation/driver-api/cxl: device hotplug section Gregory Price
@ 2025-12-18 15:26 ` Jonathan Cameron
2025-12-18 16:02 ` Gregory Price
0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Cameron @ 2025-12-18 15:26 UTC (permalink / raw)
To: Gregory Price
Cc: linux-cxl, linux-doc, linux-kernel, dave, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
corbet, kernel-team, alejandro.lucero-palau
On Thu, 18 Dec 2025 09:46:36 -0500
Gregory Price <gourry@gourry.net> wrote:
> Describe cxl memory device hotplug implications, in particular how the
> platform CEDT CFMWS must be described to support successful hot-add of
> memory devices.
>
> Signed-off-by: Gregory Price <gourry@gourry.net>
Hi Gregory,
Thanks for drawing this up.
> ---
> Documentation/driver-api/cxl/index.rst | 1 +
> .../cxl/platform/device-hotplug.rst | 77 +++++++++++++++++++
> 2 files changed, 78 insertions(+)
> create mode 100644 Documentation/driver-api/cxl/platform/device-hotplug.rst
>
> diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
> index c1106a68b67c..5a734988a5af 100644
> --- a/Documentation/driver-api/cxl/index.rst
> +++ b/Documentation/driver-api/cxl/index.rst
> @@ -30,6 +30,7 @@ that have impacts on each other. The docs here break up configurations steps.
> platform/acpi
> platform/cdat
> platform/example-configs
> + platform/device-hotplug
>
> .. toctree::
> :maxdepth: 2
> diff --git a/Documentation/driver-api/cxl/platform/device-hotplug.rst b/Documentation/driver-api/cxl/platform/device-hotplug.rst
> new file mode 100644
> index 000000000000..9af8988bd47a
> --- /dev/null
> +++ b/Documentation/driver-api/cxl/platform/device-hotplug.rst
> @@ -0,0 +1,77 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================
> +CXL Device Hotplug
> +==================
> +
> +Device hotplug refers to *physical* hotplug of a device (addition or removal
> +of a physical device from the machine).
> +
> +Hot-Remove
> +==========
> +Hot removal of a device typically requires careful removal of software
> +constructs (memory regions, associated drivers) which manage these devices.
> +
> +Hard-removing a CXL.mem device without carefully tearing down driver stacks
> +is likely to cause the system to machine-check (or at least SIGBUS if memory
> +access is limited to user space).
> +
> +Memory Device Hot-Add
> +=====================
> +Hot-adding a memory device requires that the memory associated with that
> +device fits in a pre-defined (*static*) CXL Fixed Memory Window in the
> +:doc:`CEDT<acpi/cedt>`.
> +
> +There are two basic hot-add scenarios which may occur.
> +
> +Device Present at Boot
> +----------------------
> +A device present at boot likely had its capacity reported in the
> +:doc:`CEDT<acpi/cedt>`. If a device is removed and a new device hotplugged,
The concept of reporting in CEDT is a little vague. Perhaps expand on that a little
with something like:
A device present at boot will be associated with a CFMWS reported in
:doc:`CEDT<acpi/cedt>` and that CFMWS may match the size of the device.
> +the capacity of the new device will be limited to the original CFMWS capacity.
> +
> +Adding a device larger than the original device will cause memory region
> +creation to fail if the region size is greater than the CFMWS size.
Adding capacity larger than the original device
(can add a subset of the new device capacity)
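A minimal sketch of the partial-capacity point above, in Python purely for
illustration. The function and parameter names are hypothetical and are not
kernel or CXL driver interfaces; it only shows that the capacity that can be
mapped is bounded by the space left in the fixed window.

```python
GIB = 1 << 30


def mappable_capacity(device_bytes, window_bytes, window_used_bytes=0):
    """Return how many bytes of a hot-added device fit in a static window.

    All three arguments are illustrative values, not fields read from a
    real CEDT.
    """
    # Space still unclaimed in the fixed window (never negative).
    free_bytes = max(window_bytes - window_used_bytes, 0)
    # A region can cover at most the smaller of device size and free space.
    return min(device_bytes, free_bytes)


# A 256 GiB replacement device behind a window sized for the original
# 128 GiB device: only part of the new device can back a region.
print(mappable_capacity(256 * GIB, 128 * GIB) // GIB, "GiB mappable")
```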
> +
> +The CFMWS is *static* and cannot be adjusted. Platforms which may expect
> +different sized devices to be hotplugged must allocate sufficient CFMWS space
> +*at boot time* to cover all future expected devices.
> +
> +No CXL Device Present at Boot
> +-----------------------------
> +When no CXL device is present on boot, most platforms omit the CFMWS in the
> +:doc:`CEDT<acpi/cedt>`. When this occurs, hot-add is not possible.
Relax to 'some platforms'
Just to future proof the doc for when people start mostly doing the sensible thing.
> +
> +For a platform to support hot-add of a memory device, it must allocate a
For a platform to support hot-add of a full memory device
(see above for partial capacity being fine)
> +CEDT CFMWS region with sufficient memory capacity to cover all future
> +potentially added capacity.
> +
> +Switches in the fabric should report the max possible memory capacity
> +expected to be hot-added so that platform software may construct the
> +appropriately sized CFMWS.
How do switches report this? I don't think they can as it really has nothing
to do with the switch beyond maybe how many DSPs it has (which incidentally
is what is used to work out space for PCI HP, where the code divides up
left over space between HP DSPs).
Obviously this excludes the weird switches that are out there that pretend
to be a single memory device as those are not switches at all as far
as Linux is concerned.
> +
> +Interleave Sets
> +===============
> +
> +Host Bridge Interleave
> +----------------------
> +Host-bridge interleaved memory regions are defined *statically* in the
> +:doc:`CEDT<acpi/cedt>`. To apply cross-host-bridge interleave, a CFMWS entry
> +describing that interleave must have been provided *at boot*. Hotplugged
> +devices cannot add host-bridge interleave capabilities at hotplug time.
> +
> +See the :doc:`Flexible CEDT Configuration<example-configurations/flexible>`
> +example to see how a platform can provide this kind of flexibility regarding
> +hotplugged memory devices.
> +
> +Platform vendors should work with switch vendors to work out how this
> +HPA space reservation should work when one or more interleave options are
> +intended to be presented to a host.
Same as above. Nothing to do with switches as far as I understand things
beyond them providing fan out. So if you have
   HB0              HB1
RP0   RP1           RP2
 |     |             |
Empty Empty         USP
              _______|_______
              |   |     |   |
             DSP DSP   DSP DSP
              |   |     |   |
                 All empty
You might provide more room for devices below HB1 than HB0 if you don't expect
to see switches being hot added.
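As a rough illustration of that topology, the sketch below counts the empty
hot-plug slots under each host bridge and scales by an assumed per-device
maximum. The dictionary layout, the helper names and the per-device size are
all hypothetical; this is not how any firmware or the kernel computes it.

```python
GIB = 1 << 30

# Hypothetical model of the diagram: each host bridge maps root ports to
# either None (an empty slot) or the number of empty downstream switch ports.
TOPOLOGY = {
    "HB0": {"RP0": None, "RP1": None},
    "HB1": {"RP2": 4},
}


def empty_slots(root_ports):
    """Count places where a memory device could later be hot-added."""
    return sum(1 if dsps is None else dsps for dsps in root_ports.values())


def reservation_per_bridge(topology, per_device_bytes):
    """Space to reserve below each host bridge, assuming one device per slot."""
    return {
        bridge: empty_slots(root_ports) * per_device_bytes
        for bridge, root_ports in topology.items()
    }


for bridge, size in reservation_per_bridge(TOPOLOGY, 128 * GIB).items():
    print(bridge, size // GIB, "GiB")
```

With these assumed inputs HB1 ends up with twice the reservation of HB0,
simply because it has four empty downstream ports against two empty root
ports.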
Jonathan
> +
> +HDM Interleave
> +--------------
> +Decoder-applied interleave can flexibly handle hotplugged devices, as decoders
> +can be re-programmed after hotplug.
> +
> +To add or remove a device to/from an existing HDM-applied interleaved region,
> +that region must be torn down and re-created.
* Re: [PATCH] Documentation/driver-api/cxl: device hotplug section
2025-12-18 15:26 ` Jonathan Cameron
@ 2025-12-18 16:02 ` Gregory Price
2025-12-19 10:49 ` Jonathan Cameron
0 siblings, 1 reply; 4+ messages in thread
From: Gregory Price @ 2025-12-18 16:02 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-doc, linux-kernel, dave, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
corbet, kernel-team, alejandro.lucero-palau
On Thu, Dec 18, 2025 at 03:26:16PM +0000, Jonathan Cameron wrote:
> On Thu, 18 Dec 2025 09:46:36 -0500
> Gregory Price <gourry@gourry.net> wrote:
>
> > Describe cxl memory device hotplug implications, in particular how the
> > platform CEDT CFMWS must be described to support successful hot-add of
> > memory devices.
> >
> > Signed-off-by: Gregory Price <gourry@gourry.net>
>
> Hi Gregory,
>
> Thanks for drawing this up.
ack on most of your notes, discussion on platform/switch stuff
> > +CEDT CFMWS region with sufficient memory capacity to cover all future
> > +potentially added capacity.
> > +
> > +Switches in the fabric should report the max possible memory capacity
> > +expected to be hot-added so that platform software may construct the
> > +appropriately sized CFMWS.
>
> How do switches report this? I don't think they can as it really has nothing
> to do with the switch beyond maybe how many DSPs it has (which incidentally
> is what is used to work out space for PCI HP, where the code divides up
> left over space between HP DSPs).
>
> Obviously this excludes the weird switches that are out there that pretend
> to be a single memory device as those are not switches at all as far
> as Linux is concerned.
>
Good point - in reality, it probably should say something like:
```
A hot-plug capable CXL memory device should report the maximum possible
capacity for the device in the CEDT CFMWS, rather than sizing the CFMWS
memory region to the capacity present at boot time.
To support memory device hotplug directly on the host bridge (or on a
switch downstream of a HB without built-in memory device capabilities),
a platform must construct a CEDT CFMWS at boot with sufficient resources
to support the max possible (or expected) hotplug memory capacity.
```
In one case, an attached device which supports hotplug (which somewhat
implies a switch is present), is responsible for presenting the platform
the resources. In theory, at least, a platform doesn't need to do
anything here if the device vendor has set things up correctly.
In the second case, the platform is responsible for making that decision,
and it's on the ODM+CPU manufacturers to make sufficient BIOS/EFI/etc
options available to support this kind of pre-allocation lacking any
attached device at boot. (not sure whether i should add this explicitly
above).
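To make the second case concrete, here is a rough sketch of how platform
tooling might size such a window. The slot list, the helper name and the
256 MiB granularity (scaled by interleave ways) are assumptions made for
illustration; the CXL specification is the authority on the actual CFMWS
size constraints.

```python
MIB = 1 << 20
GIB = 1 << 30

# Assumed window granularity; verify against the CXL specification.
WINDOW_GRANULE = 256 * MIB


def cfmws_window_size(slot_max_bytes, interleave_ways=1):
    """Size a window large enough for the largest device expected per slot.

    slot_max_bytes is a hypothetical list with one entry per hot-plug slot.
    """
    total = sum(slot_max_bytes)
    granule = WINDOW_GRANULE * interleave_ways
    # Round up to a whole number of granules (ceiling division).
    return -(-total // granule) * granule


# Four hot-plug slots, each expected to take up to a 64 GiB device.
print(cfmws_window_size([64 * GIB] * 4) // GIB, "GiB window")
```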
> > +Platform vendors should work with switch vendors to work out how this
> > +HPA space reservation should work when one or more interleave options are
> > +intended to be presented to a host.
>
> Same as above. Nothing to do with switches as far as I understand things
> beyond them providing fan out. So if you have
>    HB0              HB1
> RP0   RP1           RP2
>  |     |             |
> Empty Empty         USP
>               _______|_______
>               |   |     |   |
>              DSP DSP   DSP DSP
>               |   |     |   |
>                  All empty
>
> You might provide more room for devices below HB1 than HB0 if you don't expect
> to see switches being hot added.
>
Same note from above
also *yoink* your ascii :]
~Gregory
* Re: [PATCH] Documentation/driver-api/cxl: device hotplug section
2025-12-18 16:02 ` Gregory Price
@ 2025-12-19 10:49 ` Jonathan Cameron
0 siblings, 0 replies; 4+ messages in thread
From: Jonathan Cameron @ 2025-12-19 10:49 UTC (permalink / raw)
To: Gregory Price
Cc: linux-cxl, linux-doc, linux-kernel, dave, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
corbet, kernel-team, alejandro.lucero-palau
On Thu, 18 Dec 2025 11:02:30 -0500
Gregory Price <gourry@gourry.net> wrote:
> On Thu, Dec 18, 2025 at 03:26:16PM +0000, Jonathan Cameron wrote:
> > On Thu, 18 Dec 2025 09:46:36 -0500
> > Gregory Price <gourry@gourry.net> wrote:
> >
> > > Describe cxl memory device hotplug implications, in particular how the
> > > platform CEDT CFMWS must be described to support successful hot-add of
> > > memory devices.
> > >
> > > Signed-off-by: Gregory Price <gourry@gourry.net>
> >
> > Hi Gregory,
> >
> > Thanks for drawing this up.
>
> ack on most of your notes, discussion on platform/switch stuff
>
> > > +CEDT CFMWS region with sufficient memory capacity to cover all future
> > > +potentially added capacity.
> > > +
> > > +Switches in the fabric should report the max possible memory capacity
> > > +expected to be hot-added so that platform software may construct the
> > > +appropriately sized CFMWS.
> >
> > How do switches report this? I don't think they can as it really has nothing
> > to do with the switch beyond maybe how many DSPs it has (which incidentally
> > > is what is used to work out space for PCI HP, where the code divides up
> > > left over space between HP DSPs).
> > >
> > > Obviously this excludes the weird switches that are out there that pretend
> > to be a single memory device as those are not switches at all as far
> > as Linux is concerned.
> >
>
> Good point - in reality, it probably should say something like:
>
> ```
> A hot-plug capable CXL memory device should report the maximum possible
> capacity for the device in the CEDT CFMWS, rather than sizing the CFMWS
> memory region to the capacity present at boot time.
Might want to broaden to "the device or possible hot replacements"
>
> To support memory device hotplug directly on the host bridge (or on a
> switch downstream of a HB without built-in memory device capabilities),
> a platform must construct a CEDT CFMWS at boot with sufficient resources
> to support the max possible (or expected) hotplug memory capacity.
> ```
>
> In one case, an attached device which supports hotplug (which somewhat
> implies a switch is present), is responsible for presenting the platform
I'd write this to allow for RP hotplug as well. Might not be common yet
but who wants to remember to update the doc when that changes :)
> the resources. In theory, at least, a platform doesn't need to do
> anything here if the device vendor has set things up correctly.
>
> In the second case, the platform is responsible for making that decision,
> and it's on the ODM+CPU manufacturers to make sufficient BIOS/EFI/etc
> options available to support this kind of pre-allocation lacking any
> attached device at boot. (not sure whether i should add this explicitly
> above).
Yeah, this will be BIOS menu / reflashing the BIOS stuff or config
files in flash. Similar to what happens for the big PCI storage servers with lots
of hotplug ports where we need to make space in PCI enumeration for stuff
that isn't there yet. CXL brings a few extra corner cases but fundamentally
it's a similar problem.
>
> > > +Platform vendors should work with switch vendors to work out how this
> > > +HPA space reservation should work when one or more interleave options are
> > > +intended to be presented to a host.
> >
> > Same as above. Nothing to do with switches as far as I understand things
> > beyond them providing fan out. So if you have
> > >    HB0              HB1
> > > RP0   RP1           RP2
> > >  |     |             |
> > > Empty Empty         USP
> > >               _______|_______
> > >               |   |     |   |
> > >              DSP DSP   DSP DSP
> > >               |   |     |   |
> > >                  All empty
> >
> > You might provide more room for devices below HB1 than HB0 if you don't expect
> > to see switches being hot added.
> >
>
> Same note from above
>
> also *yoink* your ascii :]
I hope you tidied it up! That's really ugly :(
Jonathan
>
> ~Gregory