Linux CXL
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: <alucerop@amd.com>, <linux-cxl@vger.kernel.org>,
	<dan.j.williams@intel.com>, <pieter.jansen-van-vuuren@amd.com>,
	<richard.hughes@amd.com>, <dinan.gunawardena@amd.com>
Cc: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Subject: RE: [RFC PATCH 00/13] RFC: add Type2 device support
Date: Thu, 16 May 2024 17:08:17 -0700	[thread overview]
Message-ID: <66469ff1b8fbc_2c2629427@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <20240516081202.27023-1-alucerop@amd.com>

alucerop@ wrote:
> From: Alejandro Lucero <alejandro.lucero-palau@amd.com>
> 
> I need to start this RFC explaining not what the patchset does, but what we
> expected to obtain and currently is under doubt: to configure a CXL memory
> range advertised by our CXL network device and to use it "privately".
> 
> The main reason behind that privacy, but not the only one, is to avoid any
> arbitrary use by any user-space "client" with enough privileges for using a
> public interface to map the device memory and use it as regular memory. The
> reason is obvious: the device expects writes to such a memory in a specific
> format.
> 
> The doubt comes from the fact that after implementing the functionality exposed
> in this patchset, we realized the current expectation seems to be a BIOS/UEFI
> configuring HDM decoders or HPA ranges and passing memory ranges to the kernel,
> and with the kernel having a default action on that memory range based on the
> flag EFI_MEMORY_SP being set or not.

Lets clear up this confusion first.

My expectation for CXL accelerator provided memory is that it is marked "EFI
Reserved" in the memory map (or CDAT), not "EFI Conventional + Special Purpose
Attribute". Recall that the EFI_MEMORY_SP attribute is just a *hint* that OS may
*not* want to consider a memory range as part of the general purpose memory
pool. CXL Accelerator Memory is far from that definition. If a driver needs to
be loaded to effectively use a memory range that is not a range that can be
transparently converted to "EFI Conventional Memory".

> If it is not set, the memory range will be part of the kernel memory
> management, what we do not want for sure. If the flag is set, a DAX device
> will be created which allows an user-space client to map such a memory through
> the DAX device API, what we also prefer to avoid. I know this is likely going
> to face opposition, but we see this RFC as the opportunity for discussing the
> matter and, if it turns out to be the case, to be guided towards the proper
> solution accepted by the maintainers/community. This patchset does not tackle
> this default kernel behaviour although we already have some ideas and
> workarounds for the short term. We'll be happy to discuss this openly.

It may have been subtle, but even in my initial RFC [1]. I was explicitly
skipping exposing accelerator memory via device-dax:

@@ -3085,6 +3183,15 @@ static int cxl_region_probe(struct device *dev)
 					p->res->start, p->res->end, cxlr,
 					is_system_ram) > 0)
 			return 0;
+
+		/*
+		 * HDM-D[B] (device-memory) regions have accelerator
+		 * specific usage, skip device-dax registration.
+		 */
+		if (cxlr->type == CXL_DECODER_DEVMEM)
+			return 0;
+
+		/* HDM-H routes to device-dax */
 		return devm_cxl_add_dax_region(cxlr);
 	default:
 		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",

[1] https://lore.kernel.org/r/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com

> So, this patchset assumes a BIOS/UEFI not programming the HDM decoder or HPA
> ranges for a CXL Type2 device.

At least for Linux it should not care whether BIOS maps it or not. The
expectation is that whichever agent maps it, BIOS or Linux CXL core, that agent
honors the memory type specified in the device CDAT. I.e. the "Device Scoped EFI
Memory Type Structure (DSEMTS)" in the accelerator CDAT should pass "2" in the
"EFI Memory Type and Attribute" field, where "2" indicates "EFI Reserved" memory
type for any HPA that maps this device's DPA.

> above, a Type2 device added after boot, that is hotplugged, will likely need the
> changes added here. Exporting some of the CXL core for vendor drivers is also
> required for avoiding code duplication when reading DVSEC or decoder registers.
> Finally if there is no such HPA range assigned, whatever the reason, this patchset
> offers some way of obtaining one and/or finding out what is the problem behind the
> lack of such HPA range.

Yes, that was the whole point of the "Device Memory Setup" thread [2] to show a
plausible way to reuse the common core infrastructure for accelerators with
CXL.mem capability.

[2] https://lore.kernel.org/all/168592149709.1948938.8663425987110396027.stgit@dwillia2-xfh.jf.intel.com/

It would help me if you can summarize what you did and did not adopt from that
proposal, i.e. where it helped and where it missed the mark for your use case.

[..]
> Keeping with kernel rules of having a client using any new functionality added,
> a CXL type2 driver is increasingly added. This is not a driver supporting a real
> device but an emulated one in QEMU, with the QEMU patch following this patchset.

Lets start with the deltas compared to [2], the observation there is that much
of the CXL core is reusable for this type-2 use case, and even improves the
organization of the core. The consumer of the refactoring just ends up being a
few new helpers around the core.

> The reason for adding such a Type2 driver instead of changes to a current kernel
> driver or a new driver is threefold:
> 
> 1) the expected kernel driver to use the functionality added is a netdev one.
>    Current internal CXL support is a codesign effort, therefore software and
>    hardware evolving in lockstep. Adding changes to a netdev driver requires the
>    full functionality and doing things following the netdev standard which is
>    not the best option for this development stage.

Not sure what "netdev standard" means, and not sure it matters for a discussion
that is constrained to how the CXL core should evolve to accommodate
accelerator-local memory.

> 2) Waiting for completing the development will delay the required Type2 support,
>    and most of the required changes are unrelated to specific CXL usage by any
>    vendor driver.

Again, [2] already path cleared the idea that early plumbing of the CXL core for
type-2 is in scope. Just need to make sure this effort is focused on general CXL
specification use cases.

> 
> 3) Type2 support will need some testing infrastructure, unit tests for ensuring
>    Type2 devices are working, and module tests for ensuring CXL core changes do
>    not affect Type2 support.
> 
> I hope these reasons are convincing enough.

I am happy to have general specification discussions about core functionality
that all CXL accelerators need and plumbing those ABIs to be exercised via
something like cxl_test.

I expect that cxl_test only covers coarse grained inter-module ABIs and that the
fine details will need to wait until the accelerator specifics are closer to
being disclosed and the real driver patches can start flowing. In other words, I
am not keen to see QEMU used as a way to proxy features into the kernel without
disclosing what the real world consumer actually needs.

Now, that said, there may also be opportunity for targeted extensions of the
QEMU CXL memory-expander device, like DPA capacity that is mapped as "EFI
Reserved" device-memory, but lets talk about the specifics here.

Also recall that the current CXL driver does not yet support parsing CXL PMEM
region labels. Support for that ends up looking quite similar to an accelerator
asking for a specific DPA range to be dynamically mapped by the CXL core. So I
think we can get quite far along the path to enabling type-2-CXL.mem just by
flushing out some of the generic type-3-CXL.mem use cases.

> I have decided to follow a gradual approach for adding such a driver using the
> exported CXL functions and structs. I think it is easier to review the driver
> when the new funcionality is added than to add the driver at the end, but not a
> big deal if my approach is not liked.
> 
> The patches are based on a patchset sent by Dan Williams [1] which was just
> partially integrated, most related to making things ready for Type2 but none
> related to specific Type2 support. Those patches based on Dan´s work have Dan´s
> signing, so Dan, tell me if you do not want me to add you.

It is fine to reuse those patches, but please do include:

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Link: <lore-link for copied code>

...so I can try to remember if I still think the source patch was a good idea or
not. Again, a summary analysis of what did and did not work for you from that
posting would help me get a running start at reviewing this.

> Type2 implies, I think, only the related driver to manage the CXL specifics.
> This means no user space intervention and therefore no sysfs files. This makes
> easy to avoid the current problem of most of the sysfs related code expecting
> Type3 devices. If I´m wrong in this regard, such a code will need further
> changes.

Not sure what you mean here, I am hoping that sysfs ABI for enumerating objects
like memdevs, decoders, and regions "just works" and is hidden from the
accelerator vendor driver.

> A final note about CXL.cache is needed. This patchset does not cover it at all,
> although the emulated Type2 device advertises it. From the kernel point of view
> supporting CXL.cache will imply to be sure the CXL path supports what the Type2
> device needs. A device accelerator will likely be connected to a Root Switch,
> but other configurations can not be discarded. Therefore the kernel will need to
> check not just HPA, DPA, interleave and granularity, but also the available
> CXL.cache support and resources in each switch in the CXL path to the Type2
> device. I expect to contribute to this support in the following months, and
> it would be good to discuss about it when possible.

Yup, especially CXL 3.x Cache ID management. Lets talk about some ways to plumb
that, because it is also a general CXL specification capability. Another topic
is what is needed for CXL Protocol and Component error handling, I expect some
of that to be CXL core common and some of that to be accelertor driver
augmented.

If you think we will still be talking about this driver enabling in September
I would invite you to submit this as a topic for the CXL uConf.

https://lpc.events/event/18/contributions/1673/

  parent reply	other threads:[~2024-05-17  0:08 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-16  8:11 [RFC PATCH 00/13] RFC: add Type2 device support alucerop
2024-05-16  8:11 ` [RFC PATCH 01/13] cxl: move header files for absolute references alucerop
2024-06-12  4:27   ` Dan Williams
2024-06-12  4:30     ` Christoph Hellwig
2024-06-12  5:54       ` Alejandro Lucero Palau
2024-06-12 10:07         ` Jonathan Cameron
2024-06-12 13:36           ` Alejandro Lucero Palau
2024-06-12 21:18       ` Dan Williams
2024-06-13 11:45         ` Alejandro Lucero Palau
2024-06-14  1:22           ` Dan Williams
2024-06-14  8:54             ` Alejandro Lucero Palau
2024-06-12  5:42     ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 02/13] cxl: add type2 device basic support alucerop
2024-05-17 14:30   ` Jonathan Cameron
2024-05-20 15:46     ` Alejandro Lucero Palau
2024-06-12  4:43   ` Dan Williams
2024-06-12  6:04     ` Alejandro Lucero Palau
2024-06-12 14:17       ` Alejandro Lucero Palau
2024-06-12 18:29     ` Alison Schofield
2024-06-12 18:58       ` Dan Williams
2024-06-12  7:13   ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 03/13] cxl: export core function for type2 devices alucerop
2024-06-12  4:50   ` Dan Williams
2024-06-12  6:07     ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 04/13] cxl: allow devices without mailbox capability alucerop
2024-05-17 14:33   ` Jonathan Cameron
2024-05-20 15:49     ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 05/13] cxl: fix check about pmem resource alucerop
2024-05-17 14:40   ` Jonathan Cameron
2024-05-20 15:41     ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 06/13] cxl: support type2 memdev creation alucerop
2024-05-16  8:11 ` [RFC PATCH 07/13] cxl: add functions for exclusive access to endpoint port topology alucerop
2024-06-12  7:22   ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 08/13] cxl: add cxl_get_hpa_freespace alucerop
2024-06-12  7:27   ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 09/13] cxl: add cxl_request_dpa alucerop
2024-06-12  7:29   ` Alejandro Lucero Palau
2024-05-16  8:11 ` [RFC PATCH 10/13] cxl: make region type based on endpoint type alucerop
2024-05-16  8:12 ` [RFC PATCH 11/13] cxl: allow automatic region creation by type2 drivers alucerop
2024-06-12  7:32   ` Alejandro Lucero Palau
2024-05-16  8:12 ` [RFC PATCH 12/13] cxl: preclude device memory to be used for dax alucerop
2024-05-16  8:12 ` [RFC PATCH 13/13] cxl: test type2 private mapping alucerop
2024-05-17  0:08 ` Dan Williams [this message]
2024-05-18  9:59   ` [RFC PATCH 00/13] RFC: add Type2 device support Alejandro Lucero Palau
2024-05-21  4:56     ` Dan Williams
2024-05-22 16:38       ` Alejandro Lucero Palau
2024-05-31 10:52         ` Alejandro Lucero Palau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66469ff1b8fbc_2c2629427@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=alejandro.lucero-palau@amd.com \
    --cc=alucerop@amd.com \
    --cc=dinan.gunawardena@amd.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=pieter.jansen-van-vuuren@amd.com \
    --cc=richard.hughes@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox