From: Dan Williams <dan.j.williams@intel.com>
To: Alejandro Lucero Palau <alucerop@amd.com>,
Dan Williams <dan.j.williams@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <linux-cxl@vger.kernel.org>, Dave Jiang <dave.jiang@intel.com>,
Ira Weiny <ira.weiny@intel.com>
Subject: Re: [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
Date: Fri, 31 Jan 2025 15:54:22 -0800 [thread overview]
Message-ID: <679d62ae30f0_2d2c29475@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <30ba3e11-0ba7-6bc6-38ac-1ba0c577f651@amd.com>
Alejandro Lucero Palau wrote:
>
> On 1/17/25 18:23, Dan Williams wrote:
> > Jonathan Cameron wrote:
> >> On Thu, 16 Jan 2025 22:10:44 -0800
> >> Dan Williams <dan.j.williams@intel.com> wrote:
> >>
> >>> The pending efforts to add CXL Accelerator (type-2) device [1], and
> >>> Dynamic Capacity (DCD) support [2], tripped on the
> >>> no-longer-fit-for-purpose design in the CXL subsystem for tracking
> >>> device-physical-address (DPA) metadata. Trip hazards include:
> >>>
> >>> - CXL Memory Devices need to consider a PMEM partition, but Accelerator
> >>> devices with CXL.mem likely do not in the common case.
> >>>
> >>> - CXL Memory Devices enumerate DPA through Memory Device mailbox
> >>> commands like Partition Info, Accelerators devices do not.
> >>>
> >>> - CXL Memory Devices that support DCD support more than 2 partitions.
> >>> Some of the driver algorithms are awkward to expand to > 2 partition
> >>> cases.
> >>>
> >>> - DPA performance data is a general capability that can be shared with
> >>> accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
> >>> suitable.
> >>>
> >>> - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
> >>> memory property, it should be phased in favor of a partition id and
> >>> the memory property comes from the partition info.
> >>>
> >>> Towards cleaning up those issues and allowing a smoother landing for the
> >>> aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
> >>> array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
> >>> way for Memory Devices and Accelerators to initialize the DPA information
> >>> in 'struct cxl_dev_state'.
> >>>
> >>> For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
> >>> get the new data structure initialized, and cleanup some qos_class init.
> >>> Follow on patches will go further to use the new data structure to
> >>> cleanup algorithms that are better suited to loop over all possible
> >>> partitions.
> >>>
> >>> cxl_dpa_setup() follows the locking expectations of mutating the device
> >>> DPA map, and is suitable for Accelerator drivers to use. Accelerators
> >>> likely only have one hardcoded 'ram' partition to convey to the
> >>> cxl_core.
> >>>
> >>> Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> >>> Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> >>> Cc: Dave Jiang <dave.jiang@intel.com>
> >>> Cc: Alejandro Lucero <alucerop@amd.com>
> >>> Cc: Ira Weiny <ira.weiny@intel.com>
> >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> Hi Dan,
> >>
> >> In basic form this seems fine, but I find the nr_paritions variable usage very
> >> counter intuitive. It's just how many we configured not how many there
> >> are, potentially with 0 size (so not a partition). I'd be happier if we
> >> can avoid that by just prefilling the lot with zero size and filling in
> >> the ones we want. So zero size means doesn't exist and use an iterator where
> >> appropriate to skip the zero size ones.
> > The PMEM-only device case did give me pause. Is that 2 partitions with a
> > zero-sized first partition, or is that just 1 partition?
>
>
> I was wrong about the code being broken for this case.
>
> The code would create two partitions, at least for the case of partition
> alignment being 0, with the first one having 0 size.
That was the feedback from Jonathan from v1 => v2. Drop zero-size
partitions from the tracking.
> This is all based on data/code based on mbox commands. Without mbox this
> partition info needs to be hardcoded or created somehow by the accel
> driver by its own means, so it is good to know the code expects such a
> 0-size partition for the pmem-only case and the nr_partitions should be
> set accordingly. Not what my type2 patchset needs now, but I bet this
> will need to be set properly by a coming accel driver.
An accelerator does not need to worry about passing in a 0-sized pmem
partition, it can just register the one DPA range that it cares about.
nr_partitions means that the partittion array has entry
[0]..[nr_partitions-1] filled with non-zero data, each with a distinct
operation mode, and in contiguous order starting from zero.
The only place Ira and I identified where this potentially runs into
trouble is if a device places a gap between partitions, or if we want to
skip over "shared" capacity in the first round of DCD support. Might
just need to mandate that userspace skip over "shared" capacity for now.
next prev parent reply other threads:[~2025-01-31 23:54 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-17 6:10 [PATCH 0/4] cxl: DPA partition metadata is a mess Dan Williams
2025-01-17 6:10 ` [PATCH 1/4] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-17 10:03 ` Jonathan Cameron
2025-01-17 17:47 ` Dan Williams
2025-01-17 10:24 ` Alejandro Lucero Palau
2025-01-17 17:54 ` Dan Williams
2025-01-17 18:45 ` Ira Weiny
2025-01-17 6:10 ` [PATCH 2/4] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-01-17 10:20 ` Jonathan Cameron
2025-01-17 10:23 ` Jonathan Cameron
2025-01-17 17:55 ` Dan Williams
2025-01-17 13:33 ` Alejandro Lucero Palau
2025-01-17 20:47 ` Dan Williams
2025-01-17 6:10 ` [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
2025-01-17 10:52 ` Jonathan Cameron
2025-01-17 13:38 ` Alejandro Lucero Palau
2025-01-17 18:23 ` Dan Williams
2025-01-17 20:32 ` Ira Weiny
2025-01-20 12:24 ` Alejandro Lucero Palau
2025-01-31 23:54 ` Dan Williams [this message]
2025-01-17 15:58 ` Alejandro Lucero Palau
2025-01-17 22:52 ` Dan Williams
2025-01-17 20:42 ` Ira Weiny
2025-01-17 22:08 ` Ira Weiny
2025-01-31 23:39 ` Dan Williams
2025-01-17 6:10 ` [PATCH 4/4] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
2025-01-17 11:12 ` Jonathan Cameron
2025-01-17 18:37 ` Dan Williams
2025-01-17 15:42 ` Alejandro Lucero Palau
2025-01-17 20:57 ` Dan Williams
2025-01-20 12:39 ` Alejandro Lucero Palau
2025-02-01 0:08 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=679d62ae30f0_2d2c29475@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=alucerop@amd.com \
--cc=dave.jiang@intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox