From: Dan Williams <dan.j.williams@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>, Dave Jiang <dave.jiang@intel.com>,
"Alejandro Lucero" <alucerop@amd.com>,
Ira Weiny <ira.weiny@intel.com>
Subject: Re: [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
Date: Fri, 17 Jan 2025 10:23:52 -0800 [thread overview]
Message-ID: <678aa0381324e_20fa2942a@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <20250117105254.00001dd4@huawei.com>
Jonathan Cameron wrote:
> On Thu, 16 Jan 2025 22:10:44 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > The pending efforts to add CXL Accelerator (type-2) device [1], and
> > Dynamic Capacity (DCD) support [2], tripped on the
> > no-longer-fit-for-purpose design in the CXL subsystem for tracking
> > device-physical-address (DPA) metadata. Trip hazards include:
> >
> > - CXL Memory Devices need to consider a PMEM partition, but Accelerator
> > devices with CXL.mem likely do not in the common case.
> >
> > - CXL Memory Devices enumerate DPA through Memory Device mailbox
> > commands like Partition Info, Accelerators devices do not.
> >
> > - CXL Memory Devices that support DCD support more than 2 partitions.
> > Some of the driver algorithms are awkward to expand to > 2 partition
> > cases.
> >
> > - DPA performance data is a general capability that can be shared with
> > accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
> > suitable.
> >
> > - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
> > memory property, it should be phased in favor of a partition id and
> > the memory property comes from the partition info.
> >
> > Towards cleaning up those issues and allowing a smoother landing for the
> > aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
> > array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
> > way for Memory Devices and Accelerators to initialize the DPA information
> > in 'struct cxl_dev_state'.
> >
> > For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
> > get the new data structure initialized, and cleanup some qos_class init.
> > Follow on patches will go further to use the new data structure to
> > cleanup algorithms that are better suited to loop over all possible
> > partitions.
> >
> > cxl_dpa_setup() follows the locking expectations of mutating the device
> > DPA map, and is suitable for Accelerator drivers to use. Accelerators
> > likely only have one hardcoded 'ram' partition to convey to the
> > cxl_core.
> >
> > Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> > Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: Alejandro Lucero <alucerop@amd.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Hi Dan,
>
> In basic form this seems fine, but I find the nr_paritions variable usage very
> counter intuitive. It's just how many we configured not how many there
> are, potentially with 0 size (so not a partition). I'd be happier if we
> can avoid that by just prefilling the lot with zero size and filling in
> the ones we want. So zero size means doesn't exist and use an iterator where
> appropriate to skip the zero size ones.
The PMEM-only device case did give me pause. Is that 2 partitions with a
zero-sized first partition, or is that just 1 partition?
Ultimately I do think the code should further evolve to treat that as
1-PMEM-partition, but as far as I can see that depends on 'enum
cxl_decoder_mode' being eliminated and teaching all code paths to search
for the position of the PMEM partition.
> Without that tidied up, to me this is more confusing than the previous code.
I was going to save PMEM at a partition other than 1 for the DCD series,
but let me take another pass at adding that to this series.
[..]
> > +/* if this fails the caller must destroy @cxlds, there is no recovery */
> > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> > +{
> > + struct device *dev = cxlds->dev;
> > +
> > + guard(rwsem_write)(&cxl_dpa_rwsem);
> > +
> > + if (cxlds->nr_partitions)
> > + return -EBUSY;
> > +
> > + if (!info->size || !info->nr_partitions) {
> > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> > + cxlds->nr_partitions = 0;
> > + return 0;
> > + }
> > +
> > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> > +
> > + for (int i = 0; i < info->nr_partitions; i++) {
> > + const char *desc;
> > + int rc;
> > +
> > + if (i == CXL_PARTITION_RAM)
> > + desc = "ram";
> > + else if (i == CXL_PARTITION_PMEM)
> > + desc = "pmem";
> > + else
> > + desc = "";
> > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
> > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
> > + info->range[i].start,
> > + range_len(&info->range[i]), desc);
> > + if (rc)
> > + return rc;
> > + cxlds->nr_partitions++;
> I'd just initialize the rest to 0 length similar to what is happening
> if we have pmem only anyway. Then this nr_patitions goes away and
> stops being a possible source of confusion.
Modulo teaching other code that wants to ask "what is the size of the
PMEM partition" to use a helper that hides the "find the device's PMEM
partition".
>
> > + }
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
>
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 3502f1633ad2..7dca5c8c3494 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
>
> > -int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> > +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
> > {
> > struct cxl_dev_state *cxlds = &mds->cxlds;
> > - struct resource *ram_res = to_ram_res(cxlds);
> > - struct resource *pmem_res = to_pmem_res(cxlds);
> > struct device *dev = cxlds->dev;
> > int rc;
> >
> > if (!cxlds->media_ready) {
> > - cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> > - *ram_res = DEFINE_RES_MEM(0, 0);
> > - *pmem_res = DEFINE_RES_MEM(0, 0);
> > + info->size = 0;
> > return 0;
> > }
> >
> > - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
> > + info->size = mds->total_bytes;
> >
> > if (mds->partition_align_bytes == 0) {
> Obviously nothing to do with your patch as such, but maybe tidy this up
> by making active values == fixed values when we don't have partition control.
> That seems logical anyway to me and means we only end up with one lot of
> range setup in here. I can't immediately see any side effects of doing this.
Yeah, I mentioned this in another thread. There is no reason
for 'struct cxl_memdev_state' to carry these values at all. They are
just temporary init-data.
So, cxl_dev_state_identify() becomes cxl_mem_identify(), since
it is a memory-device command. Move it inside of cxl_mem_dpa_fetch()
since it is just temporary init-data for 'struct cxl_dpa_info'.
[..]
> > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> > - mds->volatile_only_bytes, "ram");
> > - if (rc)
> > - return rc;
> > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> > - mds->volatile_only_bytes,
> > - mds->persistent_only_bytes, "pmem");
> > + info->range[CXL_PARTITION_RAM] = (struct range) {
> > + .start = 0,
> > + .end = mds->volatile_only_bytes - 1,
> > + };
> > + info->nr_partitions++;
> > +
> > + if (!mds->persistent_only_bytes)
> > + return 0;
> > +
> > + info->range[CXL_PARTITION_PMEM] = (struct range) {
> > + .start = mds->volatile_only_bytes,
> > + .end = mds->volatile_only_bytes +
> > + mds->persistent_only_bytes - 1,
> > + };
> > + info->nr_partitions++;
>
> This nr partitions makes some sense though I'd be tempted to add a type
> array to info so that we can just not pass empty ones if we don't want to.
> Makes this code a little more complex, but not a lot and means
> nr->partitions becomes the ones that actually exist.
Agree, that's the end goal.
>
> > + return 0;
> > }
> >
> > rc = cxl_mem_get_partition_info(mds);
> > @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> > return rc;
> > }
> >
> > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> > - mds->active_volatile_bytes, "ram");
> > - if (rc)
> > - return rc;
> > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> > - mds->active_volatile_bytes,
> > - mds->active_persistent_bytes, "pmem");
> > + info->range[CXL_PARTITION_RAM] = (struct range) {
> > + .start = 0,
> > + .end = mds->active_volatile_bytes - 1,
> > + };
> > + info->nr_partitions++;
> > +
> > + if (!mds->active_persistent_bytes)
> > + return 0;
> > +
> > + info->range[CXL_PARTITION_PMEM] = (struct range) {
> > + .start = mds->active_volatile_bytes,
> > + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1,
> > + };
> > + info->nr_partitions++;
> > +
> > + return 0;
> > }
> > -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL");
> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
>
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 78e92e24d7b5..2e728d4b7327 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > resource_size_t base, resource_size_t len,
> > resource_size_t skipped);
> >
> > +/* Well known, spec defined partition indices */
> > +enum cxl_partition {
> > + CXL_PARTITION_RAM,
> > + CXL_PARTITION_PMEM,
> > + CXL_PARTITION_MAX,
> > +};
> > +
> > +struct cxl_dpa_info {
> > + u64 size;
> > + struct range range[CXL_PARTITION_MAX];
> > + int nr_partitions;
> > +};
>
> blank line seems appropriate here.
Added.
>
> > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
> > +
> > static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> > struct cxl_memdev *cxlmd)
> > {
> > @@ -408,6 +422,16 @@ struct cxl_dpa_perf {
> > int qos_class;
> > };
> >
>
> > /**
> > * struct cxl_dev_state - The driver device state
> > *
> > @@ -423,8 +447,8 @@ struct cxl_dpa_perf {
> > * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> > * @media_ready: Indicate whether the device media is usable
> > * @dpa_res: Overall DPA resource tree for the device
> > - * @_pmem_res: Active Persistent memory capacity configuration
> > - * @_ram_res: Active Volatile memory capacity configuration
> > + * @part: DPA partition array
> > + * @nr_partitions: Number of DPA partitions
>
> This needs more. It is not the number of partitions present I think, it
> is the number that a particular driver is potentially interested in.
>
> > * @serial: PCIe Device Serial Number
> > * @type: Generic Memory Class device or Vendor Specific Memory device
> > * @cxl_mbox: CXL mailbox context
> > @@ -438,21 +462,39 @@ struct cxl_dev_state {
> > bool rcd;
> > bool media_ready;
> > struct resource dpa_res;
> > - struct resource _pmem_res;
> > - struct resource _ram_res;
> > + struct cxl_dpa_partition part[CXL_PARTITION_MAX];
> > + unsigned int nr_partitions;
> > u64 serial;
> > enum cxl_devtype type;
> > struct cxl_mailbox cxl_mbox;
> > };
> >
> > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> > +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> > {
> > - return &cxlds->_ram_res;
> > + if (cxlds->nr_partitions > 0)
> > + return &cxlds->part[CXL_PARTITION_RAM].res;
> > + return NULL;
> > }
> >
> > -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> > +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> > {
> > - return &cxlds->_pmem_res;
> > + if (cxlds->nr_partitions > 1)
>
> This is very confusing as nr_partitions is being used not to indicate
> number of partitions but whether a driver has filled in the data for them
> (which may well be empty).
>
> I'd rather see that as a bitmap, or a 'not set' value initialized by
> the core that is then replaced when they are set.
...or even better, not require PMEM to be at partition1.
[..]
next prev parent reply other threads:[~2025-01-17 18:24 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-17 6:10 [PATCH 0/4] cxl: DPA partition metadata is a mess Dan Williams
2025-01-17 6:10 ` [PATCH 1/4] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-17 10:03 ` Jonathan Cameron
2025-01-17 17:47 ` Dan Williams
2025-01-17 10:24 ` Alejandro Lucero Palau
2025-01-17 17:54 ` Dan Williams
2025-01-17 18:45 ` Ira Weiny
2025-01-17 6:10 ` [PATCH 2/4] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-01-17 10:20 ` Jonathan Cameron
2025-01-17 10:23 ` Jonathan Cameron
2025-01-17 17:55 ` Dan Williams
2025-01-17 13:33 ` Alejandro Lucero Palau
2025-01-17 20:47 ` Dan Williams
2025-01-17 6:10 ` [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
2025-01-17 10:52 ` Jonathan Cameron
2025-01-17 13:38 ` Alejandro Lucero Palau
2025-01-17 18:23 ` Dan Williams [this message]
2025-01-17 20:32 ` Ira Weiny
2025-01-20 12:24 ` Alejandro Lucero Palau
2025-01-31 23:54 ` Dan Williams
2025-01-17 15:58 ` Alejandro Lucero Palau
2025-01-17 22:52 ` Dan Williams
2025-01-17 20:42 ` Ira Weiny
2025-01-17 22:08 ` Ira Weiny
2025-01-31 23:39 ` Dan Williams
2025-01-17 6:10 ` [PATCH 4/4] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
2025-01-17 11:12 ` Jonathan Cameron
2025-01-17 18:37 ` Dan Williams
2025-01-17 15:42 ` Alejandro Lucero Palau
2025-01-17 20:57 ` Dan Williams
2025-01-20 12:39 ` Alejandro Lucero Palau
2025-02-01 0:08 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=678aa0381324e_20fa2942a@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=alucerop@amd.com \
--cc=dave.jiang@intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox