Linux CXL
 help / color / mirror / Atom feed
From: Alejandro Lucero Palau <alucerop@amd.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-cxl@vger.kernel.org, Dave Jiang <dave.jiang@intel.com>,
	Ira Weiny <ira.weiny@intel.com>
Subject: Re: [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
Date: Mon, 20 Jan 2025 12:24:41 +0000	[thread overview]
Message-ID: <30ba3e11-0ba7-6bc6-38ac-1ba0c577f651@amd.com> (raw)
In-Reply-To: <678aa0381324e_20fa2942a@dwillia2-xfh.jf.intel.com.notmuch>


On 1/17/25 18:23, Dan Williams wrote:
> Jonathan Cameron wrote:
>> On Thu, 16 Jan 2025 22:10:44 -0800
>> Dan Williams <dan.j.williams@intel.com> wrote:
>>
>>> The pending efforts to add CXL Accelerator (type-2) device [1], and
>>> Dynamic Capacity (DCD) support [2], tripped on the
>>> no-longer-fit-for-purpose design in the CXL subsystem for tracking
>>> device-physical-address (DPA) metadata. Trip hazards include:
>>>
>>> - CXL Memory Devices need to consider a PMEM partition, but Accelerator
>>>    devices with CXL.mem likely do not in the common case.
>>>
>>> - CXL Memory Devices enumerate DPA through Memory Device mailbox
>>>    commands like Partition Info, Accelerators devices do not.
>>>
>>> - CXL Memory Devices that support DCD support more than 2 partitions.
>>>    Some of the driver algorithms are awkward to expand to > 2 partition
>>>    cases.
>>>
>>> - DPA performance data is a general capability that can be shared with
>>>    accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
>>>    suitable.
>>>
>>> - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
>>>    memory property, it should be phased in favor of a partition id and
>>>    the memory property comes from the partition info.
>>>
>>> Towards cleaning up those issues and allowing a smoother landing for the
>>> aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
>>> array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
>>> way for Memory Devices and Accelerators to initialize the DPA information
>>> in 'struct cxl_dev_state'.
>>>
>>> For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
>>> get the new data structure initialized, and cleanup some qos_class init.
>>> Follow on patches will go further to use the new data structure to
>>> cleanup algorithms that are better suited to loop over all possible
>>> partitions.
>>>
>>> cxl_dpa_setup() follows the locking expectations of mutating the device
>>> DPA map, and is suitable for Accelerator drivers to use. Accelerators
>>> likely only have one hardcoded 'ram' partition to convey to the
>>> cxl_core.
>>>
>>> Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
>>> Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
>>> Cc: Dave Jiang <dave.jiang@intel.com>
>>> Cc: Alejandro Lucero <alucerop@amd.com>
>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Hi Dan,
>>
>> In basic form this seems fine, but I find the nr_paritions variable usage very
>> counter intuitive.  It's just how many we configured not how many there
>> are, potentially with 0 size (so not a partition).  I'd be happier if we
>> can avoid that by just prefilling the lot with zero size and filling in
>> the ones we want.  So zero size means doesn't exist and use an iterator where
>> appropriate to skip the zero size ones.
> The PMEM-only device case did give me pause. Is that 2 partitions with a
> zero-sized first partition, or is that just 1 partition?


I was wrong about the code being broken for this case.

The code would create two partitions, at least for the case of partition 
alignment being 0, with the first one having 0 size.

This is all based on data/code based on mbox commands. Without mbox this 
partition info needs to be hardcoded or created somehow by the accel 
driver by its own means, so it is good to know the code expects such a 
0-size partition for the pmem-only case and the nr_partitions should be 
set accordingly. Not what my type2 patchset needs now, but I bet this 
will need to be set properly by a coming accel driver.


> Ultimately I do think the code should further evolve to treat that as
> 1-PMEM-partition, but as far as I can see that depends on 'enum
> cxl_decoder_mode' being eliminated and teaching all code paths to search
> for the position of the PMEM partition.
>
>> Without that tidied up, to me this is more confusing than the previous code.
> I was going to save PMEM at a partition other than 1 for the DCD series,
> but let me take another pass at adding that to this series.
>
> [..]
>>> +/* if this fails the caller must destroy @cxlds, there is no recovery */
>>> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
>>> +{
>>> +	struct device *dev = cxlds->dev;
>>> +
>>> +	guard(rwsem_write)(&cxl_dpa_rwsem);
>>> +
>>> +	if (cxlds->nr_partitions)
>>> +		return -EBUSY;
>>> +
>>> +	if (!info->size || !info->nr_partitions) {
>>> +		cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
>>> +		cxlds->nr_partitions = 0;
>>> +		return 0;
>>> +	}
>>> +
>>> +	cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
>>> +
>>> +	for (int i = 0; i < info->nr_partitions; i++) {
>>> +		const char *desc;
>>> +		int rc;
>>> +
>>> +		if (i == CXL_PARTITION_RAM)
>>> +			desc = "ram";
>>> +		else if (i == CXL_PARTITION_PMEM)
>>> +			desc = "pmem";
>>> +		else
>>> +			desc = "";
>>> +		cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
>>> +		rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
>>> +				 info->range[i].start,
>>> +				 range_len(&info->range[i]), desc);
>>> +		if (rc)
>>> +			return rc;
>>> +		cxlds->nr_partitions++;
>> I'd just initialize the rest to 0 length similar to what is happening
>> if we have pmem only anyway.  Then this nr_patitions goes away and
>> stops being a possible source of confusion.
> Modulo teaching other code that wants to ask "what is the size of the
> PMEM partition" to use a helper that hides the "find the device's PMEM
> partition".
>
>
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
>>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>>> index 3502f1633ad2..7dca5c8c3494 100644
>>> --- a/drivers/cxl/core/mbox.c
>>> +++ b/drivers/cxl/core/mbox.c
>>> @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
>>> -int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
>>> +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
>>>   {
>>>   	struct cxl_dev_state *cxlds = &mds->cxlds;
>>> -	struct resource *ram_res = to_ram_res(cxlds);
>>> -	struct resource *pmem_res = to_pmem_res(cxlds);
>>>   	struct device *dev = cxlds->dev;
>>>   	int rc;
>>>   
>>>   	if (!cxlds->media_ready) {
>>> -		cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
>>> -		*ram_res = DEFINE_RES_MEM(0, 0);
>>> -		*pmem_res = DEFINE_RES_MEM(0, 0);
>>> +		info->size = 0;
>>>   		return 0;
>>>   	}
>>>   
>>> -	cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
>>> +	info->size = mds->total_bytes;
>>>   
>>>   	if (mds->partition_align_bytes == 0) {
>> Obviously nothing to do with your patch as such, but maybe tidy this up
>> by making active values == fixed values when we don't have partition control.
>> That seems logical anyway to me and means we only end up with one lot of
>> range setup in here.  I can't immediately see any side effects of doing this.
> Yeah, I mentioned this in another thread. There is no reason
> for 'struct cxl_memdev_state' to carry these values at all. They are
> just temporary init-data.
>
> So, cxl_dev_state_identify() becomes cxl_mem_identify(), since
> it is a memory-device command. Move it inside of cxl_mem_dpa_fetch()
> since it is just temporary init-data for 'struct cxl_dpa_info'.
>
> [..]
>>> -		rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
>>> -				 mds->volatile_only_bytes, "ram");
>>> -		if (rc)
>>> -			return rc;
>>> -		return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
>>> -				   mds->volatile_only_bytes,
>>> -				   mds->persistent_only_bytes, "pmem");
>>> +		info->range[CXL_PARTITION_RAM] = (struct range) {
>>> +			.start = 0,
>>> +			.end = mds->volatile_only_bytes - 1,
>>> +		};
>>> +		info->nr_partitions++;
>>> +
>>> +		if (!mds->persistent_only_bytes)
>>> +			return 0;
>>> +
>>> +		info->range[CXL_PARTITION_PMEM] = (struct range) {
>>> +			.start = mds->volatile_only_bytes,
>>> +			.end = mds->volatile_only_bytes +
>>> +			       mds->persistent_only_bytes - 1,
>>> +		};
>>> +		info->nr_partitions++;
>> This nr partitions makes some sense though I'd be tempted to add a type
>> array to info so that we can just not pass empty ones if we don't want to.
>> Makes this code a little more complex, but not a lot and means
>> nr->partitions becomes the ones that actually exist.
> Agree, that's the end goal.
>
>>> +		return 0;
>>>   	}
>>>   
>>>   	rc = cxl_mem_get_partition_info(mds);
>>> @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
>>>   		return rc;
>>>   	}
>>>   
>>> -	rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
>>> -			 mds->active_volatile_bytes, "ram");
>>> -	if (rc)
>>> -		return rc;
>>> -	return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
>>> -			   mds->active_volatile_bytes,
>>> -			   mds->active_persistent_bytes, "pmem");
>>> +	info->range[CXL_PARTITION_RAM] = (struct range) {
>>> +		.start = 0,
>>> +		.end = mds->active_volatile_bytes - 1,
>>> +	};
>>> +	info->nr_partitions++;
>>> +
>>> +	if (!mds->active_persistent_bytes)
>>> +		return 0;
>>> +
>>> +	info->range[CXL_PARTITION_PMEM] = (struct range) {
>>> +		.start = mds->active_volatile_bytes,
>>> +		.end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1,
>>> +	};
>>> +	info->nr_partitions++;
>>> +
>>> +	return 0;
>>>   }
>>> -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL");
>>> +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
>>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>>> index 78e92e24d7b5..2e728d4b7327 100644
>>> --- a/drivers/cxl/cxlmem.h
>>> +++ b/drivers/cxl/cxlmem.h
>>> @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>>>   			 resource_size_t base, resource_size_t len,
>>>   			 resource_size_t skipped);
>>>   
>>> +/* Well known, spec defined partition indices */
>>> +enum cxl_partition {
>>> +	CXL_PARTITION_RAM,
>>> +	CXL_PARTITION_PMEM,
>>> +	CXL_PARTITION_MAX,
>>> +};
>>> +
>>> +struct cxl_dpa_info {
>>> +	u64 size;
>>> +	struct range range[CXL_PARTITION_MAX];
>>> +	int nr_partitions;
>>> +};
>> blank line seems appropriate here.
> Added.
>
>>> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
>>> +
>>>   static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>>>   					 struct cxl_memdev *cxlmd)
>>>   {
>>> @@ -408,6 +422,16 @@ struct cxl_dpa_perf {
>>>   	int qos_class;
>>>   };
>>>   
>>>   /**
>>>    * struct cxl_dev_state - The driver device state
>>>    *
>>> @@ -423,8 +447,8 @@ struct cxl_dpa_perf {
>>>    * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
>>>    * @media_ready: Indicate whether the device media is usable
>>>    * @dpa_res: Overall DPA resource tree for the device
>>> - * @_pmem_res: Active Persistent memory capacity configuration
>>> - * @_ram_res: Active Volatile memory capacity configuration
>>> + * @part: DPA partition array
>>> + * @nr_partitions: Number of DPA partitions
>> This needs more. It is not the number of partitions present I think, it
>> is the number that a particular driver is potentially interested in.
>>
>>>    * @serial: PCIe Device Serial Number
>>>    * @type: Generic Memory Class device or Vendor Specific Memory device
>>>    * @cxl_mbox: CXL mailbox context
>>> @@ -438,21 +462,39 @@ struct cxl_dev_state {
>>>   	bool rcd;
>>>   	bool media_ready;
>>>   	struct resource dpa_res;
>>> -	struct resource _pmem_res;
>>> -	struct resource _ram_res;
>>> +	struct cxl_dpa_partition part[CXL_PARTITION_MAX];
>>> +	unsigned int nr_partitions;
>>>   	u64 serial;
>>>   	enum cxl_devtype type;
>>>   	struct cxl_mailbox cxl_mbox;
>>>   };
>>>   
>>> -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
>>> +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
>>>   {
>>> -	return &cxlds->_ram_res;
>>> +	if (cxlds->nr_partitions > 0)
>>> +		return &cxlds->part[CXL_PARTITION_RAM].res;
>>> +	return NULL;
>>>   }
>>>   
>>> -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
>>> +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
>>>   {
>>> -	return &cxlds->_pmem_res;
>>> +	if (cxlds->nr_partitions > 1)
>> This is very confusing as nr_partitions is being used not to indicate
>> number of partitions but whether a driver has filled in the data for them
>> (which may well be empty).
>>
>> I'd rather see that as a bitmap, or a 'not set' value initialized by
>> the core that is then replaced when they are set.
> ...or even better, not require PMEM to be at partition1.
>
> [..]

  parent reply	other threads:[~2025-01-20 12:24 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-17  6:10 [PATCH 0/4] cxl: DPA partition metadata is a mess Dan Williams
2025-01-17  6:10 ` [PATCH 1/4] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-17 10:03   ` Jonathan Cameron
2025-01-17 17:47     ` Dan Williams
2025-01-17 10:24   ` Alejandro Lucero Palau
2025-01-17 17:54     ` Dan Williams
2025-01-17 18:45   ` Ira Weiny
2025-01-17  6:10 ` [PATCH 2/4] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-01-17 10:20   ` Jonathan Cameron
2025-01-17 10:23     ` Jonathan Cameron
2025-01-17 17:55       ` Dan Williams
2025-01-17 13:33   ` Alejandro Lucero Palau
2025-01-17 20:47     ` Dan Williams
2025-01-17  6:10 ` [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
2025-01-17 10:52   ` Jonathan Cameron
2025-01-17 13:38     ` Alejandro Lucero Palau
2025-01-17 18:23     ` Dan Williams
2025-01-17 20:32       ` Ira Weiny
2025-01-20 12:24       ` Alejandro Lucero Palau [this message]
2025-01-31 23:54         ` Dan Williams
2025-01-17 15:58   ` Alejandro Lucero Palau
2025-01-17 22:52     ` Dan Williams
2025-01-17 20:42   ` Ira Weiny
2025-01-17 22:08   ` Ira Weiny
2025-01-31 23:39     ` Dan Williams
2025-01-17  6:10 ` [PATCH 4/4] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
2025-01-17 11:12   ` Jonathan Cameron
2025-01-17 18:37     ` Dan Williams
2025-01-17 15:42   ` Alejandro Lucero Palau
2025-01-17 20:57     ` Dan Williams
2025-01-20 12:39       ` Alejandro Lucero Palau
2025-02-01  0:08         ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=30ba3e11-0ba7-6bc6-38ac-1ba0c577f651@amd.com \
    --to=alucerop@amd.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox