Linux CXL
From: Dave Jiang <dave.jiang@intel.com>
To: <ira.weiny@intel.com>, Navneet Singh <navneet.singh@intel.com>,
	Fan Ni <fan.ni@samsung.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH 4/5] cxl/mem: Add support to handle DCD add and release capacity events.
Date: Thu, 15 Jun 2023 09:58:48 -0700
Message-ID: <c8d5594f-846b-2075-2a65-1378de9bfdfb@intel.com>
In-Reply-To: <20230604-dcd-type2-upstream-v1-4-71b6341bae54@intel.com>



On 6/14/23 12:16, ira.weiny@intel.com wrote:
> From: Navneet Singh <navneet.singh@intel.com>
> 
> A dynamic capacity device utilizes events to signal the host about the
> changes to the allocation of DC blocks. The device communicates the
> state of these blocks of dynamic capacity through an extent list that
> describes the starting DPA and length of all blocks the host can access.
> 
> Based on the dynamic capacity add or release event type,
> dynamic memory represented by the extents are either added
> or removed as devdax device.
> 
> Process the dynamic capacity add and release events.
> 
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> 
> ---
> [iweiny: Remove invalid comment]
> ---
>   drivers/cxl/core/mbox.c   | 345 +++++++++++++++++++++++++++++++++++++++++++++-
>   drivers/cxl/core/region.c | 214 +++++++++++++++++++++++++++-
>   drivers/cxl/core/trace.h  |   3 +-
>   drivers/cxl/cxl.h         |   4 +-
>   drivers/cxl/cxlmem.h      |  76 ++++++++++
>   drivers/cxl/pci.c         |  10 +-
>   drivers/dax/bus.c         |  11 +-
>   drivers/dax/bus.h         |   5 +-
>   8 files changed, 652 insertions(+), 16 deletions(-)

Rather large patch. Can this be broken into the add and release events?


> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index c5b696737c87..db9295216de5 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -767,6 +767,14 @@ static const uuid_t log_uuid[] = {
>   	[VENDOR_DEBUG_UUID] = DEFINE_CXL_VENDOR_DEBUG_UUID,
>   };
>   
> +/* See CXL 3.0 8.2.9.2.1.5 */
> +enum dc_event {
> +	ADD_CAPACITY,
> +	RELEASE_CAPACITY,
> +	FORCED_CAPACITY_RELEASE,
> +	REGION_CONFIGURATION_UPDATED,
> +};
> +
>   /**
>    * cxl_enumerate_cmds() - Enumerate commands for a device.
>    * @mds: The driver data for the operation
> @@ -852,6 +860,14 @@ static const uuid_t mem_mod_event_uuid =
>   	UUID_INIT(0xfe927475, 0xdd59, 0x4339,
>   		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
>   
> +/*
> + * Dynamic Capacity Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +static const uuid_t dc_event_uuid =
> +	UUID_INIT(0xca95afa7, 0xf183, 0x4018, 0x8c,
> +		0x2f, 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a);
> +
>   static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>   				   enum cxl_event_log_type type,
>   				   struct cxl_event_record_raw *record)
> @@ -945,6 +961,188 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
>   	return rc;
>   }
>   
> +static int cxl_send_dc_cap_response(struct cxl_memdev_state *mds,
> +				struct cxl_mbox_dc_response *res,
> +				int extent_cnt, int opcode)
> +{
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int rc, size;
> +
> +	size = struct_size(res, extent_list, extent_cnt);
> +	res->extent_list_size = cpu_to_le32(extent_cnt);
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = opcode,
> +		.size_in = size,
> +		.payload_in = res,
> +	};
> +
> +	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> +
> +	return rc;
> +
> +}

Return cxl_internal_send_cmd() directly

> +
> +static int cxl_prepare_ext_list(struct cxl_mbox_dc_response **res,
'res' always triggers "resource" in my head. Maybe either spell it out 
or 'resp'?

> +					int *n, struct range *extent)

Maybe a parameter more descriptive than 'n'?


> +{
> +	struct cxl_mbox_dc_response *dc_res;

Same comment as 'res'

> +	unsigned int size;
> +
> +	if (!extent)
> +		size = struct_size(dc_res, extent_list, 0);
> +	else
> +		size = struct_size(dc_res, extent_list, *n + 1);
> +
> +	dc_res = krealloc(*res, size, GFP_KERNEL);
> +	if (!dc_res)
> +		return -ENOMEM;
> +
> +	if (extent) {
> +		dc_res->extent_list[*n].dpa_start = cpu_to_le64(extent->start);
> +		memset(dc_res->extent_list[*n].reserved, 0, 8);

SZ_8 perhaps?

> +		dc_res->extent_list[*n].length =
> +				cpu_to_le64(range_len(extent));
> +		(*n)++;
> +	}
> +
> +	*res = dc_res;
> +	return 0;
> +}
> +/**
> + * cxl_handle_dcd_event_records() - Read DCD event records.
> + * @mds: The memory device state
> + *
> + * Returns 0 if enumerate completed successfully.

Please also add "errno on failure."

> + *
> + * CXL devices can generate DCD events to add or remove extents in the list.
> + */
> +static int cxl_handle_dcd_event_records(struct cxl_memdev_state *mds,
> +					struct cxl_event_record_raw *rec)
> +{
> +	struct cxl_mbox_dc_response *dc_res = NULL;
> +	struct device *dev = mds->cxlds.dev;
> +	uuid_t *id = &rec->hdr.id;
> +	struct dcd_event_dyn_cap *record =
> +			(struct dcd_event_dyn_cap *)rec;
> +	int extent_cnt = 0, rc = 0;
> +	struct cxl_dc_extent_data *extent;
> +	struct range alloc_range, rel_range;
> +	resource_size_t dpa, size;
> +
> +	if (!uuid_equal(id, &dc_event_uuid))
> +		return -EINVAL;
> +
> +	switch (record->data.event_type) {
> +	case ADD_CAPACITY:
> +		extent = devm_kzalloc(dev, sizeof(*extent), GFP_ATOMIC);
> +		if (!extent)
> +			return -ENOMEM;
> +
> +		extent->dpa_start = le64_to_cpu(record->data.extent.start_dpa);
> +		extent->length = le64_to_cpu(record->data.extent.length);
> +		memcpy(extent->tag, record->data.extent.tag,
> +				sizeof(record->data.extent.tag));

A general comment: the alignment of continuation lines in this patch is 
inconsistent

> +		extent->shared_extent_seq =
> +			le16_to_cpu(record->data.extent.shared_extn_seq);
> +		dev_dbg(dev, "Add DC extent DPA:0x%llx LEN:%llx\n",
> +					extent->dpa_start, extent->length);
> +		alloc_range = (struct range) {
> +			.start = extent->dpa_start,
> +			.end = extent->dpa_start + extent->length - 1,
> +		};
> +
> +		rc = cxl_add_dc_extent(mds, &alloc_range);
> +		if (rc < 0) {
> +			dev_dbg(dev, "unconsumed DC extent DPA:0x%llx LEN:%llx\n",
> +					extent->dpa_start, extent->length);
> +			rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, NULL);
> +			if (rc < 0) {
> +				dev_err(dev, "Couldn't create extent list %d\n",
> +									rc);
> +				devm_kfree(dev, extent);
> +				return rc;
> +			}
> +
> +			rc = cxl_send_dc_cap_response(mds, dc_res,
> +					extent_cnt, CXL_MBOX_OP_ADD_DC_RESPONSE);
> +			if (rc < 0) {
> +				devm_kfree(dev, extent);
> +				goto out;

Please be consistent: either direct returns or a single error-out path. 
Mixing the two makes the code harder to read.

> +			}
> +
> +			kfree(dc_res);
> +			devm_kfree(dev, extent);
> +
> +			return 0;
> +		}
> +
> +		rc = xa_insert(&mds->dc_extent_list, extent->dpa_start, extent,
> +				GFP_KERNEL);
> +		if (rc < 0)
> +			goto out;
> +
> +		mds->num_dc_extents++;
> +		rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, &alloc_range);
> +		if (rc < 0) {
> +			dev_err(dev, "Couldn't create extent list %d\n", rc);
> +			return rc;
> +		}
> +
> +		rc = cxl_send_dc_cap_response(mds, dc_res, extent_cnt,
> +					      CXL_MBOX_OP_ADD_DC_RESPONSE);
> +		if (rc < 0)
> +			goto out;
> +
> +		break;
> +
> +	case RELEASE_CAPACITY:
> +		dpa = le64_to_cpu(record->data.extent.start_dpa);
> +		size = le64_to_cpu(record->data.extent.length);
> +		dev_dbg(dev, "Release DC extents DPA:0x%llx LEN:%llx\n",
> +				dpa, size);
> +		extent = xa_load(&mds->dc_extent_list, dpa);
> +		if (!extent) {
> +			dev_err(dev, "No extent found with DPA:0x%llx\n", dpa);
> +			return -EINVAL;
> +		}
> +
> +		rel_range = (struct range) {
> +			.start = dpa,
> +			.end = dpa + size - 1,
> +		};
> +
> +		rc = cxl_release_dc_extent(mds, &rel_range);
> +		if (rc < 0) {
> +			dev_dbg(dev, "withhold DC extent DPA:0x%llx LEN:%llx\n",
> +									dpa, size);
> +			return 0;
> +		}
> +
> +		xa_erase(&mds->dc_extent_list, dpa);
> +		devm_kfree(dev, extent);
> +		mds->num_dc_extents--;
> +		rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, &rel_range);
> +		if (rc < 0) {
> +			dev_err(dev, "Couldn't create extent list %d\n", rc);
> +			return rc;
> +		}
> +
> +		rc = cxl_send_dc_cap_response(mds, dc_res, extent_cnt,
> +					      CXL_MBOX_OP_RELEASE_DC);
> +		if (rc < 0)
> +			goto out;
> +
> +		break;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +out:
> +	kfree(dc_res);
> +	return rc;
> +}
> +
>   static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>   				    enum cxl_event_log_type type)
>   {
> @@ -982,9 +1180,17 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>   		if (!nr_rec)
>   			break;
>   
> -		for (i = 0; i < nr_rec; i++)
> +		for (i = 0; i < nr_rec; i++) {
>   			cxl_event_trace_record(cxlmd, type,
>   					       &payload->records[i]);
> +			if (type == CXL_EVENT_TYPE_DCD) {
> +				rc = cxl_handle_dcd_event_records(mds,
> +						&payload->records[i]);
> +				if (rc)
> +					dev_err_ratelimited(dev,
> +						"dcd event failed: %d\n", rc);
> +			}
> +		}
>   
>   		if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
>   			trace_cxl_overflow(cxlmd, type, payload);
> @@ -1024,6 +1230,8 @@ void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status)
>   		cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_WARN);
>   	if (status & CXLDEV_EVENT_STATUS_INFO)
>   		cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_INFO);
> +	if (status & CXLDEV_EVENT_STATUS_DCD)
> +		cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_DCD);
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
>   
> @@ -1244,6 +1452,140 @@ int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);
>   
> +int cxl_dev_get_dc_extent_cnt(struct cxl_memdev_state *mds,
> +			      unsigned int *extent_gen_num)
> +{
> +	struct device *dev = mds->cxlds.dev;
> +	struct cxl_mbox_dc_extents *dc_extents;
> +	struct cxl_mbox_get_dc_extent get_dc_extent;
> +	unsigned int total_extent_cnt;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int rc;
> +
> +	/* Check GET_DC_EXTENT_LIST is supported by device */
> +	if (!test_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds)) {
> +		dev_dbg(dev, "unsupported cmd : get dyn cap extent list\n");
> +		return 0;
> +	}
> +
> +	dc_extents = kvmalloc(mds->payload_size, GFP_KERNEL);
> +	if (!dc_extents)
> +		return -ENOMEM;
> +
> +	get_dc_extent = (struct cxl_mbox_get_dc_extent) {
> +		.extent_cnt = 0,
> +		.start_extent_index = 0,
> +	};
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
> +		.payload_in = &get_dc_extent,
> +		.size_in = sizeof(get_dc_extent),
> +		.size_out = mds->payload_size,
> +		.payload_out = dc_extents,
> +		.min_out = 1,
> +	};
> +	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> +	if (rc < 0)
> +		goto out;
> +
> +	total_extent_cnt = le32_to_cpu(dc_extents->total_extent_cnt);
> +	*extent_gen_num = le32_to_cpu(dc_extents->extent_list_num);
> +	dev_dbg(dev, "Total extent count :%d Extent list Generation Num: %d\n",
> +			total_extent_cnt, *extent_gen_num);
> +out:
> +
> +	kvfree(dc_extents);
> +	if (rc < 0)
> +		return rc;
> +
> +	return total_extent_cnt;
> +
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dev_get_dc_extent_cnt, CXL);
> +
> +int cxl_dev_get_dc_extents(struct cxl_memdev_state *mds,
> +			   unsigned int index, unsigned int cnt)
> +{
> +	/* See CXL 3.0 Table 125 dynamic capacity config  Output Payload */
> +	struct device *dev = mds->cxlds.dev;
> +	struct cxl_mbox_dc_extents *dc_extents;
> +	struct cxl_mbox_get_dc_extent get_dc_extent;
> +	unsigned int extent_gen_num, available_extents, total_extent_cnt;
> +	int rc;
> +	struct cxl_dc_extent_data *extent;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	struct range alloc_range;
> +
> +	/* Check GET_DC_EXTENT_LIST is supported by device */
> +	if (!test_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds)) {
> +		dev_dbg(dev, "unsupported cmd : get dyn cap extent list\n");
> +		return 0;
> +	}
> +
> +	dc_extents = kvmalloc(mds->payload_size, GFP_KERNEL);
> +	if (!dc_extents)
> +		return -ENOMEM;
> +	get_dc_extent = (struct cxl_mbox_get_dc_extent) {
> +		.extent_cnt = cnt,
> +		.start_extent_index = index,
> +	};
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
> +		.payload_in = &get_dc_extent,
> +		.size_in = sizeof(get_dc_extent),
> +		.size_out = mds->payload_size,
> +		.payload_out = dc_extents,
> +		.min_out = 1,
> +	};
> +	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> +	if (rc < 0)
> +		goto out;
> +
> +	available_extents = le32_to_cpu(dc_extents->ret_extent_cnt);
> +	total_extent_cnt = le32_to_cpu(dc_extents->total_extent_cnt);
> +	extent_gen_num = le32_to_cpu(dc_extents->extent_list_num);
> +	dev_dbg(dev, "No Total extent count :%d Extent list Generation Num:%d\n",
> +			total_extent_cnt, extent_gen_num);
> +
> +
> +	for (int i = 0; i < available_extents ; i++) {
> +		extent = devm_kzalloc(dev, sizeof(*extent), GFP_KERNEL);
> +		if (!extent) {
> +			rc = -ENOMEM;
> +			goto out;
> +		}
> +		extent->dpa_start = le64_to_cpu(dc_extents->extent[i].start_dpa);
> +		extent->length = le64_to_cpu(dc_extents->extent[i].length);
> +		memcpy(extent->tag, dc_extents->extent[i].tag,
> +					sizeof(dc_extents->extent[i].tag));
> +		extent->shared_extent_seq =
> +				le16_to_cpu(dc_extents->extent[i].shared_extn_seq);
> +		dev_dbg(dev, "dynamic capacity extent[%d] DPA:0x%llx LEN:%llx\n",
> +				i, extent->dpa_start, extent->length);
> +
> +		alloc_range = (struct range){
> +			.start = extent->dpa_start,
> +			.end = extent->dpa_start + extent->length - 1,
> +		};
> +
> +		rc = cxl_add_dc_extent(mds, &alloc_range);
> +		if (rc < 0)
> +			goto out;
> +		rc = xa_insert(&mds->dc_extent_list, extent->dpa_start, extent,
> +				GFP_KERNEL);
> +	}
> +
> +out:
> +	kvfree(dc_extents);
> +	if (rc < 0)
> +		return rc;
> +
> +	return available_extents;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dev_get_dc_extents, CXL);
> +
>   static int add_dpa_res(struct device *dev, struct resource *parent,
>   		       struct resource *res, resource_size_t start,
>   		       resource_size_t size, const char *type)
> @@ -1452,6 +1794,7 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>   	mutex_init(&mds->event.log_lock);
>   	mds->cxlds.dev = dev;
>   	mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
> +	xa_init(&mds->dc_extent_list);
>   
>   	return mds;
>   }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 144232c8305e..ba45c1c3b0a9 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1,6 +1,7 @@
>   // SPDX-License-Identifier: GPL-2.0-only
>   /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>   #include <linux/memregion.h>
> +#include <linux/interrupt.h>
>   #include <linux/genalloc.h>
>   #include <linux/device.h>
>   #include <linux/module.h>
> @@ -11,6 +12,8 @@
>   #include <cxlmem.h>
>   #include <cxl.h>
>   #include "core.h"
> +#include "../../dax/bus.h"
> +#include "../../dax/dax-private.h"
>   
>   /**
>    * DOC: cxl core region
> @@ -166,6 +169,38 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>   	return 0;
>   }
>   
> +static int cxl_region_manage_dc(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	unsigned int extent_gen_num;
> +	int i, rc;
> +
> +	/* Designed for Non Interleaving flow with the assumption one

Multi-line comment style: the comment text needs to start on the next line
/*
 * comment
 */

> +	 * cxl_region will map the complete device DC region's DPA range
> +	 */
> +	for (i = 0; i < p->nr_targets; i++) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +		struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> +
> +		rc = cxl_dev_get_dc_extent_cnt(mds, &extent_gen_num);
> +		if (rc < 0)
Needs {} since the else if branch has them
> +			goto err;

I would just 'return rc' here; no need for the goto. The else if() 
below then becomes a plain if()

> +		else if (rc > 1) {
> +			rc = cxl_dev_get_dc_extents(mds, rc, 0);
> +			if (rc < 0)
> +				goto err;
> +			mds->num_dc_extents = rc;
> +			mds->dc_extents_index = rc - 1;
> +		}
> +		mds->dc_list_gen_num = extent_gen_num;
> +		dev_dbg(mds->cxlds.dev, "No of preallocated extents :%d\n", rc);
> +	}
> +	return 0;
> +err:
> +	return rc;
> +}
> +
>   static int commit_decoder(struct cxl_decoder *cxld)
>   {
>   	struct cxl_switch_decoder *cxlsd = NULL;
> @@ -2865,11 +2900,14 @@ static int devm_cxl_add_dc_region(struct cxl_region *cxlr)
>   		return PTR_ERR(cxlr_dax);
>   
>   	cxlr_dc = kzalloc(sizeof(*cxlr_dc), GFP_KERNEL);
> -	if (!cxlr_dc) {
> -		rc = -ENOMEM;
> -		goto err;
> -	}
> +	if (!cxlr_dc)
> +		return -ENOMEM;
>   
> +	rc = request_module("dax_cxl");
> +	if (rc) {
> +		dev_err(dev, "failed to load dax-ctl module\n");
> +		goto load_err;
> +	}
>   	dev = &cxlr_dax->dev;
>   	rc = dev_set_name(dev, "dax_region%d", cxlr->id);
>   	if (rc)
> @@ -2891,10 +2929,24 @@ static int devm_cxl_add_dc_region(struct cxl_region *cxlr)
>   	xa_init(&cxlr_dc->dax_dev_list);
>   	cxlr->cxlr_dc = cxlr_dc;
>   	rc = devm_add_action_or_reset(&cxlr->dev, cxl_dc_region_release, cxlr);
> -	if (!rc)
> -		return 0;
> +	if (rc)
> +		goto err;
> +
> +	if (!dev->driver) {
> +		dev_err(dev, "%s Driver not attached\n", dev_name(dev));
> +		rc = -ENXIO;
> +		goto err;
> +	}
> +
> +	rc = cxl_region_manage_dc(cxlr);
> +	if (rc)
> +		goto err;
> +
> +	return 0;
> +
>   err:
>   	put_device(dev);
> +load_err:
>   	kfree(cxlr_dc);
>   	return rc;
>   }
> @@ -3076,6 +3128,156 @@ struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_create_region, CXL);
>   
> +static int match_ep_decoder_by_range(struct device *dev, void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled;
> +	struct range *dpa_range = data;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if (!cxled->cxld.region)
> +		return 0;
> +
> +	if (cxled->dpa_res->start <= dpa_range->start &&
> +				cxled->dpa_res->end >= dpa_range->end)
> +		return 1;
> +
> +	return 0;

Return the compare directly

> +}
> +
> +int cxl_release_dc_extent(struct cxl_memdev_state *mds,
> +			  struct range *rel_range)
> +{
> +	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_dc_region *cxlr_dc;
> +	struct dax_region *dax_region;
> +	resource_size_t dpa_offset;
> +	struct cxl_region *cxlr;
> +	struct range hpa_range;
> +	struct dev_dax *dev_dax;
> +	resource_size_t hpa;
> +	struct device *dev;
> +	int ranges, rc = 0;
> +
> +	/*
> +	 * Find the cxl endpoind decoder with which has the extent dpa range and
> +	 * get the cxl_region, dax_region refrences.
> +	 */
> +	dev = device_find_child(&cxlmd->endpoint->dev, rel_range,
> +				match_ep_decoder_by_range);
> +	if (!dev) {
> +		dev_err(mds->cxlds.dev, "%pr not mapped\n", rel_range);
> +		return PTR_ERR(dev);

dev would be NULL here and PTR_ERR() won't work. Maybe 'return -ENODEV'? 
device_find_child() doesn't return an ERR_PTR, it seems.

> +	}
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	hpa_range = cxled->cxld.hpa_range;
> +	cxlr = cxled->cxld.region;
> +	cxlr_dc = cxlr->cxlr_dc;
> +
> +	/* DPA to HPA translation */
> +	if (cxled->cxld.interleave_ways == 1) {
> +		dpa_offset = rel_range->start - cxled->dpa_res->start;
> +		hpa = hpa_range.start + dpa_offset;
> +	} else {
> +		dev_err(mds->cxlds.dev, "Interleaving DC not supported\n");
> +		return -EINVAL;
> +	}

Check != 1 first and return. Then you don't need else.

> +
> +	dev_dax = xa_load(&cxlr_dc->dax_dev_list, hpa);
> +	if (!dev_dax)
> +		return -EINVAL;
> +
> +	dax_region = dev_dax->region;
> +	ranges = dev_dax->nr_range;
> +
> +	while (ranges) {
> +		int i = ranges - 1;
> +		struct dax_mapping *mapping = dev_dax->ranges[i].mapping;
> +
> +		devm_release_action(dax_region->dev, unregister_dax_mapping,
> +								&mapping->dev);
> +		ranges--;
> +	}
> +
> +	dev_dbg(mds->cxlds.dev, "removing devdax device:%s\n",
> +						dev_name(&dev_dax->dev));
> +	devm_release_action(dax_region->dev, unregister_dev_dax,
> +							&dev_dax->dev);
> +	xa_erase(&cxlr_dc->dax_dev_list, hpa);
> +
> +	return rc;
> +}
> +
> +int cxl_add_dc_extent(struct cxl_memdev_state *mds, struct range *alloc_range)
> +{
> +	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_dax_region *cxlr_dax;
> +	struct cxl_dc_region *cxlr_dc;
> +	struct dax_region *dax_region;
> +	resource_size_t dpa_offset;
> +	struct dev_dax_data data;
> +	struct dev_dax *dev_dax;
> +	struct cxl_region *cxlr;
> +	struct range hpa_range;
> +	resource_size_t hpa;
> +	struct device *dev;
> +	int rc;
> +
> +	/*
> +	 * Find the cxl endpoind decoder with which has the extent dpa range and
> +	 * get the cxl_region, dax_region refrences.
> +	 */
> +	dev = device_find_child(&cxlmd->endpoint->dev, alloc_range,
> +				match_ep_decoder_by_range);
> +	if (!dev) {
> +		dev_err(mds->cxlds.dev, "%pr not mapped\n",	alloc_range);

Stray whitespace after the ','

> +		return PTR_ERR(dev);

Same comment as earlier.

> +	}
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	hpa_range = cxled->cxld.hpa_range;
> +	cxlr = cxled->cxld.region;
> +	cxlr_dc = cxlr->cxlr_dc;
> +	cxlr_dax = cxlr_dc->cxlr_dax;
> +	dax_region = dev_get_drvdata(&cxlr_dax->dev);
> +
> +	/* DPA to HPA translation */
> +	if (cxled->cxld.interleave_ways == 1) {
> +		dpa_offset = alloc_range->start - cxled->dpa_res->start;
> +		hpa = hpa_range.start + dpa_offset;
> +	} else {
> +		dev_err(mds->cxlds.dev, "Interleaving DC not supported\n");
> +		return -EINVAL;
> +	}

Same as above: check != 1 first and return. No need for the else.


> +
> +	data = (struct dev_dax_data) {
> +		.dax_region = dax_region,
> +		.id = -1,

Magic id number; a define or a comment explaining -1 would help

> +		.size = 0,
> +	};
> +
> +	dev_dax = devm_create_dev_dax(&data);
> +	if (IS_ERR(dev_dax))
> +		return PTR_ERR(dev_dax);
> +
> +	if (IS_ALIGNED(range_len(alloc_range), max_t(unsigned long,
> +				dev_dax->align, memremap_compat_align()))) {
> +		rc = alloc_dev_dax_range(dev_dax, hpa,
> +					range_len(alloc_range));
> +		if (rc)
> +			return rc;
> +	}
> +
> +	rc = xa_insert(&cxlr_dc->dax_dev_list, hpa, dev_dax, GFP_KERNEL);
> +
> +	return rc;
> +}
> +
>   /* Establish an empty region covering the given HPA range */
>   static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>   					   struct cxl_endpoint_decoder *cxled)
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index a0b5819bc70b..e11651255780 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -122,7 +122,8 @@ TRACE_EVENT(cxl_aer_correctable_error,
>   		{ CXL_EVENT_TYPE_INFO, "Informational" },	\
>   		{ CXL_EVENT_TYPE_WARN, "Warning" },		\
>   		{ CXL_EVENT_TYPE_FAIL, "Failure" },		\
> -		{ CXL_EVENT_TYPE_FATAL, "Fatal" })
> +		{ CXL_EVENT_TYPE_FATAL, "Fatal" },		\
> +		{ CXL_EVENT_TYPE_DCD, "DCD" })
>   
>   TRACE_EVENT(cxl_overflow,
>   
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 7ac1237938b7..60c436b7ebb1 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -163,11 +163,13 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>   #define CXLDEV_EVENT_STATUS_WARN		BIT(1)
>   #define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
>   #define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
> +#define CXLDEV_EVENT_STATUS_DCD			BIT(4)
>   
>   #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO |	\
>   				 CXLDEV_EVENT_STATUS_WARN |	\
>   				 CXLDEV_EVENT_STATUS_FAIL |	\
> -				 CXLDEV_EVENT_STATUS_FATAL)
> +				 CXLDEV_EVENT_STATUS_FATAL|	\
> +				 CXLDEV_EVENT_STATUS_DCD)
>   
>   /* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */
>   #define CXLDEV_EVENT_INT_MODE_MASK	GENMASK(1, 0)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 9c0b2fa72bdd..0440b5c04ef6 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -5,6 +5,7 @@
>   #include <uapi/linux/cxl_mem.h>
>   #include <linux/cdev.h>
>   #include <linux/uuid.h>
> +#include <linux/xarray.h>
>   #include "cxl.h"
>   
>   /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -226,6 +227,7 @@ struct cxl_event_interrupt_policy {
>   	u8 warn_settings;
>   	u8 failure_settings;
>   	u8 fatal_settings;
> +	u8 dyncap_settings;
>   } __packed;
>   
>   /**
> @@ -296,6 +298,13 @@ enum cxl_devtype {
>   #define CXL_MAX_DC_REGION 8
>   #define CXL_DC_REGION_SRTLEN 8
>   
> +struct cxl_dc_extent_data {
> +	u64 dpa_start;
> +	u64 length;
> +	u8 tag[16];
> +	u16 shared_extent_seq;
> +};
> +
>   /**
>    * struct cxl_dev_state - The driver device state
>    *
> @@ -406,6 +415,11 @@ struct cxl_memdev_state {
>   		u8 flags;
>   	} dc_region[CXL_MAX_DC_REGION];
>   
> +	u32 dc_list_gen_num;
> +	u32 dc_extents_index;
> +	struct xarray dc_extent_list;
> +	u32 num_dc_extents;
> +
>   	size_t dc_event_log_size;
>   	struct cxl_event_state event;
>   	struct cxl_poison_state poison;
> @@ -470,6 +484,17 @@ enum cxl_opcode {
>   	UUID_INIT(0xe1819d9, 0x11a9, 0x400c, 0x81, 0x1f, 0xd6, 0x07, 0x19,     \
>   		  0x40, 0x3d, 0x86)
>   
> +
> +struct cxl_mbox_dc_response {
> +	__le32 extent_list_size;
> +	u8 reserved[4];
> +	struct updated_extent_list {
> +		__le64 dpa_start;
> +		__le64 length;
> +		u8 reserved[8];
> +	} __packed extent_list[];
> +} __packed;
> +
>   struct cxl_mbox_get_supported_logs {
>   	__le16 entries;
>   	u8 rsvd[6];
> @@ -555,6 +580,7 @@ enum cxl_event_log_type {
>   	CXL_EVENT_TYPE_WARN,
>   	CXL_EVENT_TYPE_FAIL,
>   	CXL_EVENT_TYPE_FATAL,
> +	CXL_EVENT_TYPE_DCD,
>   	CXL_EVENT_TYPE_MAX
>   };
>   
> @@ -639,6 +665,35 @@ struct cxl_event_mem_module {
>   	u8 reserved[0x3d];
>   } __packed;
>   
> +/*
> + * Dynamic Capacity Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.5; Table 8-47
> + */
> +
> +#define CXL_EVENT_DC_TAG_SIZE	0x10
> +struct cxl_dc_extent {
> +	__le64 start_dpa;
> +	__le64 length;
> +	u8 tag[CXL_EVENT_DC_TAG_SIZE];
> +	__le16 shared_extn_seq;
> +	u8 reserved[6];
> +} __packed;
> +
> +struct dcd_record_data {
> +	u8 event_type;
> +	u8 reserved;
> +	__le16 host_id;
> +	u8 region_index;
> +	u8 reserved1[3];
> +	struct cxl_dc_extent extent;
> +	u8 reserved2[32];
> +} __packed;
> +
> +struct dcd_event_dyn_cap {
> +	struct cxl_event_record_hdr hdr;
> +	struct dcd_record_data data;
> +} __packed;
> +
>   struct cxl_mbox_get_partition_info {
>   	__le64 active_volatile_cap;
>   	__le64 active_persistent_cap;
> @@ -684,6 +739,19 @@ struct cxl_mbox_dynamic_capacity {
>   #define  CXL_SET_PARTITION_IMMEDIATE_FLAG	BIT(0)
>   #define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG BIT(0)
>   
> +struct cxl_mbox_get_dc_extent {
> +	__le32 extent_cnt;
> +	__le32 start_extent_index;
> +} __packed;
> +
> +struct cxl_mbox_dc_extents {
> +	__le32 ret_extent_cnt;
> +	__le32 total_extent_cnt;
> +	__le32 extent_list_num;
> +	u8 rsvd[4];
> +	struct cxl_dc_extent extent[];
> +}  __packed;
> +
>   /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */
>   struct cxl_mbox_set_timestamp_in {
>   	__le64 timestamp;
> @@ -826,6 +894,14 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
>   int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
>   int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>   
> +/* FIXME why not have these be static in mbox.c? */
> +int cxl_add_dc_extent(struct cxl_memdev_state *mds, struct range *alloc_range);
> +int cxl_release_dc_extent(struct cxl_memdev_state *mds, struct range *rel_range);
> +int cxl_dev_get_dc_extent_cnt(struct cxl_memdev_state *mds,
> +			      unsigned int *extent_gen_num);
> +int cxl_dev_get_dc_extents(struct cxl_memdev_state *mds, unsigned int cnt,
> +			   unsigned int index);
> +
>   #ifdef CONFIG_CXL_SUSPEND
>   void cxl_mem_active_inc(void);
>   void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index ac1a41bc083d..558ffbcb9b34 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -522,8 +522,8 @@ static int cxl_event_req_irq(struct cxl_dev_state *cxlds, u8 setting)
>   		return irq;
>   
>   	return devm_request_threaded_irq(dev, irq, NULL, cxl_event_thread,
> -					 IRQF_SHARED | IRQF_ONESHOT, NULL,
> -					 dev_id);
> +					IRQF_SHARED | IRQF_ONESHOT, NULL,
> +					dev_id);

Stray edit?

>   }
>   
>   static int cxl_event_get_int_policy(struct cxl_memdev_state *mds,
> @@ -555,6 +555,7 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
>   		.warn_settings = CXL_INT_MSI_MSIX,
>   		.failure_settings = CXL_INT_MSI_MSIX,
>   		.fatal_settings = CXL_INT_MSI_MSIX,
> +		.dyncap_settings = CXL_INT_MSI_MSIX,
>   	};
>   
>   	mbox_cmd = (struct cxl_mbox_cmd) {
> @@ -608,6 +609,11 @@ static int cxl_event_irqsetup(struct cxl_memdev_state *mds)
>   		return rc;
>   	}
>   
> +	rc = cxl_event_req_irq(cxlds, policy.dyncap_settings);
> +	if (rc) {
> +		dev_err(cxlds->dev, "Failed to get interrupt for event dc log\n");
> +		return rc;
> +	}
>   	return 0;
>   }
>   
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 227800053309..b2b27033f589 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -434,7 +434,7 @@ static void free_dev_dax_ranges(struct dev_dax *dev_dax)
>   		trim_dev_dax_range(dev_dax);
>   }
>   
> -static void unregister_dev_dax(void *dev)
> +void unregister_dev_dax(void *dev)
>   {
>   	struct dev_dax *dev_dax = to_dev_dax(dev);
>   
> @@ -445,6 +445,7 @@ static void unregister_dev_dax(void *dev)
>   	free_dev_dax_ranges(dev_dax);
>   	put_device(dev);
>   }
> +EXPORT_SYMBOL_GPL(unregister_dev_dax);
>   
>   /* a return value >= 0 indicates this invocation invalidated the id */
>   static int __free_dev_dax_id(struct dev_dax *dev_dax)
> @@ -641,7 +642,7 @@ static void dax_mapping_release(struct device *dev)
>   	kfree(mapping);
>   }
>   
> -static void unregister_dax_mapping(void *data)
> +void unregister_dax_mapping(void *data)
>   {
>   	struct device *dev = data;
>   	struct dax_mapping *mapping = to_dax_mapping(dev);
> @@ -658,7 +659,7 @@ static void unregister_dax_mapping(void *data)
>   	device_del(dev);
>   	put_device(dev);
>   }
> -
> +EXPORT_SYMBOL_GPL(unregister_dax_mapping);
>   static struct dev_dax_range *get_dax_range(struct device *dev)
>   {
>   	struct dax_mapping *mapping = to_dax_mapping(dev);
> @@ -793,7 +794,7 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id)
>   	return 0;
>   }
>   
> -static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
> +int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
>   		resource_size_t size)
>   {
>   	struct dax_region *dax_region = dev_dax->region;
> @@ -853,6 +854,8 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
>   
>   	return rc;
>   }
> +EXPORT_SYMBOL_GPL(alloc_dev_dax_range);
> +
>   
>   static int adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, resource_size_t size)
>   {

Split the dax bus changes out as a prep patch?

> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 8cd79ab34292..aa8418c7aead 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -47,8 +47,11 @@ int __dax_driver_register(struct dax_device_driver *dax_drv,
>   	__dax_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
>   void dax_driver_unregister(struct dax_device_driver *dax_drv);
>   void kill_dev_dax(struct dev_dax *dev_dax);
> +void unregister_dev_dax(void *dev);
> +void unregister_dax_mapping(void *data);
>   bool static_dev_dax(struct dev_dax *dev_dax);
> -
> +int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
> +					resource_size_t size);
>   /*
>    * While run_dax() is potentially a generic operation that could be
>    * defined in include/linux/dax.h we don't want to grow any users
> 

