From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: <ira.weiny@intel.com>
Cc: Navneet Singh <navneet.singh@intel.com>,
Fan Ni <fan.ni@samsung.com>,
Dan Williams <dan.j.williams@intel.com>,
<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device
Date: Thu, 22 Jun 2023 16:58:17 +0100 [thread overview]
Message-ID: <20230622165817.0000220e@Huawei.com> (raw)
In-Reply-To: <20230604-dcd-type2-upstream-v1-1-71b6341bae54@intel.com>
On Wed, 14 Jun 2023 12:16:28 -0700
ira.weiny@intel.com wrote:
> From: Navneet Singh <navneet.singh@intel.com>
>
> Read the Dynamic capacity configuration and store dynamic capacity region
> information in the device state which driver will use to map into the HDM
> ranges.
>
> Implement Get Dynamic Capacity Configuration (opcode 4800h) mailbox
> command as specified in CXL 3.0 spec section 8.2.9.8.9.1.
>
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
Hi Ira / Navneet,
I'll probably overlap with comments of others (good to see so much review!)
so feel free to ignore duplication.
Comments inline,
Jonathan
> +/**
> + * cxl_dev_dynamic_capacity_identify() - Reads the dynamic capacity
> + * information from the device.
> + * @mds: The memory device state
> + * Return: 0 if identify was executed successfully.
> + *
> + * This will dispatch the get_dynamic_capacity command to the device
> + * and on success populate structures to be exported to sysfs.
> + */
> +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
> +{
> + struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct device *dev = cxlds->dev;
> + struct cxl_mbox_dynamic_capacity *dc;
Calling it dc is confusing. I'd make it clear this is the mailbox
response. config_resp or dc_config_res.
> + struct cxl_mbox_get_dc_config get_dc;
> + struct cxl_mbox_cmd mbox_cmd;
> + u64 next_dc_region_start;
> + int rc, i;
> +
> + for (i = 0; i < CXL_MAX_DC_REGION; i++)
> + sprintf(mds->dc_region[i].name, "dc%d", i);
> +
> + /* Check GET_DC_CONFIG is supported by device */
> + if (!test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds)) {
> + dev_dbg(dev, "unsupported cmd: get_dynamic_capacity_config\n");
> + return 0;
> + }
> +
> + dc = kvmalloc(mds->payload_size, GFP_KERNEL);
> + if (!dc)
> + return -ENOMEM;
Response to CXL_MBOX_OP_GET_DC_CONFIG has a known maximum
size. Can we provide that instead of potentially much larger?
8 + 0x28 * 8 I think so 328 bytes. Use struct_size()
But fun corner.... Mailbox is allowed to be smaller than that (256 bytes min
I think) so need to handle multiple reads with different start regions.
Which reminds me that we need to add support for running out of space
in the mailbox to qemu... So far we've just made sure everything fitted :)
> +
> + get_dc = (struct cxl_mbox_get_dc_config) {
> + .region_count = CXL_MAX_DC_REGION,
> + .start_region_index = 0,
> + };
> +
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_GET_DC_CONFIG,
> + .payload_in = &get_dc,
> + .size_in = sizeof(get_dc),
> + .size_out = mds->payload_size,
> + .payload_out = dc,
> + .min_out = 1,
> + };
> + rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> + if (rc < 0)
> + goto dc_error;
The error label is a bit too generic. Why dc_error?
"error" conveys just as small amount of info. I'd got for goto free_resp;
> +
> + mds->nr_dc_region = dc->avail_region_count;
> +
> + if (mds->nr_dc_region < 1 || mds->nr_dc_region > CXL_MAX_DC_REGION) {
> + dev_err(dev, "Invalid num of dynamic capacity regions %d\n",
> + mds->nr_dc_region);
> + rc = -EINVAL;
> + goto dc_error;
> + }
> +
> + for (i = 0; i < mds->nr_dc_region; i++) {
> + struct cxl_dc_region_info *dcr = &mds->dc_region[i];
> +
> + dcr->base = le64_to_cpu(dc->region[i].region_base);
> + dcr->decode_len =
> + le64_to_cpu(dc->region[i].region_decode_length);
> + dcr->decode_len *= CXL_CAPACITY_MULTIPLIER;
> + dcr->len = le64_to_cpu(dc->region[i].region_length);
> + dcr->blk_size = le64_to_cpu(dc->region[i].region_block_size);
> +
> + /* Check regions are in increasing DPA order */
> + if ((i + 1) < mds->nr_dc_region) {
Feels a bit odd to look at entries we haven't seen yet. Maybe flip this around
to check the ones we have looked at? So don't start until 2nd region and then check
it's start against mds->dc_region[0] etc?
Or factor out this loop contents in general and just pass in the single
value needed for checking this. Biggest advantage being direct returns in
that function as allocation and free will be in caller.
> + next_dc_region_start =
> + le64_to_cpu(dc->region[i + 1].region_base);
> + if ((dcr->base > next_dc_region_start) ||
> + ((dcr->base + dcr->decode_len) > next_dc_region_start)) {
Unless you have a negative decode length the second condition includes the first.
So just check that.
> + dev_err(dev,
> + "DPA ordering violation for DC region %d and %d\n",
> + i, i + 1);
> + rc = -EINVAL;
> + goto dc_error;
> + }
> + }
> +
> + /* Check the region is 256 MB aligned */
> + if (!IS_ALIGNED(dcr->base, SZ_256M)) {
That's an oddity. I wonder why those lower bits where defined as reserved...
Anyhow code is right if paranoid ;)
> + dev_err(dev, "DC region %d not aligned to 256MB\n", i);
> + rc = -EINVAL;
> + goto dc_error;
> + }
> +
> + /* Check Region base and length are aligned to block size */
> + if (!IS_ALIGNED(dcr->base, dcr->blk_size) ||
> + !IS_ALIGNED(dcr->len, dcr->blk_size)) {
> + dev_err(dev, "DC region %d not aligned to %#llx\n", i,
> + dcr->blk_size);
> + rc = -EINVAL;
> + goto dc_error;
> + }
> +
> + dcr->dsmad_handle =
> + le32_to_cpu(dc->region[i].region_dsmad_handle);
> + dcr->flags = dc->region[i].flags;
I'd just grab these at same time as all the other fields above.
A pattern where you fill values in only after checking would be fine, or one
where you fill them in all in one place. The mixture of the two is less clear
than either consistent approach.
> + sprintf(dcr->name, "dc%d", i);
> +
> + dev_dbg(dev,
> + "DC region %s DPA: %#llx LEN: %#llx BLKSZ: %#llx\n",
> + dcr->name, dcr->base, dcr->decode_len, dcr->blk_size);
> + }
> +
> + /*
> + * Calculate entire DPA range of all configured regions which will be mapped by
> + * one or more HDM decoders
> + */
> + mds->total_dynamic_capacity =
> + mds->dc_region[mds->nr_dc_region - 1].base +
> + mds->dc_region[mds->nr_dc_region - 1].decode_len -
> + mds->dc_region[0].base;
> + dev_dbg(dev, "Total dynamic capacity: %#llx\n",
> + mds->total_dynamic_capacity);
> +
> +dc_error:
> + kvfree(dc);
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);
> +
> @@ -1121,13 +1289,23 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> }
>
> cxlds->dpa_res =
> - (struct resource)DEFINE_RES_MEM(0, mds->total_bytes);
> + (struct resource)DEFINE_RES_MEM(0, mds->total_capacity);
> +
> + for (int i = 0; i < CXL_MAX_DC_REGION; i++) {
> + struct cxl_dc_region_info *dcr = &mds->dc_region[i];
> +
> + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->dc_res[i],
> + dcr->base, dcr->decode_len, dcr->name);
> + if (rc)
> + return rc;
> + }
>
> if (mds->partition_align_bytes == 0) {
> rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> mds->volatile_only_bytes, "ram");
> if (rc)
> return rc;
> +
Scrub for this stuff before posting v2. Just noise that slows down review
a little. If it is worth doing, do it in a separate patch.
> return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> mds->volatile_only_bytes,
> mds->persistent_only_bytes, "pmem");
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 89e560ea14c0..9c0b2fa72bdd 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
>
>
> +#define CXL_MAX_DC_REGION 8
> +#define CXL_DC_REGION_SRTLEN 8
SRT?
> +
> /**
> * struct cxl_dev_state - The driver device state
> *
> @@ -300,6 +312,8 @@ enum cxl_devtype {
> * @dpa_res: Overall DPA resource tree for the device
> * @pmem_res: Active Persistent memory capacity configuration
> * @ram_res: Active Volatile memory capacity configuration
> + * @dc_res: Active Dynamic Capacity memory configuration for each possible
> + * region
> * @component_reg_phys: register base of component registers
> * @info: Cached DVSEC information about the device.
> * @serial: PCIe Device Serial Number
> @@ -315,6 +329,7 @@ struct cxl_dev_state {
> struct resource dpa_res;
> struct resource pmem_res;
> struct resource ram_res;
> + struct resource dc_res[CXL_MAX_DC_REGION];
> resource_size_t component_reg_phys;
> u64 serial;
> enum cxl_devtype type;
...
> @@ -357,9 +379,13 @@ struct cxl_memdev_state {
> size_t lsa_size;
> struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> char firmware_version[0x10];
> + DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX);
> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
> DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
> - u64 total_bytes;
> +
> + u64 total_capacity;
> + u64 total_static_capacity;
> + u64 total_dynamic_capacity;
> u64 volatile_only_bytes;
> u64 persistent_only_bytes;
> u64 partition_align_bytes;
> @@ -367,6 +393,20 @@ struct cxl_memdev_state {
> u64 active_persistent_bytes;
> u64 next_volatile_bytes;
> u64 next_persistent_bytes;
> +
> + u8 nr_dc_region;
> +
> + struct cxl_dc_region_info {
> + u8 name[CXL_DC_REGION_SRTLEN];
char? SRT? Also isn't it a bit big? Looks like max 4 chars to me.
Put it next to flags and we can save some space.
> + u64 base;
> + u64 decode_len;
> + u64 len;
> + u64 blk_size;
> + u32 dsmad_handle;
> + u8 flags;
> + } dc_region[CXL_MAX_DC_REGION];
> +
> + size_t dc_event_log_size;
> /*
> @@ -617,7 +662,27 @@ struct cxl_mbox_set_partition_info {
> u8 flags;
> } __packed;
>
> +struct cxl_mbox_get_dc_config {
> + u8 region_count;
> + u8 start_region_index;
> +} __packed;
> +
> +/* See CXL 3.0 Table 125 get dynamic capacity config Output Payload */
> +struct cxl_mbox_dynamic_capacity {
> + u8 avail_region_count;
> + u8 rsvd[7];
> + struct cxl_dc_region_config {
> + __le64 region_base;
> + __le64 region_decode_length;
> + __le64 region_length;
> + __le64 region_block_size;
> + __le32 region_dsmad_handle;
> + u8 flags;
> + u8 rsvd[3];
> + } __packed region[];
> +} __packed;
> #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
This looks to have merged oddly with existing changes. I'd
move the define into the structure definition so ti's clear which
flag it reflects and avoids this sort of interleaving in future.
> +#define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG BIT(0)
>
next prev parent reply other threads:[~2023-06-22 15:58 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-14 19:16 [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) ira.weiny
2023-06-14 19:16 ` [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device ira.weiny
2023-06-14 22:53 ` Dave Jiang
2023-06-15 15:04 ` Ira Weiny
2023-06-14 23:49 ` Alison Schofield
2023-06-15 22:46 ` Ira Weiny
2023-06-15 18:30 ` Fan Ni
2023-06-15 19:17 ` Navneet Singh
2023-06-15 21:41 ` Fan Ni
2023-06-22 15:58 ` Jonathan Cameron [this message]
2023-06-24 13:08 ` Ira Weiny
2023-07-03 2:29 ` Jonathan Cameron
2023-06-14 19:16 ` [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support ira.weiny
2023-06-14 23:37 ` Dave Jiang
2023-06-15 18:12 ` Ira Weiny
2023-06-15 18:28 ` Dave Jiang
2023-06-16 3:52 ` Navneet Singh
2023-06-15 18:56 ` Navneet Singh
2023-06-15 0:21 ` Alison Schofield
2023-06-16 2:06 ` Ira Weiny
2023-06-16 15:56 ` Alison Schofield
2023-06-16 16:51 ` Alison Schofield
2023-06-21 2:44 ` Ira Weiny
2023-06-20 17:55 ` Fan Ni
2023-06-20 20:33 ` Ira Weiny
2023-06-21 3:13 ` Navneet Singh
2023-06-21 17:20 ` Fan Ni
2023-06-23 18:02 ` Ira Weiny
2023-06-22 16:34 ` Jonathan Cameron
2023-07-05 14:49 ` Davidlohr Bueso
2023-06-14 19:16 ` [PATCH 3/5] cxl/mem : Expose dynamic capacity configuration to userspace ira.weiny
2023-06-15 0:40 ` Alison Schofield
2023-06-16 2:47 ` Ira Weiny
2023-06-16 15:58 ` Dave Jiang
2023-06-20 16:23 ` Ira Weiny
2023-06-20 16:48 ` Dave Jiang
2023-06-15 15:41 ` Dave Jiang
2023-06-14 19:16 ` [PATCH 4/5] cxl/mem: Add support to handle DCD add and release capacity events ira.weiny
2023-06-15 2:19 ` Alison Schofield
2023-06-16 4:11 ` Ira Weiny
2023-06-27 18:20 ` Fan Ni
2023-06-15 16:58 ` Dave Jiang
2023-06-22 17:01 ` Jonathan Cameron
2023-06-29 15:19 ` Ira Weiny
2023-06-27 18:17 ` Fan Ni
2023-07-13 12:55 ` Jørgen Hansen
2023-06-14 19:16 ` [PATCH 5/5] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2023-06-15 17:08 ` Dave Jiang
2023-06-15 0:56 ` [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) Alison Schofield
2023-06-16 2:57 ` Ira Weiny
2023-06-15 14:51 ` Ira Weiny
2023-06-22 15:07 ` Jonathan Cameron
2023-06-22 16:37 ` Jonathan Cameron
2023-06-27 14:59 ` Ira Weiny
2023-06-29 15:30 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230622165817.0000220e@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=dan.j.williams@intel.com \
--cc=fan.ni@samsung.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=navneet.singh@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox