From: Ira Weiny <ira.weiny@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>, <ira.weiny@intel.com>
Cc: Navneet Singh <navneet.singh@intel.com>,
Fan Ni <fan.ni@samsung.com>,
"Dan Williams" <dan.j.williams@intel.com>,
<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device
Date: Sat, 24 Jun 2023 06:08:04 -0700 [thread overview]
Message-ID: <6496eab44738f_633d62946f@iweiny-mobl.notmuch> (raw)
In-Reply-To: <20230622165817.0000220e@Huawei.com>
Jonathan Cameron wrote:
> On Wed, 14 Jun 2023 12:16:28 -0700
> ira.weiny@intel.com wrote:
>
> > From: Navneet Singh <navneet.singh@intel.com>
> >
> > Read the Dynamic capacity configuration and store dynamic capacity region
> > information in the device state which driver will use to map into the HDM
> > ranges.
> >
> > Implement Get Dynamic Capacity Configuration (opcode 4800h) mailbox
> > command as specified in CXL 3.0 spec section 8.2.9.8.9.1.
> >
> > Signed-off-by: Navneet Singh <navneet.singh@intel.com>
>
> Hi Ira / Navneet,
>
> I'll probably overlap with comments of others (good to see so much review!)
Indeed! Thanks!
> so feel free to ignore duplication.
>
> Comments inline,
>
> Jonathan
>
>
>
> > +/**
> > + * cxl_dev_dynamic_capacity_identify() - Reads the dynamic capacity
> > + * information from the device.
> > + * @mds: The memory device state
> > + * Return: 0 if identify was executed successfully.
> > + *
> > + * This will dispatch the get_dynamic_capacity command to the device
> > + * and on success populate structures to be exported to sysfs.
> > + */
> > +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
> > +{
> > + struct cxl_dev_state *cxlds = &mds->cxlds;
> > + struct device *dev = cxlds->dev;
> > + struct cxl_mbox_dynamic_capacity *dc;
>
> Calling it dc is confusing. I'd make it clear this is the mailbox
> response. config_resp or dc_config_res.
How about dc_resp?
>
> > + struct cxl_mbox_get_dc_config get_dc;
> > + struct cxl_mbox_cmd mbox_cmd;
> > + u64 next_dc_region_start;
> > + int rc, i;
> > +
> > + for (i = 0; i < CXL_MAX_DC_REGION; i++)
> > + sprintf(mds->dc_region[i].name, "dc%d", i);
> > +
> > + /* Check GET_DC_CONFIG is supported by device */
> > + if (!test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds)) {
> > + dev_dbg(dev, "unsupported cmd: get_dynamic_capacity_config\n");
> > + return 0;
> > + }
> > +
> > + dc = kvmalloc(mds->payload_size, GFP_KERNEL);
> > + if (!dc)
> > + return -ENOMEM;
>
> Response to CXL_MBOX_OP_GET_DC_CONFIG has a known maximum
> size. Can we provide that instead of potentially much larger?
>
> 8 + 0x28 * 8 I think so 328 bytes. Use struct_size()
Actually yea and just putting that on the stack might also be better.
>
>
> But fun corner.... Mailbox is allowed to be smaller than that (256 bytes min
> I think) so need to handle multiple reads with different start regions.
Oh bother. :-/
What are the chances a device is going to only support 256B and DC? I think
you are correct though. I'll add a loop to handle this possibility.
Anyway I've adjusted the algorithm... Hopefully it will just loop 1 time.
> Which reminds me that we need to add support for running out of space
> in the mailbox to qemu... So far we've just made sure everything fitted :)
Might be nice to test stuff.
>
>
> > +
> > + get_dc = (struct cxl_mbox_get_dc_config) {
> > + .region_count = CXL_MAX_DC_REGION,
> > + .start_region_index = 0,
> > + };
> > +
> > + mbox_cmd = (struct cxl_mbox_cmd) {
> > + .opcode = CXL_MBOX_OP_GET_DC_CONFIG,
> > + .payload_in = &get_dc,
> > + .size_in = sizeof(get_dc),
> > + .size_out = mds->payload_size,
> > + .payload_out = dc,
> > + .min_out = 1,
> > + };
> > + rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> > + if (rc < 0)
> > + goto dc_error;
> The error label is a bit too generic. Why dc_error?
> "error" conveys just as small amount of info. I'd got for goto free_resp;
Sure. But if we make dc on the stack then we get rid of this entirely. I
prefer that.
>
> > +
> > + mds->nr_dc_region = dc->avail_region_count;
> > +
> > + if (mds->nr_dc_region < 1 || mds->nr_dc_region > CXL_MAX_DC_REGION) {
> > + dev_err(dev, "Invalid num of dynamic capacity regions %d\n",
> > + mds->nr_dc_region);
> > + rc = -EINVAL;
> > + goto dc_error;
> > + }
> > +
> > + for (i = 0; i < mds->nr_dc_region; i++) {
> > + struct cxl_dc_region_info *dcr = &mds->dc_region[i];
> > +
> > + dcr->base = le64_to_cpu(dc->region[i].region_base);
> > + dcr->decode_len =
> > + le64_to_cpu(dc->region[i].region_decode_length);
> > + dcr->decode_len *= CXL_CAPACITY_MULTIPLIER;
> > + dcr->len = le64_to_cpu(dc->region[i].region_length);
> > + dcr->blk_size = le64_to_cpu(dc->region[i].region_block_size);
> > +
> > + /* Check regions are in increasing DPA order */
> > + if ((i + 1) < mds->nr_dc_region) {
>
> Feels a bit odd to look at entries we haven't seen yet. Maybe flip this around
> to check the ones we have looked at?
Totally agree. I already did that.
> So don't start until 2nd region and then check
> it's start against mds->dc_region[0] etc?
Yep! It makes the check easier...
> Or factor out this loop contents in general and just pass in the single
> value needed for checking this. Biggest advantage being direct returns in
> that function as allocation and free will be in caller.
Did that too. I did not like how big cxl_dev_dynamic_capacity_identify() was
getting, especially with possibility of having to loop for a 2nd time.
>
>
> > + next_dc_region_start =
> > + le64_to_cpu(dc->region[i + 1].region_base);
> > + if ((dcr->base > next_dc_region_start) ||
> > + ((dcr->base + dcr->decode_len) > next_dc_region_start)) {
>
> Unless you have a negative decode length the second condition includes the first.
> So just check that.
... exactly!
>
> > + dev_err(dev,
> > + "DPA ordering violation for DC region %d and %d\n",
> > + i, i + 1);
> > + rc = -EINVAL;
> > + goto dc_error;
> > + }
> > + }
> > +
> > + /* Check the region is 256 MB aligned */
> > + if (!IS_ALIGNED(dcr->base, SZ_256M)) {
>
> That's an oddity. I wonder why those lower bits where defined as reserved...
> Anyhow code is right if paranoid ;)
:shrug:
>
> > + dev_err(dev, "DC region %d not aligned to 256MB\n", i);
> > + rc = -EINVAL;
> > + goto dc_error;
> > + }
> > +
> > + /* Check Region base and length are aligned to block size */
> > + if (!IS_ALIGNED(dcr->base, dcr->blk_size) ||
> > + !IS_ALIGNED(dcr->len, dcr->blk_size)) {
> > + dev_err(dev, "DC region %d not aligned to %#llx\n", i,
> > + dcr->blk_size);
> > + rc = -EINVAL;
> > + goto dc_error;
> > + }
> > +
> > + dcr->dsmad_handle =
> > + le32_to_cpu(dc->region[i].region_dsmad_handle);
> > + dcr->flags = dc->region[i].flags;
>
> I'd just grab these at same time as all the other fields above.
> A pattern where you fill values in only after checking would be fine, or one
> where you fill them in all in one place. The mixture of the two is less clear
> than either consistent approach.
Ok, yea this did seem odd but I kind of ignored it. Done.
>
> > + sprintf(dcr->name, "dc%d", i);
I may take this out too now that we always set the name regardless of if the
region is available.
Although... I wonder if setting the name to something like '<nil>' by default
would be beneficial in some way? :-/
> > +
> > + dev_dbg(dev,
> > + "DC region %s DPA: %#llx LEN: %#llx BLKSZ: %#llx\n",
> > + dcr->name, dcr->base, dcr->decode_len, dcr->blk_size);
> > + }
> > +
> > + /*
> > + * Calculate entire DPA range of all configured regions which will be mapped by
> > + * one or more HDM decoders
> > + */
> > + mds->total_dynamic_capacity =
> > + mds->dc_region[mds->nr_dc_region - 1].base +
> > + mds->dc_region[mds->nr_dc_region - 1].decode_len -
> > + mds->dc_region[0].base;
> > + dev_dbg(dev, "Total dynamic capacity: %#llx\n",
> > + mds->total_dynamic_capacity);
> > +
> > +dc_error:
> > + kvfree(dc);
> > + return rc;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);
> > +
>
>
>
> > @@ -1121,13 +1289,23 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> > }
> >
> > cxlds->dpa_res =
> > - (struct resource)DEFINE_RES_MEM(0, mds->total_bytes);
> > + (struct resource)DEFINE_RES_MEM(0, mds->total_capacity);
> > +
> > + for (int i = 0; i < CXL_MAX_DC_REGION; i++) {
> > + struct cxl_dc_region_info *dcr = &mds->dc_region[i];
> > +
> > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->dc_res[i],
> > + dcr->base, dcr->decode_len, dcr->name);
> > + if (rc)
> > + return rc;
> > + }
> >
> > if (mds->partition_align_bytes == 0) {
> > rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> > mds->volatile_only_bytes, "ram");
> > if (rc)
> > return rc;
> > +
>
> Scrub for this stuff before posting v2. Just noise that slows down review
> a little. If it is worth doing, do it in a separate patch.
I believe Alison or Dave caught that already.
>
> > return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> > mds->volatile_only_bytes,
> > mds->persistent_only_bytes, "pmem");
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 89e560ea14c0..9c0b2fa72bdd 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> >
> >
> > +#define CXL_MAX_DC_REGION 8
> > +#define CXL_DC_REGION_SRTLEN 8
>
> SRT?
LOL oh yea 'STR'... :-D
>
> > +
> > /**
> > * struct cxl_dev_state - The driver device state
> > *
> > @@ -300,6 +312,8 @@ enum cxl_devtype {
> > * @dpa_res: Overall DPA resource tree for the device
> > * @pmem_res: Active Persistent memory capacity configuration
> > * @ram_res: Active Volatile memory capacity configuration
> > + * @dc_res: Active Dynamic Capacity memory configuration for each possible
> > + * region
> > * @component_reg_phys: register base of component registers
> > * @info: Cached DVSEC information about the device.
> > * @serial: PCIe Device Serial Number
> > @@ -315,6 +329,7 @@ struct cxl_dev_state {
> > struct resource dpa_res;
> > struct resource pmem_res;
> > struct resource ram_res;
> > + struct resource dc_res[CXL_MAX_DC_REGION];
> > resource_size_t component_reg_phys;
> > u64 serial;
> > enum cxl_devtype type;
>
> ...
>
> > @@ -357,9 +379,13 @@ struct cxl_memdev_state {
> > size_t lsa_size;
> > struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> > char firmware_version[0x10];
> > + DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX);
> > DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
> > DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
> > - u64 total_bytes;
> > +
> > + u64 total_capacity;
> > + u64 total_static_capacity;
> > + u64 total_dynamic_capacity;
> > u64 volatile_only_bytes;
> > u64 persistent_only_bytes;
> > u64 partition_align_bytes;
> > @@ -367,6 +393,20 @@ struct cxl_memdev_state {
> > u64 active_persistent_bytes;
> > u64 next_volatile_bytes;
> > u64 next_persistent_bytes;
> > +
> > + u8 nr_dc_region;
> > +
> > + struct cxl_dc_region_info {
> > + u8 name[CXL_DC_REGION_SRTLEN];
>
> char? SRT? Also isn't it a bit big? Looks like max 4 chars to me.
> Put it next to flags and we can save some space.
Well if I go forward with the idea of having them named something like '<nil>'
this would need to be longer than 4.
>
> > + u64 base;
> > + u64 decode_len;
> > + u64 len;
> > + u64 blk_size;
> > + u32 dsmad_handle;
> > + u8 flags;
> > + } dc_region[CXL_MAX_DC_REGION];
> > +
> > + size_t dc_event_log_size;
>
> > /*
> > @@ -617,7 +662,27 @@ struct cxl_mbox_set_partition_info {
> > u8 flags;
> > } __packed;
> >
> > +struct cxl_mbox_get_dc_config {
> > + u8 region_count;
> > + u8 start_region_index;
> > +} __packed;
> > +
> > +/* See CXL 3.0 Table 125 get dynamic capacity config Output Payload */
> > +struct cxl_mbox_dynamic_capacity {
> > + u8 avail_region_count;
> > + u8 rsvd[7];
> > + struct cxl_dc_region_config {
> > + __le64 region_base;
> > + __le64 region_decode_length;
> > + __le64 region_length;
> > + __le64 region_block_size;
> > + __le32 region_dsmad_handle;
> > + u8 flags;
> > + u8 rsvd[3];
> > + } __packed region[];
> > +} __packed;
> > #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
>
> This looks to have merged oddly with existing changes. I'd
> move the define into the structure definition so ti's clear which
> flag it reflects and avoids this sort of interleaving in future.
That has not been done anywhere in this file. I think in this case it was just
a mistake to separate the partition define from the partition structure. We
already fixed that based on Alisons feedback.
What I did do is move CXL_DC_REGION_STRLEN next to cxl_dc_region_info and made
it 7 chars to compact the structure a bit. I think having that define next to
the structure helps to show why the odd length.
While we are at it I'm changing the sprintf's to snprintf's. We are not in a
fast path and I'm paranoid now.
Ira
next prev parent reply other threads:[~2023-06-24 13:08 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-14 19:16 [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) ira.weiny
2023-06-14 19:16 ` [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device ira.weiny
2023-06-14 22:53 ` Dave Jiang
2023-06-15 15:04 ` Ira Weiny
2023-06-14 23:49 ` Alison Schofield
2023-06-15 22:46 ` Ira Weiny
2023-06-15 18:30 ` Fan Ni
2023-06-15 19:17 ` Navneet Singh
2023-06-15 21:41 ` Fan Ni
2023-06-22 15:58 ` Jonathan Cameron
2023-06-24 13:08 ` Ira Weiny [this message]
2023-07-03 2:29 ` Jonathan Cameron
2023-06-14 19:16 ` [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support ira.weiny
2023-06-14 23:37 ` Dave Jiang
2023-06-15 18:12 ` Ira Weiny
2023-06-15 18:28 ` Dave Jiang
2023-06-16 3:52 ` Navneet Singh
2023-06-15 18:56 ` Navneet Singh
2023-06-15 0:21 ` Alison Schofield
2023-06-16 2:06 ` Ira Weiny
2023-06-16 15:56 ` Alison Schofield
2023-06-16 16:51 ` Alison Schofield
2023-06-21 2:44 ` Ira Weiny
2023-06-20 17:55 ` Fan Ni
2023-06-20 20:33 ` Ira Weiny
2023-06-21 3:13 ` Navneet Singh
2023-06-21 17:20 ` Fan Ni
2023-06-23 18:02 ` Ira Weiny
2023-06-22 16:34 ` Jonathan Cameron
2023-07-05 14:49 ` Davidlohr Bueso
2023-06-14 19:16 ` [PATCH 3/5] cxl/mem : Expose dynamic capacity configuration to userspace ira.weiny
2023-06-15 0:40 ` Alison Schofield
2023-06-16 2:47 ` Ira Weiny
2023-06-16 15:58 ` Dave Jiang
2023-06-20 16:23 ` Ira Weiny
2023-06-20 16:48 ` Dave Jiang
2023-06-15 15:41 ` Dave Jiang
2023-06-14 19:16 ` [PATCH 4/5] cxl/mem: Add support to handle DCD add and release capacity events ira.weiny
2023-06-15 2:19 ` Alison Schofield
2023-06-16 4:11 ` Ira Weiny
2023-06-27 18:20 ` Fan Ni
2023-06-15 16:58 ` Dave Jiang
2023-06-22 17:01 ` Jonathan Cameron
2023-06-29 15:19 ` Ira Weiny
2023-06-27 18:17 ` Fan Ni
2023-07-13 12:55 ` Jørgen Hansen
2023-06-14 19:16 ` [PATCH 5/5] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2023-06-15 17:08 ` Dave Jiang
2023-06-15 0:56 ` [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) Alison Schofield
2023-06-16 2:57 ` Ira Weiny
2023-06-15 14:51 ` Ira Weiny
2023-06-22 15:07 ` Jonathan Cameron
2023-06-22 16:37 ` Jonathan Cameron
2023-06-27 14:59 ` Ira Weiny
2023-06-29 15:30 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6496eab44738f_633d62946f@iweiny-mobl.notmuch \
--to=ira.weiny@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=dan.j.williams@intel.com \
--cc=fan.ni@samsung.com \
--cc=linux-cxl@vger.kernel.org \
--cc=navneet.singh@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox