Linux CXL
From: Dan Williams <dan.j.williams@intel.com>
To: <alison.schofield@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Ben Widawsky <bwidawsk@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>
Cc: Alison Schofield <alison.schofield@intel.com>,
	<linux-cxl@vger.kernel.org>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: RE: [PATCH v12 1/6] cxl/mbox: Add GET_POISON_LIST mailbox command
Date: Tue, 11 Apr 2023 18:47:56 -0700
Message-ID: <64360dcc59cb0_417e294de@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <e87e1792e147a46f348019cc772e06f2ea19e970.1681159309.git.alison.schofield@intel.com>

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices maintain a list of locations that are poisoned or result
> in poison if the addresses are accessed by the host.
> 
> Per the spec, (CXL 3.0 8.2.9.8.4.1), the device returns this Poison
> list as a set of Media Error Records that include the source of the
> error, the starting device physical address, and length. The length is
> the number of adjacent DPAs in the record and is in units of 64 bytes.
> 
> Retrieve the poison list.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  drivers/cxl/core/mbox.c | 71 +++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h    | 67 ++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/pci.c       |  4 +++
>  3 files changed, 142 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index f2addb457172..69a5d69dd53b 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -5,6 +5,8 @@
>  #include <linux/debugfs.h>
>  #include <linux/ktime.h>
>  #include <linux/mutex.h>
> +#include <asm/unaligned.h>
> +#include <cxlpci.h>
>  #include <cxlmem.h>
>  #include <cxl.h>
>  
> @@ -994,6 +996,7 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
>  	/* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
>  	struct cxl_mbox_identify id;
>  	struct cxl_mbox_cmd mbox_cmd;
> +	u32 val;
>  	int rc;
>  
>  	mbox_cmd = (struct cxl_mbox_cmd) {
> @@ -1017,6 +1020,11 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
>  	cxlds->lsa_size = le32_to_cpu(id.lsa_size);
>  	memcpy(cxlds->firmware_version, id.fw_revision, sizeof(id.fw_revision));
>  
> +	if (test_bit(CXL_MEM_COMMAND_ID_GET_POISON, cxlds->enabled_cmds)) {
> +		val = get_unaligned_le24(id.poison_list_max_mer);
> +		cxlds->poison.max_errors = min_t(u32, val, CXL_POISON_LIST_MAX);
> +	}
> +

With this new interface, I do not expect we want to support user tooling
that retrieves the list via ioctl. So I think this wants a lead-in patch
that deprecates the poison commands, so that the linux-cxl community has
only one mechanism to maintain going forward.

Something like the below as a lead-in, and then you would add code to
cxl_walk_cel() to set a flag for the "get poison" machinery.
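
For the cxl_walk_cel() part, here is a rough userspace model of the idea,
not driver code. Once the poison commands leave cxl_mem_commands[], the CEL
walk no longer sets their bits in cxlds->enabled_cmds, so poison support has
to be recorded separately. The sketch assumes each CEL entry is a 16-bit
opcode plus a 16-bit effect field (host-endian here, unlike the __le16
fields in struct cxl_cel_entry) and uses opcodes 0x4300/0x4301/0x4302 for
Get Poison List, Inject Poison and Clear Poison. struct poison_caps, the
POISON_CAP_* bits and walk_cel_poison() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Media and Poison Management opcodes (command set 0x43). */
#define CXL_MBOX_OP_GET_POISON    0x4300
#define CXL_MBOX_OP_INJECT_POISON 0x4301
#define CXL_MBOX_OP_CLEAR_POISON  0x4302

/* Hypothetical capability bits recorded while walking the CEL. */
#define POISON_CAP_LIST   (1u << 0)
#define POISON_CAP_INJECT (1u << 1)
#define POISON_CAP_CLEAR  (1u << 2)

/* One CEL entry: opcode plus command effects (host-endian in this model). */
struct cel_entry {
	uint16_t opcode;
	uint16_t effect;
};

/* Hypothetical per-device state, standing in for a field of cxlds->poison. */
struct poison_caps {
	unsigned int flags;
};

/*
 * Walk the CEL and note which poison commands the device advertises,
 * without consulting the user-visible command table.
 */
static void walk_cel_poison(const struct cel_entry *cel, size_t nr,
			    struct poison_caps *caps)
{
	size_t i;

	assert(caps && (cel || nr == 0));
	for (i = 0; i < nr; i++) {
		switch (cel[i].opcode) {
		case CXL_MBOX_OP_GET_POISON:
			caps->flags |= POISON_CAP_LIST;
			break;
		case CXL_MBOX_OP_INJECT_POISON:
			caps->flags |= POISON_CAP_INJECT;
			break;
		case CXL_MBOX_OP_CLEAR_POISON:
			caps->flags |= POISON_CAP_CLEAR;
			break;
		default:
			break;
		}
	}
}
```

The driver-side equivalent would presumably keep such a flag in struct
cxl_poison_state and test it in place of CXL_MEM_COMMAND_ID_GET_POISON.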

-- >8 --
From f2cd1d1e09fe6f36255f3b8cd831b2b4903045d4 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Tue, 11 Apr 2023 17:48:45 -0700
Subject: [PATCH] cxl/mbox: Deprecate poison commands

The CXL subsystem is adding a formal mechanism for retrieving the poison
list. Minimize the maintenance burden going forward and maximize the
investment in common tooling by deprecating direct user access to the
poison commands outside of CXL_MEM_RAW_COMMANDS debug scenarios.

A new cxl_deprecated_commands[] list is created for querying which
command IDs defined in previous kernels are now deprecated.

Effectively all of the commands defined in:

87815ee9d006 ("cxl/pci: Add media provisioning required commands")

...were defined prematurely and should have waited until the kernel
implementation was decided. To my knowledge there are no shipping
devices with poison listing support and no known tools that would
regress with this change.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/mbox.c      |  3 ---
 include/uapi/linux/cxl_mem.h | 31 ++++++++++++++++++++++++++++---
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f2addb457172..8e24038b8769 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -61,9 +61,6 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 	CXL_CMD(SET_ALERT_CONFIG, 0xc, 0, 0),
 	CXL_CMD(GET_SHUTDOWN_STATE, 0, 0x1, 0),
 	CXL_CMD(SET_SHUTDOWN_STATE, 0x1, 0, 0),
-	CXL_CMD(GET_POISON, 0x10, CXL_VARIABLE_PAYLOAD, 0),
-	CXL_CMD(INJECT_POISON, 0x8, 0, 0),
-	CXL_CMD(CLEAR_POISON, 0x48, 0, 0),
 	CXL_CMD(GET_SCAN_MEDIA_CAPS, 0x10, 0x4, 0),
 	CXL_CMD(SCAN_MEDIA, 0x11, 0, 0),
 	CXL_CMD(GET_SCAN_MEDIA, 0, CXL_VARIABLE_PAYLOAD, 0),
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 86bbacf2a315..90f17343f1ba 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -40,19 +40,22 @@
 	___C(SET_ALERT_CONFIG, "Set Alert Configuration"),                \
 	___C(GET_SHUTDOWN_STATE, "Get Shutdown State"),                   \
 	___C(SET_SHUTDOWN_STATE, "Set Shutdown State"),                   \
-	___C(GET_POISON, "Get Poison List"),                              \
-	___C(INJECT_POISON, "Inject Poison"),                             \
-	___C(CLEAR_POISON, "Clear Poison"),                               \
+	___DEPRECATED(GET_POISON, "Get Poison List"),                     \
+	___DEPRECATED(INJECT_POISON, "Inject Poison"),                    \
+	___DEPRECATED(CLEAR_POISON, "Clear Poison"),                      \
 	___C(GET_SCAN_MEDIA_CAPS, "Get Scan Media Capabilities"),         \
 	___C(SCAN_MEDIA, "Scan Media"),                                   \
 	___C(GET_SCAN_MEDIA, "Get Scan Media Results"),                   \
 	___C(MAX, "invalid / last command")
 
 #define ___C(a, b) CXL_MEM_COMMAND_ID_##a
+#define ___DEPRECATED(a, b) CXL_MEM_DEPRECATED_ID_##a
 enum { CXL_CMDS };
 
 #undef ___C
+#undef ___DEPRECATED
 #define ___C(a, b) { b }
+#define ___DEPRECATED(a, b) { "Deprecated " b }
 static const struct {
 	const char *name;
 } cxl_command_names[] __attribute__((__unused__)) = { CXL_CMDS };
@@ -68,6 +71,28 @@ static const struct {
  */
 
 #undef ___C
+#undef ___DEPRECATED
+#define ___C(a, b) (0)
+#define ___DEPRECATED(a, b) (1)
+
+static const __u8 cxl_deprecated_commands[]
+	__attribute__((__unused__)) = { CXL_CMDS };
+
+/*
+ * Here's how this actually breaks out:
+ * cxl_deprecated_commands[] = {
+ *	[CXL_MEM_COMMAND_ID_INVALID] = 0,
+ *	[CXL_MEM_COMMAND_ID_IDENTIFY] = 0,
+ *	...
+ *	[CXL_MEM_DEPRECATED_ID_GET_POISON] = 1,
+ *	[CXL_MEM_DEPRECATED_ID_INJECT_POISON] = 1,
+ *	[CXL_MEM_DEPRECATED_ID_CLEAR_POISON] = 1,
+ *	...
+ * };
+ */
+
+#undef ___C
+#undef ___DEPRECATED
 
 /**
  * struct cxl_command_info - Command information returned from a query.
-- 
2.39.2
-- 8< --
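
To see how the three expansions of CXL_CMDS in the patch above line up,
here is a cut-down userspace model with a shortened command list.
DEMO_CMDS, the DEMO_* identifiers and cmd_is_deprecated() are hypothetical;
only the ___C()/___DEPRECATED() pattern mirrors the patch. Because a
deprecated entry keeps its slot in the enum, the IDs that follow it keep
their numeric values, which is what lets the 0/1 table be indexed directly
by command ID.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down command list; the real one lives in include/uapi/linux/cxl_mem.h. */
#define DEMO_CMDS                                     \
	___C(INVALID, "Invalid Command"),             \
	___C(IDENTIFY, "Identify Command"),           \
	___DEPRECATED(GET_POISON, "Get Poison List"), \
	___C(SCAN_MEDIA, "Scan Media"),               \
	___C(MAX, "invalid / last command")

/* Pass 1: IDs. Deprecated entries get a distinct name but keep their slot. */
#define ___C(a, b) DEMO_ID_##a
#define ___DEPRECATED(a, b) DEMO_DEPRECATED_ID_##a
enum { DEMO_CMDS };
#undef ___C
#undef ___DEPRECATED

/* Pass 2: human-readable names, prefixed for deprecated entries. */
#define ___C(a, b) { b }
#define ___DEPRECATED(a, b) { "Deprecated " b }
static const struct {
	const char *name;
} demo_command_names[] __attribute__((__unused__)) = { DEMO_CMDS };
#undef ___C
#undef ___DEPRECATED

/* Pass 3: 1 for deprecated IDs, 0 otherwise. */
#define ___C(a, b) (0)
#define ___DEPRECATED(a, b) (1)
static const unsigned char demo_deprecated_commands[] = { DEMO_CMDS };
#undef ___C
#undef ___DEPRECATED

/* Every ID, including MAX, has exactly one flag. */
static_assert(sizeof(demo_deprecated_commands) == DEMO_ID_MAX + 1,
	      "one flag per command ID");

/* Hypothetical helper: is a command ID from an older ABI now deprecated? */
static bool cmd_is_deprecated(int id)
{
	if (id < 0 || id >= DEMO_ID_MAX)
		return false;
	return demo_deprecated_commands[id];
}
```

Renaming the enumerator to CXL_MEM_DEPRECATED_ID_* also means any remaining
in-kernel use of the old CXL_MEM_COMMAND_ID_GET_POISON name fails to build,
which flags the call sites that need the new flag.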

>  	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
> @@ -1107,6 +1115,69 @@ int cxl_set_timestamp(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
>  
> +int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> +		       struct cxl_region *cxlr)
> +{
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_mbox_poison_out *po;
> +	struct cxl_mbox_poison_in pi;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int nr_records = 0;
> +	int rc;
> +
> +	rc = mutex_lock_interruptible(&cxlds->poison.lock);
> +	if (rc)
> +		return rc;
> +
> +	po = cxlds->poison.list_out;
> +	pi.offset = cpu_to_le64(offset);
> +	pi.length = cpu_to_le64(len / CXL_POISON_LEN_MULT);
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_GET_POISON,
> +		.size_in = sizeof(pi),
> +		.payload_in = &pi,
> +		.size_out = cxlds->payload_size,
> +		.payload_out = po,
> +		.min_out = struct_size(po, record, 0),
> +	};
> +
> +	do {
> +		rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +		if (rc)
> +			break;
> +
> +		/* TODO TRACE the media error records */
> +
> +		/* Protect against an uncleared _FLAG_MORE */
> +		nr_records = nr_records + le16_to_cpu(po->count);
> +		if (nr_records >= cxlds->poison.max_errors) {
> +			dev_dbg(&cxlmd->dev, "Max Error Records reached: %d\n",
> +				nr_records);
> +			break;
> +		}
> +	} while (po->flags & CXL_POISON_FLAG_MORE);
> +
> +	mutex_unlock(&cxlds->poison.lock);
> +	return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison, CXL);
> +
> +int cxl_poison_state_init(struct cxl_dev_state *cxlds)
> +{
> +	if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON, cxlds->enabled_cmds))
> +		return 0;
> +
> +	cxlds->poison.list_out = devm_kzalloc(cxlds->dev, cxlds->payload_size,
> +					      GFP_KERNEL);

Given that the payload can be multiple pages in size, use kvmalloc() like
cxl_mem_alloc_event_buf() does.
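
Separately from the allocation, the termination logic of the do/while loop
in cxl_mem_get_poison() quoted above can be modelled in userspace. This is
a sketch with hypothetical names: struct poison_reply, send_fn,
get_poison_all() and the two mock devices stand in for the mailbox payload
and cxl_internal_send_cmd(). It keeps the same rule as the patch: keep
reading while the MORE flag is set, but stop once max_errors records have
been counted, so a device that never clears MORE cannot pin the caller in
the loop.

```c
#include <assert.h>
#include <stdint.h>

#define POISON_FLAG_MORE 0x1

/* Hypothetical reply: only the fields the retrieval loop looks at. */
struct poison_reply {
	uint8_t flags;
	uint16_t count;
};

/* Hypothetical transport standing in for cxl_internal_send_cmd(). */
typedef int (*send_fn)(void *ctx, struct poison_reply *out);

/*
 * Issue Get Poison List until the device clears MORE or max_errors records
 * have been seen. Returns the transport error or 0; *nr_out receives the
 * number of records counted.
 */
static int get_poison_all(send_fn send, void *ctx, uint32_t max_errors,
			  uint32_t *nr_out)
{
	struct poison_reply po;
	uint32_t nr_records = 0;
	int rc;

	assert(send && nr_out);
	do {
		rc = send(ctx, &po);
		if (rc)
			break;
		/* A real driver would trace each media error record here. */
		nr_records += po.count;
		if (nr_records >= max_errors)
			break;	/* guard against a MORE flag that never clears */
	} while (po.flags & POISON_FLAG_MORE);

	*nr_out = nr_records;
	return rc;
}

/* Hypothetical mock: always returns 5 records and never clears MORE. */
static int stuck_device(void *ctx, struct poison_reply *out)
{
	unsigned int *calls = ctx;

	(*calls)++;
	out->flags = POISON_FLAG_MORE;
	out->count = 5;
	return 0;
}

/* Hypothetical mock: two pages of 3 records, MORE cleared on the second. */
static int two_page_device(void *ctx, struct poison_reply *out)
{
	unsigned int *calls = ctx;

	(*calls)++;
	out->flags = (*calls < 2) ? POISON_FLAG_MORE : 0;
	out->count = 3;
	return 0;
}
```

With stuck_device and max_errors = 12, the loop stops after three reads
(15 records counted); with two_page_device it ends after two reads because
MORE clears.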

> +	if (!cxlds->poison.list_out)
> +		return -ENOMEM;
> +
> +	mutex_init(&cxlds->poison.lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
> +
>  struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
>  {
>  	struct cxl_dev_state *cxlds;
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index ccbafc05a636..a3033c8dd8e2 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -215,6 +215,24 @@ struct cxl_event_state {
>  	struct mutex log_lock;
>  };
>  
> +/**
> + * struct cxl_poison_state - Driver poison state info
> + *
> + * @max_errors: Maximum media error records held in device cache
> + * @list_out: The poison list payload returned by device
> + * @lock: Protect reads of the poison list
> + *
> + * Reads of the poison list are synchronized to ensure that a reader
> + * does not get an incomplete list because their request overlapped
> + * (was interrupted or preceded by) another read request of the same
> + * DPA range. CXL Spec 3.0 Section 8.2.9.8.4.1
> + */
> +struct cxl_poison_state {
> +	u32 max_errors;
> +	struct cxl_mbox_poison_out *list_out;
> +	struct mutex lock;  /* Protect reads of poison list */
> +};
> +
>  /**
>   * struct cxl_dev_state - The driver device state
>   *
> @@ -251,6 +269,7 @@ struct cxl_event_state {
>   * @serial: PCIe Device Serial Number
>   * @doe_mbs: PCI DOE mailbox array
>   * @event: event log driver state
> + * @poison: poison driver state info
>   * @mbox_send: @dev specific transport for transmitting mailbox commands
>   *
>   * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> @@ -290,6 +309,7 @@ struct cxl_dev_state {
>  	struct xarray doe_mbs;
>  
>  	struct cxl_event_state event;
> +	struct cxl_poison_state poison;
>  
>  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>  };
> @@ -538,6 +558,50 @@ struct cxl_mbox_set_timestamp_in {
>  
>  } __packed;
>  
> +/* Get Poison List  CXL 3.0 Spec 8.2.9.8.4.1 */
> +struct cxl_mbox_poison_in {
> +	__le64 offset;
> +	__le64 length;
> +} __packed;
> +
> +struct cxl_mbox_poison_out {
> +	u8 flags;
> +	u8 rsvd1;
> +	__le64 overflow_t;

I was wondering what the "_t" meant; I always read that as "type". Perhaps
use "_ts", or just spell out "_timestamp".

Aside from the minor fixups and reworking the enumeration mechanism per
above, this looks good to me.
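
For reference, a userspace sketch of decoding the Get Poison List output
as the commit message describes it, using the spelled-out overflow_ts
name. It assumes the CXL 3.0 8.2.9.8.4.1 layout: a 32-byte header (flags,
a reserved byte, an 8-byte overflow timestamp, a 2-byte record count and
20 reserved bytes) followed by 16-byte Media Error Records, each holding an
8-byte address whose bits [2:0] give the error source and bits [63:6] the
DPA, then a 4-byte length in 64-byte units. struct poison_header,
struct poison_record_info and the decode_*() helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON_LEN_MULT    64          /* record length unit, in bytes */
#define POISON_HDR_SIZE    32          /* bytes before the first record */
#define POISON_RECORD_SIZE 16          /* address(8) + length(4) + rsvd(4) */
#define POISON_SOURCE_MASK 0x7ull      /* Media Error Address bits [2:0] */
#define POISON_DPA_MASK    (~0x3full)  /* Media Error Address bits [63:6] */

/* Hypothetical decoded view of the payload header. */
struct poison_header {
	uint8_t flags;
	uint64_t overflow_ts;	/* spelled "_ts" rather than "_t" */
	uint16_t count;
};

/* Hypothetical decoded view of one Media Error Record. */
struct poison_record_info {
	unsigned int source;	/* 0 unknown, 1 external, 2 internal, 3 injected, 7 vendor */
	uint64_t dpa;		/* starting device physical address */
	uint64_t len_bytes;	/* length converted from 64-byte units */
};

/* Read an n-byte little-endian value regardless of host endianness or alignment. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 8) | p[n];
	return v;
}

static void decode_poison_header(const uint8_t *buf, struct poison_header *hdr)
{
	assert(buf && hdr);
	hdr->flags = buf[0];
	hdr->overflow_ts = get_le(buf + 2, 8);
	hdr->count = (uint16_t)get_le(buf + 10, 2);
}

/* Decode record @idx; the caller must bounds-check @idx against the count. */
static void decode_media_error_record(const uint8_t *buf, size_t idx,
				      struct poison_record_info *rec)
{
	const uint8_t *r;
	uint64_t addr;

	assert(buf && rec);
	r = buf + POISON_HDR_SIZE + idx * POISON_RECORD_SIZE;
	addr = get_le(r, 8);
	rec->source = (unsigned int)(addr & POISON_SOURCE_MASK);
	rec->dpa = addr & POISON_DPA_MASK;
	rec->len_bytes = get_le(r + 8, 4) * POISON_LEN_MULT;
}
```

For example, an address field of 0x1043 decodes to source 3 (injected) at
DPA 0x1040, and a length field of 2 covers 128 bytes.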

