[PATCH v4 0/7] cxl: Background cmds and device sanitation

Linux CXL
 help / color / mirror / Atom feed

* [PATCH v4 0/7] cxl: Background cmds and device sanitation
@ 2023-04-21  9:23 Davidlohr Bueso
  2023-04-21  9:23 ` [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
                   ` (7 more replies)
  0 siblings, 8 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

Hi,

First thanks for the reviewing and feedback. It took me a while to get back to this in a
form of a new series between vacation, being sick and ensuring I (hopefully) covered all
your requirements. So all of this is intended for v6.5.

The first two patches can directly open the door for Scan Media, etc. and could be picked
up regardless of the rest of the series - I wouldn't want for Sanitize to hold up other
functionality, specially considering the special treatment it gets. That said, the synchronous
mbox bg handling code paths (patch 2) are the least tested in the series (for obvious reasons).

On that note, this series combines the original async semantics from the RFC/v2 specifically
for Sanitation while leaving the rest of the background-capable operations in the sync approach
from v3:
	https://lore.kernel.org/linux-cxl/20230224194652.1990604-1-dave@stgolabs.net/

Specifically, it's worth noting:

    o Supporting async polling for sanitation must protect against out-of-sync driver and hw.
      See testing (1) below.

    o Treating Sanitation as such a special beast can make the code a bit invasive imo,
      which I'm not crazy about but couldn't find a decent alternative. For example I realize
      that this is really ad-hoc code in __cxl_pci_mbox_send_cmd().

    o Regardless of the state of the irq setup, the probing never fails, and falls back to
      async polling as last resource.
      
    o Nothing depends explcitly on CPU cacheline management
      
    o All sysfs files/attributes in the security directory are visible.
    
    o I continue to use __ATTR() macros for sysfs attributes instead of the requested
      DEVICE_ATTR_*() ones because of the security directory (perhaps I'm missing something
      obvious).
    
    o I've dropped the 'security/state' sysfs file creation - I will use the a cached pmem
      security flags, but can be sent later once the rest is settled. The actual sanitize and
      erase commands do ask the hw about security - there is no risk of spamming the mailbox.

Testing.
========

o There are the mock device tests for Sanitize and Secure Erase.

o The latest (v2) qemu bg/sanitize support series is posted here:
	https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/

(1) Window where driver is out of sync with hw (Sanitation async polling).

[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  159.297482] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400
[  159.298648] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms
[  159.299908] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started
>>>> qemu informs sanitation is done <<<<<
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  165.897345] cxl_pci 0000:37:00.0: Failed to sanitize device : -16
[  171.692050] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:37:00.0: Sanitation operation ended
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  173.373337] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400
[  173.374498] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms
[  173.375727] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started

(2) Perform sanitation of more than one memdev at a time (Sanitation async polling).

[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
[  351.287129] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[  351.288403] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[  351.289706] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  353.058614] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400
[  353.059854] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms
[  353.061126] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started
>>>>  qemu informs sanitation is done <<<<<
>>>>  qemu informs sanitation is done <<<<<
[  363.692138] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:36:00.0: Sanitation operation ended
[  365.227416] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:37:00.0: Sanitation operation ended

(3) Perform sanitation of more than one memdev at a time (Sanitation async irq).

[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
[  193.729821] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:c1:00.0: Sending command: 0x4400
[  193.731071] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:c1:00.0: Doorbell wait took 0ms
[  193.732360] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:c1:00.0: Sanitation operation started
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[  197.001466] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[  197.002694] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[  197.003956] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started
>>>> qemu says sanitation is done <<<<
[  197.731473] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:c1:00.0: Sanitation operation ended
>>>> qemu says sanitation is done <<<<
[  201.003258] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:36:00.0: Sanitation operation ended

(4) Forbid new sanitation while one is in progress (Sanitation asyn irq).
 
[root@fedora ~]# cat /sys/bus/cxl/devices/mem0/security/sanitize
disabled
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[   39.284258] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[   39.285459] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[   39.286723] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started
[root@fedora ~]# cat /sys/bus/cxl/devices/mem0/security/sanitize
sanitize
[root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
[   42.697129] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400
[   42.698323] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms
[   42.699525] cxl_pci:__cxl_pci_mbox_send_cmd:335: cxl_pci 0000:36:00.0: Mailbox operation had an error: ongoing background operation
[   42.701119] cxl_pci 0000:36:00.0: Failed to sanitize device : -6
>>>> qemu says sanitation is done <<<<
[   43.285334] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:36:00.0: Sanitation operation ended

Applies against 'fixes' branch from cxl.git. Please consider for v6.5. 

Thanks!

Davidlohr Bueso (7):
  cxl/pci: Allocate irq vectors earlier in pci probe
  cxl/mbox: Add background cmd handling machinery
  cxl/mbox: Add sanitation handling machinery
  cxl/mem: Wire up Sanitation support
  cxl/test: Add Sanitize opcode support
  cxl/mem: Support Secure Erase
  cxl/test: Add Secure Erase opcode support

 Documentation/ABI/testing/sysfs-bus-cxl |  29 ++++
 drivers/cxl/core/mbox.c                 |  63 +++++++-
 drivers/cxl/core/memdev.c               | 120 +++++++++++++++
 drivers/cxl/cxl.h                       |   7 +
 drivers/cxl/cxlmem.h                    |  26 ++++
 drivers/cxl/pci.c                       | 192 +++++++++++++++++++++++-
 tools/testing/cxl/test/mem.c            |  52 +++++++
 7 files changed, 483 insertions(+), 6 deletions(-)

--
2.40.0


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
@ 2023-04-21  9:23 ` Davidlohr Bueso
  2023-04-28 16:09   ` Dave Jiang
  2023-05-11 13:55   ` Jonathan Cameron
  2023-04-21  9:23 ` [PATCH 2/7] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

Move the cxl_alloc_irq_vectors() call further up in the probing
in order to allow for mailbox interrupt usage. No change in
semantics.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 60b23624d167..39b829a29f6c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -757,6 +757,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
 
+	rc = cxl_alloc_irq_vectors(pdev);
+	if (rc)
+		return rc;
+
 	rc = cxl_pci_setup_mailbox(cxlds);
 	if (rc)
 		return rc;
@@ -777,10 +781,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		return rc;
 
-	rc = cxl_alloc_irq_vectors(pdev);
-	if (rc)
-		return rc;
-
 	cxlmd = devm_cxl_add_memdev(cxlds);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
  2023-04-21  9:23 ` [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
@ 2023-04-21  9:23 ` Davidlohr Bueso
  2023-04-23  7:54   ` Li, Ming
                     ` (2 more replies)
  2023-04-21  9:23 ` [PATCH 3/7] cxl/mbox: Add sanitation " Davidlohr Bueso
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

This adds support for handling background operations, as defined in
the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
can run in the background asynchronously (to the hardware).

The driver will deal with such commands synchronously, blocking all
other incoming commands for a specified period of time, allowing
time-slicing the command such that the caller can send incremental
requests to avoid monopolizing the driver/device. This approach
makes the code simpler, where any out of sync (timeout) between the
driver and hardware is just disregarded as an invalid state until
the next successful submission.

On devices where mbox interrupts are supported, this will still use
a poller that will wakeup in the specified wait intervals. The irq
handler will simply awake a blocked cmd, which is also safe vs a
task that is either waking (timing out) or already awoken. Similarly
any irq setup error during the probing falls back to polling, thus
avoids unnecessarily erroring out.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/core/mbox.c |   3 +-
 drivers/cxl/cxl.h       |   7 +++
 drivers/cxl/cxlmem.h    |   5 ++
 drivers/cxl/pci.c       | 104 +++++++++++++++++++++++++++++++++++++++-
 4 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 6198637cb0bb..cde7270c6037 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -180,7 +180,8 @@ int cxl_internal_send_cmd(struct cxl_dev_state *cxlds,
 	if (rc)
 		return rc;
 
-	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS)
+	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
+	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
 		return cxl_mbox_cmd_rc2errno(mbox_cmd);
 
 	if (!out_size)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 044a92d9813e..72731a896f58 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -176,14 +176,21 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 /* CXL 2.0 8.2.8.4 Mailbox Registers */
 #define CXLDEV_MBOX_CAPS_OFFSET 0x00
 #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
+#define   CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
+#define   CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)
 #define CXLDEV_MBOX_CTRL_OFFSET 0x04
 #define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
+#define   CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
 #define CXLDEV_MBOX_CMD_OFFSET 0x08
 #define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
 #define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
 #define CXLDEV_MBOX_STATUS_OFFSET 0x10
+#define   CXLDEV_MBOX_STATUS_BG_CMD BIT(0)
 #define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
 #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)
 #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
 
 /*
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 090acebba4fa..8c3302fc7738 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -108,6 +108,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
  *            variable sized output commands, it tells the exact number of bytes
  *            written.
  * @min_out: (input) internal command output payload size validation
+ * @poll_count: (input)  Number of timeouts to attempt.
+ * @poll_interval: (input) Number of ms between mailbox background command
+ *                 polling intervals timeouts.
  * @return_code: (output) Error code returned from hardware.
  *
  * This is the primary mechanism used to send commands to the hardware.
@@ -123,6 +126,8 @@ struct cxl_mbox_cmd {
 	size_t size_in;
 	size_t size_out;
 	size_t min_out;
+	int poll_count;
+	int poll_interval;
 	u16 return_code;
 };
 
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 39b829a29f6c..aa1bb74a52a1 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -51,6 +51,7 @@
 static unsigned short mbox_ready_timeout = 60;
 module_param(mbox_ready_timeout, ushort, 0644);
 MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
+static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);
 
 static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
 {
@@ -85,6 +86,33 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
 			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
 			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
 
+static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
+{
+	u64 reg;
+
+	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
+}
+
+static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
+{
+	struct cxl_dev_state *cxlds = id;
+
+	/* spurious or raced with hw? */
+	if (!cxl_mbox_background_complete(cxlds)) {
+		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+
+		dev_warn(&pdev->dev,
+			 "Mailbox background operation IRQ but incomplete\n");
+		goto done;
+	}
+
+	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
+	wake_up(&mbox_wait);
+done:
+	return IRQ_HANDLED;
+}
+
 /**
  * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
  * @cxlds: The device state to communicate with.
@@ -178,7 +206,59 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
 	mbox_cmd->return_code =
 		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
 
-	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
+	/*
+	 * Handle the background command in a synchronous manner.
+	 *
+	 * All other mailbox commands will serialize/queue on the mbox_mutex,
+	 * which we currently hold. Furthermore this also guarantees that
+	 * cxl_mbox_background_complete() checks are safe amongst each other,
+	 * in that no new bg operation can occur in between.
+	 *
+	 * Background operations are timesliced in accordance with the nature
+	 * of the command. In the event of timeout, the mailbox state is
+	 * indeterminate until the next successful command submission and the
+	 * driver can get back in sync with the hardware state.
+	 */
+	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
+		u64 bg_status_reg;
+		int i;
+
+		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
+			mbox_cmd->opcode);
+
+		for (i = 0; i < mbox_cmd->poll_count; i++) {
+			int ret = wait_event_interruptible_timeout(
+				mbox_wait, cxl_mbox_background_complete(cxlds),
+				msecs_to_jiffies(mbox_cmd->poll_interval));
+			if (ret > 0)
+				break;
+
+			/* interrupted by a signal */
+			if (ret < 0)
+				return ret;
+		}
+
+		if (!cxl_mbox_background_complete(cxlds)) {
+			u64 md_status =
+				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
+
+			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
+				    "background timeout");
+			return -ETIMEDOUT;
+		}
+
+		bg_status_reg = readq(cxlds->regs.mbox +
+				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+		mbox_cmd->return_code =
+			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
+				  bg_status_reg);
+		dev_dbg(dev,
+			"Mailbox background operation (0x%04x) completed\n",
+			mbox_cmd->opcode);
+	}
+
+	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
+	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND) {
 		dev_dbg(dev, "Mailbox operation had an error: %s\n",
 			cxl_mbox_cmd_rc2str(mbox_cmd));
 		return 0; /* completed but caller must check return_code */
@@ -224,6 +304,7 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
 	const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
 	unsigned long timeout;
 	u64 md_status;
+	int rc, irq;
 
 	timeout = jiffies + mbox_ready_timeout * HZ;
 	do {
@@ -272,6 +353,27 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
 	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
 		cxlds->payload_size);
 
+	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
+		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+
+		irq = pci_irq_vector(pdev,
+			     FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap));
+		if (irq < 0)
+			goto mbox_poll;
+
+		rc = devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
+				      IRQF_SHARED, "mailbox", cxlds);
+		if (rc)
+			goto mbox_poll;
+
+		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
+		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
+
+		return 0;
+	}
+
+mbox_poll:
+	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
 	return 0;
 }
 
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 3/7] cxl/mbox: Add sanitation handling machinery
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
  2023-04-21  9:23 ` [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
  2023-04-21  9:23 ` [PATCH 2/7] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
@ 2023-04-21  9:23 ` Davidlohr Bueso
  2023-04-28 16:43   ` Dave Jiang
  2023-05-11 14:45   ` Jonathan Cameron
  2023-04-21  9:23 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

Sanitation is by definition a device-monopolizing operation, and thus
the timeslicing rules for other background commands do not apply.
As such handle this special case asynchronously and return immediately.
Subsequent changes will allow completion to be pollable from userspace
via a sysfs file interface.

For devices that don't support interrupts for notifying background
command completion, self-poll with the caveat that the poller can
be out of sync with the ready hardware, and therefore care must be
taken to not allow any new commands to go through until the poller
sees the hw completion. The poller takes the mbox_mutex to stabilize
the flagging, minimizing any runtime overhead in the send path to
check for 'sanitize_tmo' for uncommon poll scenarios. This flag
also serves for sanitation (the only user of async polling) to know
when to queue work or simply rely on irqs.

The irq case is much simpler as hardware will serialize/error
appropriately.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/cxlmem.h | 16 +++++++++
 drivers/cxl/pci.c    | 79 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 8c3302fc7738..17e3ab3c641a 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -220,6 +220,18 @@ struct cxl_event_state {
 	struct mutex log_lock;
 };
 
+/**
+ * struct cxl_security_state - Device security state
+ *
+ * @sanitize_dwork: self-polling work item for sanitation
+ * @sanitize_tmo: self-polling timeout
+ */
+struct cxl_security_state {
+	/* below only used if device mbox irqs are not supported */
+	struct delayed_work sanitize_dwork;
+	int sanitize_tmo;
+};
+
 /**
  * struct cxl_dev_state - The driver device state
  *
@@ -256,6 +268,7 @@ struct cxl_event_state {
  * @serial: PCIe Device Serial Number
  * @doe_mbs: PCI DOE mailbox array
  * @event: event log driver state
+ * @sec: device security state
  * @mbox_send: @dev specific transport for transmitting mailbox commands
  *
  * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
@@ -296,6 +309,8 @@ struct cxl_dev_state {
 
 	struct cxl_event_state event;
 
+	struct cxl_security_state sec;
+
 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
 };
 
@@ -327,6 +342,7 @@ enum cxl_opcode {
 	CXL_MBOX_OP_GET_SCAN_MEDIA_CAPS	= 0x4303,
 	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,
 	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,
+	CXL_MBOX_OP_SANITIZE		= 0x4400,
 	CXL_MBOX_OP_GET_SECURITY_STATE	= 0x4500,
 	CXL_MBOX_OP_SET_PASSPHRASE	= 0x4501,
 	CXL_MBOX_OP_DISABLE_PASSPHRASE	= 0x4502,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index aa1bb74a52a1..bdee5273af5a 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -97,6 +97,8 @@ static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
 static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
 {
 	struct cxl_dev_state *cxlds = id;
+	u64 reg;
+	u16 opcode;
 
 	/* spurious or raced with hw? */
 	if (!cxl_mbox_background_complete(cxlds)) {
@@ -107,12 +109,47 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
 		goto done;
 	}
 
-	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
-	wake_up(&mbox_wait);
+	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+	opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
+
+	if (opcode == CXL_MBOX_OP_SANITIZE) {
+		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
+	} else {
+		/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
+		wake_up(&mbox_wait);
+	}
 done:
 	return IRQ_HANDLED;
 }
 
+/*
+ * Sanitation operation polling mode.
+ */
+static void cxl_mbox_sanitize_work(struct work_struct *work)
+{
+	struct cxl_dev_state *cxlds;
+
+	cxlds = container_of(work, struct cxl_dev_state,
+			     sec.sanitize_dwork.work);
+
+	WARN_ON(cxlds->sec.sanitize_tmo == -1);
+
+	mutex_lock(&cxlds->mbox_mutex);
+	if (cxl_mbox_background_complete(cxlds)) {
+		cxlds->sec.sanitize_tmo = 0;
+		put_device(cxlds->dev);
+
+		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
+	} else {
+		int tmo = cxlds->sec.sanitize_tmo + 10;
+
+		cxlds->sec.sanitize_tmo = min(15 * 60, tmo);
+		queue_delayed_work(system_wq,
+				   &cxlds->sec.sanitize_dwork, tmo * HZ);
+	}
+	mutex_unlock(&cxlds->mbox_mutex);
+}
+
 /**
  * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
  * @cxlds: The device state to communicate with.
@@ -173,6 +210,16 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
 		return -EBUSY;
 	}
 
+	/*
+	 * With sanitize polling, hardware might be done and the poller still
+	 * not be in sync. Ensure no new command comes in until so. Keep the
+	 * hardware semantics and only allow device health status.
+	 */
+	if (unlikely(cxlds->sec.sanitize_tmo > 0)) {
+		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO)
+			return -EBUSY;
+	}
+
 	cmd_reg = FIELD_PREP(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK,
 			     mbox_cmd->opcode);
 	if (mbox_cmd->size_in) {
@@ -223,6 +270,27 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
 		u64 bg_status_reg;
 		int i;
 
+		/*
+		 * Sanitation is a special case which monopolizes the device
+		 * in an uninterruptible state and thus cannot be timesliced.
+		 * Return immediately instead and allow userspace to poll(2)
+		 * for completion.
+		 */
+		if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) {
+			if (cxlds->sec.sanitize_tmo != -1) {
+				/* give first timeout a second */
+				cxlds->sec.sanitize_tmo = 1;
+				/* hold the device throughout */
+				get_device(cxlds->dev);
+				queue_delayed_work(system_wq,
+						   &cxlds->sec.sanitize_dwork,
+						   cxlds->sec.sanitize_tmo * HZ);
+			}
+
+			dev_dbg(dev, "Sanitation operation started\n");
+			return 0;
+		}
+
 		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
 			mbox_cmd->opcode);
 
@@ -366,6 +434,9 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
 		if (rc)
 			goto mbox_poll;
 
+		/* flag that irqs are enabled */
+		cxlds->sec.sanitize_tmo = -1;
+
 		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
 		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
 
@@ -373,7 +444,11 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
 	}
 
 mbox_poll:
+	INIT_DELAYED_WORK(&cxlds->sec.sanitize_dwork,
+			  cxl_mbox_sanitize_work);
+	cxlds->sec.sanitize_tmo = 0;
 	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
+
 	return 0;
 }
 
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2023-04-21  9:23 ` [PATCH 3/7] cxl/mbox: Add sanitation " Davidlohr Bueso
@ 2023-04-21  9:23 ` Davidlohr Bueso
  2023-04-21 20:04   ` kernel test robot
                     ` (2 more replies)
  2023-04-21  9:23 ` [PATCH 5/7] cxl/test: Add Sanitize opcode support Davidlohr Bueso
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

Implement support for CXL 3.0 8.2.9.8.5.1 Sanitize. This is done by
adding a security/sanitize' memdev sysfs file, which is poll(2)-capable
for completion. Unlike all other background commands, this is the
only operation that is special and monopolizes the device for long
periods of time.

In addition to the traditional pmem security requirements, all regions
must also be offline in order to perform the operation. This permits
avoiding explicit global CPU cache management, relying instead on
attach_target() setting CXL_REGION_F_INCOHERENT upon reconnect.

The expectation is that userspace can use it such as:

    cxl disable-memdev memX
    echo 1 > /sys/bus/cxl/devices/memX/security/sanitize
    cxl wait-sanitize memX
    cxl enable-memdev memX

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 19 ++++++
 drivers/cxl/core/mbox.c                 | 56 ++++++++++++++++
 drivers/cxl/core/memdev.c               | 86 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  4 ++
 drivers/cxl/pci.c                       |  5 ++
 5 files changed, 170 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 3acf2f17a73f..2e98ec9220ca 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -58,6 +58,25 @@ Description:
 		affinity for this device.
 
 
+What:           /sys/bus/cxl/devices/memX/security/sanitize
+Date:           May, 2023
+KernelVersion:  v6.5
+Contact:        linux-cxl@vger.kernel.org
+Description:
+		(RW) Write a boolean 'true' string value to this attribute to
+		sanitize the device to securely re-purpose or decommission it.
+		This is done by ensuring that all user data and meta-data,
+		whether it resides in persistent capacity, volatile capacity,
+		or the LSA, is made permanently unavailable by whatever means
+		is appropriate for the media type. This functionality requires
+		the device to be not be actively decoding any HPA ranges.
+
+		Reading this file shows either "disabled" when not running, or
+		"sanitize" during the duration of the sanitize operation. This
+		sysfs entry is select/poll capable from userspace to notify upon
+		completion.
+
+
 What:		/sys/bus/cxl/devices/*/devtype
 Date:		June, 2021
 KernelVersion:	v5.14
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index cde7270c6037..28daf7dcdec4 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1021,6 +1021,62 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
 
+/**
+ * cxl_mem_sanitize() - Send a sanitation command to the device.
+ * @cxlds: The device data for the operation
+ * @cmd: The specific sanitation command opcode
+ *
+ * Return: 0 if the command was executed successfully, regardless of
+ * whether or not the actual security operation is done in the background,
+ * such as for the Sanitize case.
+ * Error return values can be the result of the mailbox command, -EINVAL
+ * when security requirements are not met or invalid contexts, or -EBUSY
+ * if the device is not offline.
+ *
+ * See CXL 3.0 @8.2.9.8.5.1 Sanitize and @8.2.9.8.5.2 Secure Erase.
+ */
+int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd)
+{
+	int rc;
+	u32 sec_out = 0;
+	struct cxl_get_security_output {
+		__le32 flags;
+	} out;
+	struct cxl_mbox_cmd sec_cmd = {
+		.opcode = CXL_MBOX_OP_GET_SECURITY_STATE,
+		.payload_out = &out,
+		.size_out = sizeof(out),
+	};
+	struct cxl_mbox_cmd mbox_cmd = { .opcode = cmd };
+
+	if (cmd != CXL_MBOX_OP_SANITIZE)
+		return -EINVAL;
+
+	rc = cxl_internal_send_cmd(cxlds, &sec_cmd);
+	if (rc < 0) {
+		dev_err(cxlds->dev, "Failed to get security state : %d", rc);
+		return rc;
+	}
+
+	/*
+	 * Prior to using these commands, any security applied to
+	 * the user data areas of the device shall be DISABLED (or
+	 * UNLOCKED for secure erase case).
+	 */
+	sec_out = le32_to_cpu(out.flags);
+	if (sec_out & CXL_PMEM_SEC_STATE_USER_PASS_SET)
+		return -EINVAL;
+
+	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
+	if (rc < 0) {
+		dev_err(cxlds->dev, "Failed to sanitize device : %d", rc);
+		return rc;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_sanitize, CXL);
+
 static int add_dpa_res(struct device *dev, struct resource *parent,
 		       struct resource *res, resource_size_t start,
 		       resource_size_t size, const char *type)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 28a05f2fe32d..70e7158826c9 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -89,6 +89,55 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 static struct device_attribute dev_attr_pmem_size =
 	__ATTR(size, 0444, pmem_size_show, NULL);
 
+static ssize_t security_sanitize_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	u64 reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+	u32 pct = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg);
+	u16 cmd = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
+
+	if (cmd == CXL_MBOX_OP_SANITIZE && pct != 100)
+		return sysfs_emit(buf, "sanitize\n");
+	else
+		return sysfs_emit(buf, "disabled\n");
+}
+
+static ssize_t security_sanitize_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	ssize_t rc;
+	bool sanitize;
+
+	rc = kstrtobool(buf, &sanitize);
+	if (rc)
+		return rc;
+
+	if (sanitize) {
+		struct cxl_port *port = dev_get_drvdata(&cxlmd->dev);
+
+		if (!port || !is_cxl_endpoint(port))
+			return -EINVAL;
+		/* ensure no regions are mapped to this memdev */
+		if (port->commit_end != -1)
+			return -EBUSY;
+
+		rc = cxl_mem_sanitize(cxlds, CXL_MBOX_OP_SANITIZE);
+	}
+
+	if (rc == 0)
+		rc = len;
+	return rc;
+}
+
+static struct device_attribute dev_attr_security_sanitize =
+	__ATTR(sanitize, 0644,
+	       security_sanitize_show, security_sanitize_store);
+
 static ssize_t serial_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
 {
@@ -148,10 +197,21 @@ static struct attribute_group cxl_memdev_pmem_attribute_group = {
 	.attrs = cxl_memdev_pmem_attributes,
 };
 
+static struct attribute *cxl_memdev_security_attributes[] = {
+	&dev_attr_security_sanitize.attr,
+	NULL,
+};
+
+static struct attribute_group cxl_memdev_security_attribute_group = {
+	.name = "security",
+	.attrs = cxl_memdev_security_attributes,
+};
+
 static const struct attribute_group *cxl_memdev_attribute_groups[] = {
 	&cxl_memdev_attribute_group,
 	&cxl_memdev_ram_attribute_group,
 	&cxl_memdev_pmem_attribute_group,
+	&cxl_memdev_security_attribute_group,
 	NULL,
 };
 
@@ -324,11 +384,19 @@ static const struct file_operations cxl_memdev_fops = {
 	.llseek = noop_llseek,
 };
 
+static void put_sanitize(void *data)
+{
+	struct cxl_dev_state *cxlds = data;
+
+	sysfs_put(cxlds->sec.sanitize_state);
+}
+
 struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 {
 	struct cxl_memdev *cxlmd;
 	struct device *dev;
 	struct cdev *cdev;
+	struct kernfs_node *sec;
 	int rc;
 
 	cxlmd = cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
@@ -355,6 +423,24 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 	rc = devm_add_action_or_reset(cxlds->dev, cxl_memdev_unregister, cxlmd);
 	if (rc)
 		return ERR_PTR(rc);
+
+	sec = sysfs_get_dirent(dev->kobj.sd, "security");
+	if (!sec) {
+		dev_err(dev, "sysfs_get_dirent 'security' failed\n");
+		rc = -ENODEV;
+		goto err;
+	}
+	cxlds->sec.sanitize_state = sysfs_get_dirent(sec, "sanitize");
+	sysfs_put(sec);
+	if (!cxlds->sec.sanitize_state) {
+		dev_err(dev, "sysfs_get_dirent 'sanitize' failed\n");
+		rc = -ENODEV;
+		goto err;
+	}
+	rc = devm_add_action_or_reset(cxlds->dev, put_sanitize, cxlds);
+	if (rc)
+		return ERR_PTR(rc);
+
 	return cxlmd;
 
 err:
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 17e3ab3c641a..9bd33cfdc0ec 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -223,10 +223,12 @@ struct cxl_event_state {
 /**
  * struct cxl_security_state - Device security state
  *
+ * @sanitize_state: sanitation sysfs file to notify
  * @sanitize_dwork: self-polling work item for sanitation
  * @sanitize_tmo: self-polling timeout
  */
 struct cxl_security_state {
+	struct kernfs_node *sanitize_state;
 	/* below only used if device mbox irqs are not supported */
 	struct delayed_work sanitize_dwork;
 	int sanitize_tmo;
@@ -642,6 +644,8 @@ static inline void cxl_mem_active_dec(void)
 }
 #endif
 
+int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd);
+
 struct cxl_hdm {
 	struct cxl_component_regs regs;
 	unsigned int decoder_count;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bdee5273af5a..2bc3b595f270 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -113,6 +113,9 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
 	opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
 
 	if (opcode == CXL_MBOX_OP_SANITIZE) {
+		if (cxlds->sec.sanitize_state)
+			sysfs_notify_dirent(cxlds->sec.sanitize_state);
+
 		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
 	} else {
 		/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
@@ -138,6 +141,8 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
 	if (cxl_mbox_background_complete(cxlds)) {
 		cxlds->sec.sanitize_tmo = 0;
 		put_device(cxlds->dev);
+		if (cxlds->sec.sanitize_state)
+			sysfs_notify_dirent(cxlds->sec.sanitize_state);
 
 		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
 	} else {
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 5/7] cxl/test: Add Sanitize opcode support
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2023-04-21  9:23 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
@ 2023-04-21  9:23 ` Davidlohr Bueso
  2023-05-11 15:09   ` Jonathan Cameron
  2023-04-21  9:23 ` [PATCH 6/7] cxl/mem: Support Secure Erase Davidlohr Bueso
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

Add support to emulate the "Sanitize" operation, without
incurring in the background.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 tools/testing/cxl/test/mem.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 9263b04d35f7..d4466cb27947 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -497,6 +497,28 @@ static int mock_partition_info(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static int mock_sanitize(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(cxlds->dev);
+
+	if (cmd->size_in != 0)
+		return -EINVAL;
+
+	if (cmd->size_out != 0)
+		return -EINVAL;
+
+	if (mdata->security_state & CXL_PMEM_SEC_STATE_USER_PASS_SET) {
+		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
+		return -ENXIO;
+	}
+	if (mdata->security_state & CXL_PMEM_SEC_STATE_LOCKED) {
+		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
+		return -ENXIO;
+	}
+
+	return 0; /* assume less than 2 secs, no bg */
+}
+
 static int mock_get_security_state(struct cxl_dev_state *cxlds,
 				   struct cxl_mbox_cmd *cmd)
 {
@@ -924,6 +946,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_HEALTH_INFO:
 		rc = mock_health_info(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_SANITIZE:
+		rc = mock_sanitize(cxlds, cmd);
+		break;
 	case CXL_MBOX_OP_GET_SECURITY_STATE:
 		rc = mock_get_security_state(cxlds, cmd);
 		break;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 6/7] cxl/mem: Support Secure Erase
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2023-04-21  9:23 ` [PATCH 5/7] cxl/test: Add Sanitize opcode support Davidlohr Bueso
@ 2023-04-21  9:23 ` Davidlohr Bueso
  2023-05-11 15:10   ` Jonathan Cameron
  2023-04-21  9:23 ` [PATCH 7/7] cxl/test: Add Secure Erase opcode support Davidlohr Bueso
  2023-04-23  2:05 ` [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
  7 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

Implement support for the non-pmem exclusive secure erase, per
CXL specs. Create a write-only 'security/erase' sysfs file to
perform the requested operation.

As with the sanitation this requires the device being offline
and thus no active HPA-DPA decoding.

The expectation is that userspace can use it such as:

	cxl disable-memdev memX
	echo 1 > /sys/bus/cxl/devices/memX/security/erase
	cxl enable-memdev memX

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 10 ++++++++
 drivers/cxl/core/mbox.c                 |  6 ++++-
 drivers/cxl/core/memdev.c               | 34 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  1 +
 4 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 2e98ec9220ca..af7b603faf77 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -77,6 +77,16 @@ Description:
 		completion.
 
 
+What            /sys/bus/cxl/devices/memX/security/erase
+Date:           May, 2023
+KernelVersion:  v6.5
+Contact:        linux-cxl@vger.kernel.org
+Description:
+		(WO) Write a boolean 'true' string value to this attribute to
+		secure erase user data by changing the media encryption keys for
+		all user data areas of the device.
+
+
 What:		/sys/bus/cxl/devices/*/devtype
 Date:		June, 2021
 KernelVersion:	v5.14
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 28daf7dcdec4..a2180f3e09eb 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1049,7 +1049,7 @@ int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd)
 	};
 	struct cxl_mbox_cmd mbox_cmd = { .opcode = cmd };
 
-	if (cmd != CXL_MBOX_OP_SANITIZE)
+	if (cmd != CXL_MBOX_OP_SANITIZE && cmd != CXL_MBOX_OP_SECURE_ERASE)
 		return -EINVAL;
 
 	rc = cxl_internal_send_cmd(cxlds, &sec_cmd);
@@ -1067,6 +1067,10 @@ int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd)
 	if (sec_out & CXL_PMEM_SEC_STATE_USER_PASS_SET)
 		return -EINVAL;
 
+	if (cmd == CXL_MBOX_OP_SECURE_ERASE &&
+	    sec_out & CXL_PMEM_SEC_STATE_LOCKED)
+		return -EINVAL;
+
 	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
 	if (rc < 0) {
 		dev_err(cxlds->dev, "Failed to sanitize device : %d", rc);
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 70e7158826c9..6406e8e47da2 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -138,6 +138,39 @@ static struct device_attribute dev_attr_security_sanitize =
 	__ATTR(sanitize, 0644,
 	       security_sanitize_show, security_sanitize_store);
 
+static ssize_t security_erase_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	ssize_t rc;
+	bool erase;
+
+	rc = kstrtobool(buf, &erase);
+	if (rc)
+		return rc;
+
+	if (erase) {
+		struct cxl_port *port = dev_get_drvdata(&cxlmd->dev);
+
+		if (!port || !is_cxl_endpoint(port))
+			return -EINVAL;
+		/* ensure no regions are mapped to this memdev */
+		if (port->commit_end != -1)
+			return -EBUSY;
+
+		rc = cxl_mem_sanitize(cxlds, CXL_MBOX_OP_SECURE_ERASE);
+	}
+
+	if (rc == 0)
+		rc = len;
+	return rc;
+}
+
+static struct device_attribute dev_attr_security_erase =
+	__ATTR(erase, 0200, NULL, security_erase_store);
+
 static ssize_t serial_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
 {
@@ -199,6 +232,7 @@ static struct attribute_group cxl_memdev_pmem_attribute_group = {
 
 static struct attribute *cxl_memdev_security_attributes[] = {
 	&dev_attr_security_sanitize.attr,
+	&dev_attr_security_erase.attr,
 	NULL,
 };
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 9bd33cfdc0ec..f8b513e70c21 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -345,6 +345,7 @@ enum cxl_opcode {
 	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,
 	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,
 	CXL_MBOX_OP_SANITIZE		= 0x4400,
+	CXL_MBOX_OP_SECURE_ERASE		= 0x4401,
 	CXL_MBOX_OP_GET_SECURITY_STATE	= 0x4500,
 	CXL_MBOX_OP_SET_PASSPHRASE	= 0x4501,
 	CXL_MBOX_OP_DISABLE_PASSPHRASE	= 0x4502,
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 7/7] cxl/test: Add Secure Erase opcode support
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
                   ` (5 preceding siblings ...)
  2023-04-21  9:23 ` [PATCH 6/7] cxl/mem: Support Secure Erase Davidlohr Bueso
@ 2023-04-21  9:23 ` Davidlohr Bueso
  2023-05-11 15:10   ` Jonathan Cameron
  2023-04-23  2:05 ` [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
  7 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-21  9:23 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, dave, linux-cxl

Add support to emulate the CXL the "Secure Erase" operation.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 tools/testing/cxl/test/mem.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index d4466cb27947..8a22a4e592c6 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -519,6 +519,30 @@ static int mock_sanitize(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 	return 0; /* assume less than 2 secs, no bg */
 }
 
+static int mock_secure_erase(struct cxl_dev_state *cxlds,
+			     struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(cxlds->dev);
+
+	if (cmd->size_in != 0)
+		return -EINVAL;
+
+	if (cmd->size_out != 0)
+		return -EINVAL;
+
+	if (mdata->security_state & CXL_PMEM_SEC_STATE_USER_PASS_SET) {
+		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
+		return -ENXIO;
+	}
+
+	if (mdata->security_state & CXL_PMEM_SEC_STATE_LOCKED) {
+		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
 static int mock_get_security_state(struct cxl_dev_state *cxlds,
 				   struct cxl_mbox_cmd *cmd)
 {
@@ -949,6 +973,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_SANITIZE:
 		rc = mock_sanitize(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_SECURE_ERASE:
+		rc = mock_secure_erase(cxlds, cmd);
+		break;
 	case CXL_MBOX_OP_GET_SECURITY_STATE:
 		rc = mock_get_security_state(cxlds, cmd);
 		break;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-04-21  9:23 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
@ 2023-04-21 20:04   ` kernel test robot
  2023-04-21 20:24   ` kernel test robot
  2023-05-11 15:07   ` Jonathan Cameron
  2 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2023-04-21 20:04 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: llvm, oe-kbuild-all, Jonathan.Cameron, dave.jiang,
	alison.schofield, ira.weiny, vishal.l.verma, fan.ni, a.manzanares,
	dave, linux-cxl

Hi Davidlohr,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc7 next-20230420]
[cannot apply to cxl/next cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Davidlohr-Bueso/cxl-pci-Allocate-irq-vectors-earlier-in-pci-probe/20230421-175725
base:   linus/master
patch link:    https://lore.kernel.org/r/20230421092321.12741-5-dave%40stgolabs.net
patch subject: [PATCH 4/7] cxl/mem: Wire up Sanitation support
config: powerpc-buildonly-randconfig-r006-20230421 (https://download.01.org/0day-ci/archive/20230422/202304220358.PInFhQmP-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 437b7602e4a998220871de78afcb020b9c14a661)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/b5227afb40297993eba355a804720c834da3fe2a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Davidlohr-Bueso/cxl-pci-Allocate-irq-vectors-earlier-in-pci-probe/20230421-175725
        git checkout b5227afb40297993eba355a804720c834da3fe2a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash drivers/cxl/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304220358.PInFhQmP-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/cxl/core/memdev.c:97:12: error: call to undeclared function 'readq'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           u64 reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
                     ^
   1 error generated.


vim +/readq +97 drivers/cxl/core/memdev.c

    88	
    89	static struct device_attribute dev_attr_pmem_size =
    90		__ATTR(size, 0444, pmem_size_show, NULL);
    91	
    92	static ssize_t security_sanitize_show(struct device *dev,
    93					      struct device_attribute *attr, char *buf)
    94	{
    95		struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
    96		struct cxl_dev_state *cxlds = cxlmd->cxlds;
  > 97		u64 reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
    98		u32 pct = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg);
    99		u16 cmd = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
   100	
   101		if (cmd == CXL_MBOX_OP_SANITIZE && pct != 100)
   102			return sysfs_emit(buf, "sanitize\n");
   103		else
   104			return sysfs_emit(buf, "disabled\n");
   105	}
   106	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-04-21  9:23 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
  2023-04-21 20:04   ` kernel test robot
@ 2023-04-21 20:24   ` kernel test robot
  2023-05-11 15:07   ` Jonathan Cameron
  2 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2023-04-21 20:24 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: llvm, oe-kbuild-all, Jonathan.Cameron, dave.jiang,
	alison.schofield, ira.weiny, vishal.l.verma, fan.ni, a.manzanares,
	dave, linux-cxl

Hi Davidlohr,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc7 next-20230420]
[cannot apply to cxl/next cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Davidlohr-Bueso/cxl-pci-Allocate-irq-vectors-earlier-in-pci-probe/20230421-175725
base:   linus/master
patch link:    https://lore.kernel.org/r/20230421092321.12741-5-dave%40stgolabs.net
patch subject: [PATCH 4/7] cxl/mem: Wire up Sanitation support
config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20230422/202304220436.O3l806d0-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/b5227afb40297993eba355a804720c834da3fe2a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Davidlohr-Bueso/cxl-pci-Allocate-irq-vectors-earlier-in-pci-probe/20230421-175725
        git checkout b5227afb40297993eba355a804720c834da3fe2a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/cxl/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304220436.O3l806d0-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/cxl/core/memdev.c:97:12: error: implicit declaration of function 'readq' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
           u64 reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
                     ^
   1 error generated.


vim +/readq +97 drivers/cxl/core/memdev.c

    88	
    89	static struct device_attribute dev_attr_pmem_size =
    90		__ATTR(size, 0444, pmem_size_show, NULL);
    91	
    92	static ssize_t security_sanitize_show(struct device *dev,
    93					      struct device_attribute *attr, char *buf)
    94	{
    95		struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
    96		struct cxl_dev_state *cxlds = cxlmd->cxlds;
  > 97		u64 reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
    98		u32 pct = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg);
    99		u16 cmd = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
   100	
   101		if (cmd == CXL_MBOX_OP_SANITIZE && pct != 100)
   102			return sysfs_emit(buf, "sanitize\n");
   103		else
   104			return sysfs_emit(buf, "disabled\n");
   105	}
   106	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 0/7] cxl: Background cmds and device sanitation
  2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
                   ` (6 preceding siblings ...)
  2023-04-21  9:23 ` [PATCH 7/7] cxl/test: Add Secure Erase opcode support Davidlohr Bueso
@ 2023-04-23  2:05 ` Davidlohr Bueso
  7 siblings, 0 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-23  2:05 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023, Davidlohr Bueso wrote:

>Testing.
>========
>
>o There are the mock device tests for Sanitize and Secure Erase.
>
>o The latest (v2) qemu bg/sanitize support series is posted here:
>	https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/

fyi here's the support for the cxl-tool cli
     https://github.com/davidlohr/ndctl/tree/cxl-memdev-sanitation-v1

... which is also posted on the list:
     https://lore.kernel.org/linux-cxl/20230423015920.11384-1-dave@stgolabs.net/

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-21  9:23 ` [PATCH 2/7] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
@ 2023-04-23  7:54   ` Li, Ming
  2023-04-23 20:51     ` Davidlohr Bueso
  2023-04-28 16:21   ` Dave Jiang
  2023-05-11 14:23   ` Jonathan Cameron
  2 siblings, 1 reply; 38+ messages in thread
From: Li, Ming @ 2023-04-23  7:54 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl, dan.j.williams

On 4/21/2023 5:23 PM, Davidlohr Bueso wrote:
> This adds support for handling background operations, as defined in
> the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
> can run in the background asynchronously (to the hardware).
> 
> The driver will deal with such commands synchronously, blocking all
> other incoming commands for a specified period of time, allowing
> time-slicing the command such that the caller can send incremental
> requests to avoid monopolizing the driver/device. This approach
> makes the code simpler, where any out of sync (timeout) between the
> driver and hardware is just disregarded as an invalid state until
> the next successful submission.
> 
> On devices where mbox interrupts are supported, this will still use
> a poller that will wakeup in the specified wait intervals. The irq
> handler will simply awake a blocked cmd, which is also safe vs a
> task that is either waking (timing out) or already awoken. Similarly
> any irq setup error during the probing falls back to polling, thus
> avoids unnecessarily erroring out.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---

......

> +static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
> +{
> +	u64 reg;
> +
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
> +}

should using a MACRO to define '100' be better?

> +
> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> +{
> +	struct cxl_dev_state *cxlds = id;
> +
> +	/* spurious or raced with hw? */
> +	if (!cxl_mbox_background_complete(cxlds)) {
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		dev_warn(&pdev->dev,
> +			 "Mailbox background operation IRQ but incomplete\n");
> +		goto done;
> +	}
> +
> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> +	wake_up(&mbox_wait);
> +done:
> +	return IRQ_HANDLED;
> +}
> +
>  /**
>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>   * @cxlds: The device state to communicate with.
> @@ -178,7 +206,59 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>  	mbox_cmd->return_code =
>  		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>  
> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> +	/*
> +	 * Handle the background command in a synchronous manner.
> +	 *
> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
> +	 * which we currently hold. Furthermore this also guarantees that
> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
> +	 * in that no new bg operation can occur in between.
> +	 *
> +	 * Background operations are timesliced in accordance with the nature
> +	 * of the command. In the event of timeout, the mailbox state is
> +	 * indeterminate until the next successful command submission and the
> +	 * driver can get back in sync with the hardware state.
> +	 */
> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> +		u64 bg_status_reg;
> +		int i;
> +
> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> +			mbox_cmd->opcode);
> +
> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
> +			int ret = wait_event_interruptible_timeout(
> +				mbox_wait, cxl_mbox_background_complete(cxlds),
> +				msecs_to_jiffies(mbox_cmd->poll_interval));
> +			if (ret > 0)
> +				break;
> +
> +			/* interrupted by a signal */
> +			if (ret < 0)
> +				return ret;
> +		}
> +
> +		if (!cxl_mbox_background_complete(cxlds)) {
> +			u64 md_status =
> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> +
> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
> +				    "background timeout");
> +			return -ETIMEDOUT;
> +		}
> +
> +		bg_status_reg = readq(cxlds->regs.mbox +
> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +		mbox_cmd->return_code =
> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> +				  bg_status_reg);
> +		dev_dbg(dev,
> +			"Mailbox background operation (0x%04x) completed\n",
> +			mbox_cmd->opcode);
> +	}
> +
> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND) {
>  		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>  			cxl_mbox_cmd_rc2str(mbox_cmd));
>  		return 0; /* completed but caller must check return_code */

why does here only handle failure cases for non-background command? Maybe I missed something, I think that we need to do same thing here for background command.

Thanks
Ming

> @@ -224,6 +304,7 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>  	const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
>  	unsigned long timeout;
>  	u64 md_status;
> +	int rc, irq;
>  
>  	timeout = jiffies + mbox_ready_timeout * HZ;
>  	do {
> @@ -272,6 +353,27 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>  	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
>  		cxlds->payload_size);
>  
> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		irq = pci_irq_vector(pdev,
> +			     FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap));
> +		if (irq < 0)
> +			goto mbox_poll;
> +
> +		rc = devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> +				      IRQF_SHARED, "mailbox", cxlds);
> +		if (rc)
> +			goto mbox_poll;
> +
> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
> +
> +		return 0;
> +	}
> +
> +mbox_poll:
> +	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-23  7:54   ` Li, Ming
@ 2023-04-23 20:51     ` Davidlohr Bueso
  0 siblings, 0 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-23 20:51 UTC (permalink / raw)
  To: Li, Ming
  Cc: Jonathan.Cameron, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl, dan.j.williams

On Sun, 23 Apr 2023, Li, Ming wrote:

>On 4/21/2023 5:23 PM, Davidlohr Bueso wrote:
>> +static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
>> +{
>> +	u64 reg;
>> +
>> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>> +	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
>> +}
>
>should using a MACRO to define '100' be better?

Given that an abstraction is already being provided, this feels like an
overkill. Plus pct == 100 is pretty self descriptive.

>> +
>> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>> +{
>> +	struct cxl_dev_state *cxlds = id;
>> +
>> +	/* spurious or raced with hw? */
>> +	if (!cxl_mbox_background_complete(cxlds)) {

While at it, this probably wants to be unlikely().

>> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>> +
>> +		dev_warn(&pdev->dev,
>> +			 "Mailbox background operation IRQ but incomplete\n");
>> +		goto done;
>> +	}
>> +
>> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
>> +	wake_up(&mbox_wait);
>> +done:
>> +	return IRQ_HANDLED;
>> +}
>> +
>>  /**
>>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>>   * @cxlds: The device state to communicate with.
>> @@ -178,7 +206,59 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>>	mbox_cmd->return_code =
>>		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>>
>> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
>> +	/*
>> +	 * Handle the background command in a synchronous manner.
>> +	 *
>> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
>> +	 * which we currently hold. Furthermore this also guarantees that
>> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
>> +	 * in that no new bg operation can occur in between.
>> +	 *
>> +	 * Background operations are timesliced in accordance with the nature
>> +	 * of the command. In the event of timeout, the mailbox state is
>> +	 * indeterminate until the next successful command submission and the
>> +	 * driver can get back in sync with the hardware state.
>> +	 */
>> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
>> +		u64 bg_status_reg;
>> +		int i;
>> +
>> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>> +			mbox_cmd->opcode);
>> +
>> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
>> +			int ret = wait_event_interruptible_timeout(
>> +				mbox_wait, cxl_mbox_background_complete(cxlds),
>> +				msecs_to_jiffies(mbox_cmd->poll_interval));
>> +			if (ret > 0)
>> +				break;
>> +
>> +			/* interrupted by a signal */
>> +			if (ret < 0)
>> +				return ret;
>> +		}
>> +
>> +		if (!cxl_mbox_background_complete(cxlds)) {
>> +			u64 md_status =
>> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
>> +
>> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
>> +				    "background timeout");
>> +			return -ETIMEDOUT;
>> +		}
>> +
>> +		bg_status_reg = readq(cxlds->regs.mbox +
>> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>> +		mbox_cmd->return_code =
>> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
>> +				  bg_status_reg);
>> +		dev_dbg(dev,
>> +			"Mailbox background operation (0x%04x) completed\n",
>> +			mbox_cmd->opcode);
>> +	}
>> +
>> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
>> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND) {
>>		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>>			cxl_mbox_cmd_rc2str(mbox_cmd));
>>		return 0; /* completed but caller must check return_code */
>
>why does here only handle failure cases for non-background command? Maybe I missed something, I think that we need to do same thing here for background command.

Good point. Checking for background rc here is bogus and confusing because
this is a synchronous path and will never be true. I'll get rid of it, while
harmless it is semantically wrong. The same check is however necessary for
sanitation later in the cxl_internal_send_cmd() layer.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe
  2023-04-21  9:23 ` [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
@ 2023-04-28 16:09   ` Dave Jiang
  2023-05-11 13:55   ` Jonathan Cameron
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Jiang @ 2023-04-28 16:09 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: Jonathan.Cameron, alison.schofield, ira.weiny, vishal.l.verma,
	fan.ni, a.manzanares, linux-cxl



On 4/21/23 2:23 AM, Davidlohr Bueso wrote:
> Move the cxl_alloc_irq_vectors() call further up in the probing
> in order to allow for mailbox interrupt usage. No change in
> semantics.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/pci.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 60b23624d167..39b829a29f6c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -757,6 +757,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	if (rc)
>   		dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
>   
> +	rc = cxl_alloc_irq_vectors(pdev);
> +	if (rc)
> +		return rc;
> +
>   	rc = cxl_pci_setup_mailbox(cxlds);
>   	if (rc)
>   		return rc;
> @@ -777,10 +781,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	if (rc)
>   		return rc;
>   
> -	rc = cxl_alloc_irq_vectors(pdev);
> -	if (rc)
> -		return rc;
> -
>   	cxlmd = devm_cxl_add_memdev(cxlds);
>   	if (IS_ERR(cxlmd))
>   		return PTR_ERR(cxlmd);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-21  9:23 ` [PATCH 2/7] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
  2023-04-23  7:54   ` Li, Ming
@ 2023-04-28 16:21   ` Dave Jiang
  2023-04-28 17:18     ` Davidlohr Bueso
  2023-05-11 14:23   ` Jonathan Cameron
  2 siblings, 1 reply; 38+ messages in thread
From: Dave Jiang @ 2023-04-28 16:21 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: Jonathan.Cameron, alison.schofield, ira.weiny, vishal.l.verma,
	fan.ni, a.manzanares, linux-cxl



On 4/21/23 2:23 AM, Davidlohr Bueso wrote:
> This adds support for handling background operations, as defined in
> the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
> can run in the background asynchronously (to the hardware).
> 
> The driver will deal with such commands synchronously, blocking all
> other incoming commands for a specified period of time, allowing
> time-slicing the command such that the caller can send incremental
> requests to avoid monopolizing the driver/device. This approach
> makes the code simpler, where any out of sync (timeout) between the
> driver and hardware is just disregarded as an invalid state until
> the next successful submission.
> 
> On devices where mbox interrupts are supported, this will still use
> a poller that will wakeup in the specified wait intervals. The irq
> handler will simply awake a blocked cmd, which is also safe vs a
> task that is either waking (timing out) or already awoken. Similarly
> any irq setup error during the probing falls back to polling, thus
> avoids unnecessarily erroring out.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>   drivers/cxl/core/mbox.c |   3 +-
>   drivers/cxl/cxl.h       |   7 +++
>   drivers/cxl/cxlmem.h    |   5 ++
>   drivers/cxl/pci.c       | 104 +++++++++++++++++++++++++++++++++++++++-
>   4 files changed, 117 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 6198637cb0bb..cde7270c6037 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -180,7 +180,8 @@ int cxl_internal_send_cmd(struct cxl_dev_state *cxlds,
>   	if (rc)
>   		return rc;
>   
> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS)
> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
>   		return cxl_mbox_cmd_rc2errno(mbox_cmd);
>   
>   	if (!out_size)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 044a92d9813e..72731a896f58 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -176,14 +176,21 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>   /* CXL 2.0 8.2.8.4 Mailbox Registers */
>   #define CXLDEV_MBOX_CAPS_OFFSET 0x00
>   #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> +#define   CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
> +#define   CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)
>   #define CXLDEV_MBOX_CTRL_OFFSET 0x04
>   #define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
> +#define   CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
>   #define CXLDEV_MBOX_CMD_OFFSET 0x08
>   #define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
>   #define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
>   #define CXLDEV_MBOX_STATUS_OFFSET 0x10
> +#define   CXLDEV_MBOX_STATUS_BG_CMD BIT(0)
>   #define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
>   #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)
>   #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
>   
>   /*
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 090acebba4fa..8c3302fc7738 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -108,6 +108,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>    *            variable sized output commands, it tells the exact number of bytes
>    *            written.
>    * @min_out: (input) internal command output payload size validation
> + * @poll_count: (input)  Number of timeouts to attempt.
> + * @poll_interval: (input) Number of ms between mailbox background command
> + *                 polling intervals timeouts.
>    * @return_code: (output) Error code returned from hardware.
>    *
>    * This is the primary mechanism used to send commands to the hardware.
> @@ -123,6 +126,8 @@ struct cxl_mbox_cmd {
>   	size_t size_in;
>   	size_t size_out;
>   	size_t min_out;
> +	int poll_count;
> +	int poll_interval;
>   	u16 return_code;
>   };
>   
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 39b829a29f6c..aa1bb74a52a1 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -51,6 +51,7 @@
>   static unsigned short mbox_ready_timeout = 60;
>   module_param(mbox_ready_timeout, ushort, 0644);
>   MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
> +static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);

I apologize if I've asked this before and you've already answered. What 
is the reason the mbox_wait a module global wq instead of a per device 
wq? Just thinking when you tear down a device, you may want to wake all 
pending for that device to clean up.

>   
>   static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
>   {
> @@ -85,6 +86,33 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
>   			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
>   			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
>   
> +static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
> +{
> +	u64 reg;
> +
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
> +}
> +
> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> +{
> +	struct cxl_dev_state *cxlds = id;
> +
> +	/* spurious or raced with hw? */
> +	if (!cxl_mbox_background_complete(cxlds)) {
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		dev_warn(&pdev->dev,
> +			 "Mailbox background operation IRQ but incomplete\n");
> +		goto done;
> +	}
> +
> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> +	wake_up(&mbox_wait);
> +done:
> +	return IRQ_HANDLED;
> +}
> +
>   /**
>    * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>    * @cxlds: The device state to communicate with.
> @@ -178,7 +206,59 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>   	mbox_cmd->return_code =
>   		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>   
> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> +	/*
> +	 * Handle the background command in a synchronous manner.
> +	 *
> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
> +	 * which we currently hold. Furthermore this also guarantees that
> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
> +	 * in that no new bg operation can occur in between.
> +	 *
> +	 * Background operations are timesliced in accordance with the nature
> +	 * of the command. In the event of timeout, the mailbox state is
> +	 * indeterminate until the next successful command submission and the
> +	 * driver can get back in sync with the hardware state.
> +	 */
> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> +		u64 bg_status_reg;
> +		int i;
> +
> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> +			mbox_cmd->opcode);
> +
> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
> +			int ret = wait_event_interruptible_timeout(
> +				mbox_wait, cxl_mbox_background_complete(cxlds),
> +				msecs_to_jiffies(mbox_cmd->poll_interval));
> +			if (ret > 0)
> +				break;
> +
> +			/* interrupted by a signal */
> +			if (ret < 0)
> +				return ret;
> +		}
> +
> +		if (!cxl_mbox_background_complete(cxlds)) {
> +			u64 md_status =
> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> +
> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
> +				    "background timeout");
> +			return -ETIMEDOUT;
> +		}
> +
> +		bg_status_reg = readq(cxlds->regs.mbox +
> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +		mbox_cmd->return_code =
> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> +				  bg_status_reg);
> +		dev_dbg(dev,
> +			"Mailbox background operation (0x%04x) completed\n",
> +			mbox_cmd->opcode);
> +	}
> +
> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND) {
>   		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>   			cxl_mbox_cmd_rc2str(mbox_cmd));
>   		return 0; /* completed but caller must check return_code */
> @@ -224,6 +304,7 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>   	const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
>   	unsigned long timeout;
>   	u64 md_status;
> +	int rc, irq;
>   
>   	timeout = jiffies + mbox_ready_timeout * HZ;
>   	do {
> @@ -272,6 +353,27 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>   	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
>   		cxlds->payload_size);
>   
> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		irq = pci_irq_vector(pdev,
> +			     FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap));
> +		if (irq < 0)
> +			goto mbox_poll;
> +
> +		rc = devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> +				      IRQF_SHARED, "mailbox", cxlds);
> +		if (rc)
> +			goto mbox_poll;
> +
> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
> +
> +		return 0;
> +	}
> +
> +mbox_poll:
> +	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
>   	return 0;
>   }
>   

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] cxl/mbox: Add sanitation handling machinery
  2023-04-21  9:23 ` [PATCH 3/7] cxl/mbox: Add sanitation " Davidlohr Bueso
@ 2023-04-28 16:43   ` Dave Jiang
  2023-04-28 16:46     ` Davidlohr Bueso
  2023-05-11 14:45   ` Jonathan Cameron
  1 sibling, 1 reply; 38+ messages in thread
From: Dave Jiang @ 2023-04-28 16:43 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: Jonathan.Cameron, alison.schofield, ira.weiny, vishal.l.verma,
	fan.ni, a.manzanares, linux-cxl



On 4/21/23 2:23 AM, Davidlohr Bueso wrote:
> Sanitation is by definition a device-monopolizing operation, and thus
> the timeslicing rules for other background commands do not apply.
> As such handle this special case asynchronously and return immediately.
> Subsequent changes will allow completion to be pollable from userspace
> via a sysfs file interface.
> 
> For devices that don't support interrupts for notifying background
> command completion, self-poll with the caveat that the poller can
> be out of sync with the ready hardware, and therefore care must be
> taken to not allow any new commands to go through until the poller
> sees the hw completion. The poller takes the mbox_mutex to stabilize
> the flagging, minimizing any runtime overhead in the send path to
> check for 'sanitize_tmo' for uncommon poll scenarios. This flag
> also serves for sanitation (the only user of async polling) to know
> when to queue work or simply rely on irqs.
> 
> The irq case is much simpler as hardware will serialize/error
> appropriately.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>   drivers/cxl/cxlmem.h | 16 +++++++++
>   drivers/cxl/pci.c    | 79 ++++++++++++++++++++++++++++++++++++++++++--
>   2 files changed, 93 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 8c3302fc7738..17e3ab3c641a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -220,6 +220,18 @@ struct cxl_event_state {
>   	struct mutex log_lock;
>   };
>   
> +/**
> + * struct cxl_security_state - Device security state
> + *
> + * @sanitize_dwork: self-polling work item for sanitation
> + * @sanitize_tmo: self-polling timeout
> + */
> +struct cxl_security_state {
> +	/* below only used if device mbox irqs are not supported */
> +	struct delayed_work sanitize_dwork;
> +	int sanitize_tmo;
> +};
> +
>   /**
>    * struct cxl_dev_state - The driver device state
>    *
> @@ -256,6 +268,7 @@ struct cxl_event_state {
>    * @serial: PCIe Device Serial Number
>    * @doe_mbs: PCI DOE mailbox array
>    * @event: event log driver state
> + * @sec: device security state
>    * @mbox_send: @dev specific transport for transmitting mailbox commands
>    *
>    * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> @@ -296,6 +309,8 @@ struct cxl_dev_state {
>   
>   	struct cxl_event_state event;
>   
> +	struct cxl_security_state sec;
> +
>   	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>   };
>   
> @@ -327,6 +342,7 @@ enum cxl_opcode {
>   	CXL_MBOX_OP_GET_SCAN_MEDIA_CAPS	= 0x4303,
>   	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,
>   	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,
> +	CXL_MBOX_OP_SANITIZE		= 0x4400,
>   	CXL_MBOX_OP_GET_SECURITY_STATE	= 0x4500,
>   	CXL_MBOX_OP_SET_PASSPHRASE	= 0x4501,
>   	CXL_MBOX_OP_DISABLE_PASSPHRASE	= 0x4502,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index aa1bb74a52a1..bdee5273af5a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -97,6 +97,8 @@ static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
>   static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>   {
>   	struct cxl_dev_state *cxlds = id;
> +	u64 reg;
> +	u16 opcode;
>   
>   	/* spurious or raced with hw? */
>   	if (!cxl_mbox_background_complete(cxlds)) {
> @@ -107,12 +109,47 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>   		goto done;
>   	}
>   
> -	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> -	wake_up(&mbox_wait);
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
> +
> +	if (opcode == CXL_MBOX_OP_SANITIZE) {
> +		dev_dbg(cxlds->dev, "Sanitation operation ended\n");

I might be missing something. Do we not want to stop waiting as well if 
the sanitation operation has ended?

> +	} else {
> +		/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> +		wake_up(&mbox_wait);
> +	}
>   done:
>   	return IRQ_HANDLED;
>   }
>   
> +/*
> + * Sanitation operation polling mode.
> + */
> +static void cxl_mbox_sanitize_work(struct work_struct *work)
> +{
> +	struct cxl_dev_state *cxlds;
> +
> +	cxlds = container_of(work, struct cxl_dev_state,
> +			     sec.sanitize_dwork.work);
> +
> +	WARN_ON(cxlds->sec.sanitize_tmo == -1);
> +
> +	mutex_lock(&cxlds->mbox_mutex);
> +	if (cxl_mbox_background_complete(cxlds)) {
> +		cxlds->sec.sanitize_tmo = 0;
> +		put_device(cxlds->dev);
> +
> +		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
> +	} else {
> +		int tmo = cxlds->sec.sanitize_tmo + 10;
> +
> +		cxlds->sec.sanitize_tmo = min(15 * 60, tmo);
> +		queue_delayed_work(system_wq,
> +				   &cxlds->sec.sanitize_dwork, tmo * HZ);
> +	}
> +	mutex_unlock(&cxlds->mbox_mutex);
> +}
> +
>   /**
>    * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>    * @cxlds: The device state to communicate with.
> @@ -173,6 +210,16 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>   		return -EBUSY;
>   	}
>   
> +	/*
> +	 * With sanitize polling, hardware might be done and the poller still
> +	 * not be in sync. Ensure no new command comes in until so. Keep the
> +	 * hardware semantics and only allow device health status.
> +	 */
> +	if (unlikely(cxlds->sec.sanitize_tmo > 0)) {
> +		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO)
> +			return -EBUSY;
> +	}
> +
>   	cmd_reg = FIELD_PREP(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK,
>   			     mbox_cmd->opcode);
>   	if (mbox_cmd->size_in) {
> @@ -223,6 +270,27 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>   		u64 bg_status_reg;
>   		int i;
>   
> +		/*
> +		 * Sanitation is a special case which monopolizes the device
> +		 * in an uninterruptible state and thus cannot be timesliced.
> +		 * Return immediately instead and allow userspace to poll(2)
> +		 * for completion.
> +		 */
> +		if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) {
> +			if (cxlds->sec.sanitize_tmo != -1) {
> +				/* give first timeout a second */
> +				cxlds->sec.sanitize_tmo = 1;
> +				/* hold the device throughout */
> +				get_device(cxlds->dev);
> +				queue_delayed_work(system_wq,
> +						   &cxlds->sec.sanitize_dwork,
> +						   cxlds->sec.sanitize_tmo * HZ);
> +			}
> +
> +			dev_dbg(dev, "Sanitation operation started\n");
> +			return 0;
> +		}
> +
>   		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>   			mbox_cmd->opcode);
>   
> @@ -366,6 +434,9 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>   		if (rc)
>   			goto mbox_poll;
>   
> +		/* flag that irqs are enabled */
> +		cxlds->sec.sanitize_tmo = -1;
> +
>   		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
>   		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
>   
> @@ -373,7 +444,11 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>   	}
>   
>   mbox_poll:
> +	INIT_DELAYED_WORK(&cxlds->sec.sanitize_dwork,
> +			  cxl_mbox_sanitize_work);
> +	cxlds->sec.sanitize_tmo = 0;
>   	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
> +
>   	return 0;
>   }
>   

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] cxl/mbox: Add sanitation handling machinery
  2023-04-28 16:43   ` Dave Jiang
@ 2023-04-28 16:46     ` Davidlohr Bueso
  2023-04-28 17:37       ` Dave Jiang
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-28 16:46 UTC (permalink / raw)
  To: Dave Jiang
  Cc: dan.j.williams, Jonathan.Cameron, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 28 Apr 2023, Dave Jiang wrote:

>>  static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>>  {
>>	struct cxl_dev_state *cxlds = id;
>>+	u64 reg;
>>+	u16 opcode;
>>	/* spurious or raced with hw? */
>>	if (!cxl_mbox_background_complete(cxlds)) {
>>@@ -107,12 +109,47 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>>		goto done;
>>	}
>>-	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
>>-	wake_up(&mbox_wait);
>>+	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>>+	opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
>>+
>>+	if (opcode == CXL_MBOX_OP_SANITIZE) {
>>+		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
>
>I might be missing something. Do we not want to stop waiting as well
>if the sanitation operation has ended?

The thing here is that sanitize won't ever use the mbox_wait, which is
what makes it special (asynchronous).

So while in theory patch 2 enables sanitize to be in the synchronous path,
it can never occur because there is nothing there yet to trigger it (or
anything else for that matter). And this patch ensures that the sanitize
is isolated within __cxl_pci_mbox_send_cmd().

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-28 16:21   ` Dave Jiang
@ 2023-04-28 17:18     ` Davidlohr Bueso
  2023-04-28 21:04       ` Dave Jiang
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-28 17:18 UTC (permalink / raw)
  To: Dave Jiang
  Cc: dan.j.williams, Jonathan.Cameron, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 28 Apr 2023, Dave Jiang wrote:

>>+static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);
>
>I apologize if I've asked this before and you've already answered.
>What is the reason the mbox_wait a module global wq instead of a per
>device wq? Just thinking when you tear down a device, you may want to
>wake all pending for that device to clean up.

Yes, I agree that doing the wait per-device is better, and not only
for tear down reasons. By doing it globally, the queue reflects waits
from different devices, but the driver really has no control about
the order of the wait of each device is, so upon a blind wake_up(),
it could perfectly well be that the first node in the waitq is not
the node for that device.

This all goes away with per-device, with the note that because of the
mbox_mutex there never will only ever be a single waiter, so no concept
of a queue really.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] cxl/mbox: Add sanitation handling machinery
  2023-04-28 16:46     ` Davidlohr Bueso
@ 2023-04-28 17:37       ` Dave Jiang
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Jiang @ 2023-04-28 17:37 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, Jonathan.Cameron, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl



On 4/28/23 9:46 AM, Davidlohr Bueso wrote:
> On Fri, 28 Apr 2023, Dave Jiang wrote:
> 
>>>  static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>>>  {
>>>     struct cxl_dev_state *cxlds = id;
>>> +    u64 reg;
>>> +    u16 opcode;
>>>     /* spurious or raced with hw? */
>>>     if (!cxl_mbox_background_complete(cxlds)) {
>>> @@ -107,12 +109,47 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, 
>>> void *id)
>>>         goto done;
>>>     }
>>> -    /* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
>>> -    wake_up(&mbox_wait);
>>> +    reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>>> +    opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
>>> +
>>> +    if (opcode == CXL_MBOX_OP_SANITIZE) {
>>> +        dev_dbg(cxlds->dev, "Sanitation operation ended\n");
>>
>> I might be missing something. Do we not want to stop waiting as well
>> if the sanitation operation has ended?
> 
> The thing here is that sanitize won't ever use the mbox_wait, which is
> what makes it special (asynchronous).
> 
> So while in theory patch 2 enables sanitize to be in the synchronous path,
> it can never occur because there is nothing there yet to trigger it (or
> anything else for that matter). And this patch ensures that the sanitize
> is isolated within __cxl_pci_mbox_send_cmd().

Gotcha. Thanks for the explanation.
> 
> Thanks,
> Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-28 17:18     ` Davidlohr Bueso
@ 2023-04-28 21:04       ` Dave Jiang
  2023-04-28 22:03         ` Davidlohr Bueso
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Jiang @ 2023-04-28 21:04 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, Jonathan.Cameron, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl



On 4/28/23 10:18 AM, Davidlohr Bueso wrote:
> On Fri, 28 Apr 2023, Dave Jiang wrote:
> 
>>> +static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);
>>
>> I apologize if I've asked this before and you've already answered.
>> What is the reason the mbox_wait a module global wq instead of a per
>> device wq? Just thinking when you tear down a device, you may want to
>> wake all pending for that device to clean up.
> 
> Yes, I agree that doing the wait per-device is better, and not only
> for tear down reasons. By doing it globally, the queue reflects waits
> from different devices, but the driver really has no control about
> the order of the wait of each device is, so upon a blind wake_up(),
> it could perfectly well be that the first node in the waitq is not
> the node for that device.
> 
> This all goes away with per-device, with the note that because of the
> mbox_mutex there never will only ever be a single waiter, so no concept
> of a queue really.

If there's only a single waiter then I think a 'completion' can be used 
instead right?

> 
> Thanks,
> Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-28 21:04       ` Dave Jiang
@ 2023-04-28 22:03         ` Davidlohr Bueso
  2023-05-01 15:56           ` Davidlohr Bueso
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-04-28 22:03 UTC (permalink / raw)
  To: Dave Jiang
  Cc: dan.j.williams, Jonathan.Cameron, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 28 Apr 2023, Dave Jiang wrote:

>On 4/28/23 10:18 AM, Davidlohr Bueso wrote:
>>On Fri, 28 Apr 2023, Dave Jiang wrote:
>>
>>>>+static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);
>>>
>>>I apologize if I've asked this before and you've already answered.
>>>What is the reason the mbox_wait a module global wq instead of a per
>>>device wq? Just thinking when you tear down a device, you may want to
>>>wake all pending for that device to clean up.
>>
>>Yes, I agree that doing the wait per-device is better, and not only
>>for tear down reasons. By doing it globally, the queue reflects waits
>>from different devices, but the driver really has no control about
>>the order of the wait of each device is, so upon a blind wake_up(),
>>it could perfectly well be that the first node in the waitq is not
>>the node for that device.
>>
>>This all goes away with per-device, with the note that because of the
>>mbox_mutex there never will only ever be a single waiter, so no concept
>>of a queue really.
>
>If there's only a single waiter then I think a 'completion' can be
>used instead right?

Well completions still use (simple) wq underneath. What we really
want is rcuwait semantics but there is currently no support for
schedule timeouts. INTERRUPTIBLE sleep was added for new users in
the past, so it could be doable.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-28 22:03         ` Davidlohr Bueso
@ 2023-05-01 15:56           ` Davidlohr Bueso
  0 siblings, 0 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-05-01 15:56 UTC (permalink / raw)
  To: Dave Jiang
  Cc: dan.j.williams, Jonathan.Cameron, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 28 Apr 2023, Davidlohr Bueso wrote:

>On Fri, 28 Apr 2023, Dave Jiang wrote:
>
>>On 4/28/23 10:18 AM, Davidlohr Bueso wrote:
>>>On Fri, 28 Apr 2023, Dave Jiang wrote:
>>>
>>>>>+static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);
>>>>
>>>>I apologize if I've asked this before and you've already answered.
>>>>What is the reason the mbox_wait a module global wq instead of a per
>>>>device wq? Just thinking when you tear down a device, you may want to
>>>>wake all pending for that device to clean up.
>>>
>>>Yes, I agree that doing the wait per-device is better, and not only
>>>for tear down reasons. By doing it globally, the queue reflects waits
>>>from different devices, but the driver really has no control about
>>>the order of the wait of each device is, so upon a blind wake_up(),
>>>it could perfectly well be that the first node in the waitq is not
>>>the node for that device.
>>>
>>>This all goes away with per-device, with the note that because of the
>>>mbox_mutex there never will only ever be a single waiter, so no concept
>>>of a queue really.
>>
>>If there's only a single waiter then I think a 'completion' can be
>>used instead right?
>
>Well completions still use (simple) wq underneath. What we really
>want is rcuwait semantics but there is currently no support for
>schedule timeouts. INTERRUPTIBLE sleep was added for new users in
>the past, so it could be doable.

Something like this, but I guess we could use completions for now and
convert to rcuwait later if we don't want to have to depend on sched
bits for this patch - and I'm sure there are a few other users out there
abusing queued wait for similar semantics that could make use of it.


diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 8052d34da782..9e4759de228b 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -49,9 +49,9 @@ static inline void prepare_to_rcuwait(struct rcuwait *w)

  extern void finish_rcuwait(struct rcuwait *w);

-#define rcuwait_wait_event(w, condition, state)				\
+#define ___rcuwait_wait_event(w, condition, state, ret, cmd)		\
  ({									\
-	int __ret = 0;							\
+	long __ret = ret;						\
	prepare_to_rcuwait(w);						\
	for (;;) {							\
		/*							\
@@ -67,10 +67,27 @@ extern void finish_rcuwait(struct rcuwait *w);
			break;						\
		}							\
									\
-		schedule();						\
+		cmd;							\
	}								\
	finish_rcuwait(w);						\
	__ret;								\
  })

+#define rcuwait_wait_event(w, condition, state)				\
+	___rcuwait_wait_event(w, condition, state, 0, schedule())
+
+#define __rcuwait_wait_event_timeout(w, condition, state, timeout)	\
+	___rcuwait_wait_event(w, ___wait_cond_timeout(condition),	\
+			      state, timeout,				\
+			      __ret = schedule_timeout(__ret))
+
+#define rcuwait_wait_event_timeout(w, condition, state, timeout)	\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __rcuwait_wait_event_timeout(w, condition,	\
+						     timeout);		\
+	__ret;								\
+})
+
  #endif /* _LINUX_RCUWAIT_H_ */

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe
  2023-04-21  9:23 ` [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
  2023-04-28 16:09   ` Dave Jiang
@ 2023-05-11 13:55   ` Jonathan Cameron
  1 sibling, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-11 13:55 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023 02:23:15 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> Move the cxl_alloc_irq_vectors() call further up in the probing
> in order to allow for mailbox interrupt usage. No change in
> semantics.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/pci.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 60b23624d167..39b829a29f6c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -757,6 +757,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
>  
> +	rc = cxl_alloc_irq_vectors(pdev);
> +	if (rc)
> +		return rc;
> +
>  	rc = cxl_pci_setup_mailbox(cxlds);
>  	if (rc)
>  		return rc;
> @@ -777,10 +781,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		return rc;
>  
> -	rc = cxl_alloc_irq_vectors(pdev);
> -	if (rc)
> -		return rc;
> -
>  	cxlmd = devm_cxl_add_memdev(cxlds);
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-04-21  9:23 ` [PATCH 2/7] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
  2023-04-23  7:54   ` Li, Ming
  2023-04-28 16:21   ` Dave Jiang
@ 2023-05-11 14:23   ` Jonathan Cameron
  2023-05-11 16:04     ` Davidlohr Bueso
  2 siblings, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-11 14:23 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023 02:23:16 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> This adds support for handling background operations, as defined in
> the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
> can run in the background asynchronously (to the hardware).
> 
> The driver will deal with such commands synchronously, blocking all
> other incoming commands for a specified period of time, allowing
> time-slicing the command such that the caller can send incremental
> requests to avoid monopolizing the driver/device. This approach
> makes the code simpler, where any out of sync (timeout) between the
> driver and hardware is just disregarded as an invalid state until
> the next successful submission.
> 
> On devices where mbox interrupts are supported, this will still use
> a poller that will wakeup in the specified wait intervals. The irq
> handler will simply awake a blocked cmd, which is also safe vs a
> task that is either waking (timing out) or already awoken. Similarly
> any irq setup error during the probing falls back to polling, thus
> avoids unnecessarily erroring out.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  drivers/cxl/core/mbox.c |   3 +-
>  drivers/cxl/cxl.h       |   7 +++
>  drivers/cxl/cxlmem.h    |   5 ++
>  drivers/cxl/pci.c       | 104 +++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 117 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 6198637cb0bb..cde7270c6037 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -180,7 +180,8 @@ int cxl_internal_send_cmd(struct cxl_dev_state *cxlds,
>  	if (rc)
>  		return rc;
>  
> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS)
> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
>  		return cxl_mbox_cmd_rc2errno(mbox_cmd);
>  
>  	if (!out_size)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 044a92d9813e..72731a896f58 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -176,14 +176,21 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>  /* CXL 2.0 8.2.8.4 Mailbox Registers */
>  #define CXLDEV_MBOX_CAPS_OFFSET 0x00
>  #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> +#define   CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
> +#define   CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)

Numeric order of bits probably makes more sense. So move this up one line.

>  #define CXLDEV_MBOX_CTRL_OFFSET 0x04
>  #define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
> +#define   CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
>  #define CXLDEV_MBOX_CMD_OFFSET 0x08
>  #define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
>  #define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
>  #define CXLDEV_MBOX_STATUS_OFFSET 0x10
> +#define   CXLDEV_MBOX_STATUS_BG_CMD BIT(0)

Hmm. Oddly field is called Background Operation.  Still that is then
described as a command so I guess this is a reasonable bit of consolidation
of naming.

>  #define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
>  #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)

Obviously there are line length disadvantages to includes the status part
in the field names and there isn't a BG_CMD_COMMAND register thankfully
but I still find the absence of status a bit inconsistent / confusing.
initial instinct is this a field called OP_CODE in a register called
BG_CMD_COMMAND

> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)

It might be nice to to do 'something' with the Vendor Specific extended
status even if it is just put it in a dev_dbg() for anyone who cares?

>  #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
>  
>  /*
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 090acebba4fa..8c3302fc7738 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -108,6 +108,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>   *            variable sized output commands, it tells the exact number of bytes
>   *            written.
>   * @min_out: (input) internal command output payload size validation
> + * @poll_count: (input)  Number of timeouts to attempt.
> + * @poll_interval: (input) Number of ms between mailbox background command
> + *                 polling intervals timeouts.

name it poll_interval_ms: and the units become obvious everywhere without needing
comments.  Good for the lazy / forgetful reviewer if nothing else...

>   * @return_code: (output) Error code returned from hardware.
>   *
>   * This is the primary mechanism used to send commands to the hardware.
> @@ -123,6 +126,8 @@ struct cxl_mbox_cmd {
>  	size_t size_in;
>  	size_t size_out;
>  	size_t min_out;
> +	int poll_count;
> +	int poll_interval;
>  	u16 return_code;
>  };
>  
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 39b829a29f6c..aa1bb74a52a1 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -51,6 +51,7 @@
>  static unsigned short mbox_ready_timeout = 60;
>  module_param(mbox_ready_timeout, ushort, 0644);
>  MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
> +static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);

I see in discussion you are moving to a per device approach so I won't review
that bit on this version.

>  
>  static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
>  {
> @@ -85,6 +86,33 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
>  			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
>  			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
>  
> +static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
> +{
> +	u64 reg;
> +
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;

This is what motivated comment on including _STATUS_ in field names.
I briefly thought you had a field from the wrong register.

> +}
> +
> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> +{
> +	struct cxl_dev_state *cxlds = id;
> +
> +	/* spurious or raced with hw? */

If talking about a race, I'd like a comment that gives more info.
What is the potential hardware race?

I can sort of see polling might have noticed completed command and
launched another one, all before the interrupt actually got handled.
If that's what you were thinking then eat the interrupt without the
scary message.

> +	if (!cxl_mbox_background_complete(cxlds)) {
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		dev_warn(&pdev->dev,
> +			 "Mailbox background operation IRQ but incomplete\n");
> +		goto done;
> +	}
> +
> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> +	wake_up(&mbox_wait);
> +done:
> +	return IRQ_HANDLED;
> +}
> +
>  /**
>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>   * @cxlds: The device state to communicate with.
> @@ -178,7 +206,59 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>  	mbox_cmd->return_code =
>  		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>  
> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> +	/*
> +	 * Handle the background command in a synchronous manner.
> +	 *
> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
> +	 * which we currently hold. Furthermore this also guarantees that
> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
> +	 * in that no new bg operation can occur in between.
> +	 *
> +	 * Background operations are timesliced in accordance with the nature
> +	 * of the command. In the event of timeout, the mailbox state is
> +	 * indeterminate until the next successful command submission and the
> +	 * driver can get back in sync with the hardware state.
> +	 */
> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> +		u64 bg_status_reg;
> +		int i;
> +
> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> +			mbox_cmd->opcode);
> +
> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
> +			int ret = wait_event_interruptible_timeout(

We already have an rc floating around in here.  Having ret as well with more limited
scope isn't great for readability.  I think you can just use rc here.

> +				mbox_wait, cxl_mbox_background_complete(cxlds),
> +				msecs_to_jiffies(mbox_cmd->poll_interval));
> +			if (ret > 0)
> +				break;
> +
I'd drop this blank line to keep all the handling of ret in one block of code.
It looks a bit too separate to me otherwise.

> +			/* interrupted by a signal */
> +			if (ret < 0)
> +				return ret;
> +		}
> +
> +		if (!cxl_mbox_background_complete(cxlds)) {
> +			u64 md_status =
> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> +
> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
> +				    "background timeout");

Why are we interested in the memory device status at this point? A timeout on background
command isn't really a cxl_cmd_err() in my head at least.

> +			return -ETIMEDOUT;
> +		}
> +
> +		bg_status_reg = readq(cxlds->regs.mbox +
> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +		mbox_cmd->return_code =
> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> +				  bg_status_reg);
> +		dev_dbg(dev,
> +			"Mailbox background operation (0x%04x) completed\n",
> +			mbox_cmd->opcode);

Here is where I'd like to also log the vendor specific extended status.

> +	}
> +
> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND) {

You overrode the original return code with the background command return code. So if you get
here and it's still RC_BACKGROUND I think something went wrong.

>  		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>  			cxl_mbox_cmd_rc2str(mbox_cmd));
>  		return 0; /* completed but caller must check return_code */
> @@ -224,6 +304,7 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>  	const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
>  	unsigned long timeout;
>  	u64 md_status;
> +	int rc, irq;
>  
>  	timeout = jiffies + mbox_ready_timeout * HZ;
>  	do {
> @@ -272,6 +353,27 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>  	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
>  		cxlds->payload_size);
>  
> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		irq = pci_irq_vector(pdev,
> +			     FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap));
> +		if (irq < 0)
> +			goto mbox_poll;
> +
> +		rc = devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> +				      IRQF_SHARED, "mailbox", cxlds);
> +		if (rc)
> +			goto mbox_poll;

Hmm. The old argument of whether to carry on when something unexpected happens.
I'd argue in this case at least and possibly the one above we should fail
hard as we really want to know if interrupt allocations are failing, not just
fall back quietly to polling.  I'd rather fail to probe the driver in a fashion
that lets us figure out what broke.

> +
> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
> +
> +		return 0;
> +	}
> +
> +mbox_poll:
> +	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] cxl/mbox: Add sanitation handling machinery
  2023-04-21  9:23 ` [PATCH 3/7] cxl/mbox: Add sanitation " Davidlohr Bueso
  2023-04-28 16:43   ` Dave Jiang
@ 2023-05-11 14:45   ` Jonathan Cameron
  2023-05-11 16:48     ` Davidlohr Bueso
  1 sibling, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-11 14:45 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023 02:23:17 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> Sanitation is by definition a device-monopolizing operation, and thus
> the timeslicing rules for other background commands do not apply.
> As such handle this special case asynchronously and return immediately.
> Subsequent changes will allow completion to be pollable from userspace
> via a sysfs file interface.
> 
> For devices that don't support interrupts for notifying background
> command completion, self-poll with the caveat that the poller can
> be out of sync with the ready hardware, and therefore care must be
> taken to not allow any new commands to go through until the poller
> sees the hw completion. The poller takes the mbox_mutex to stabilize
> the flagging, minimizing any runtime overhead in the send path to
> check for 'sanitize_tmo' for uncommon poll scenarios. This flag
> also serves for sanitation (the only user of async polling) to know
> when to queue work or simply rely on irqs.
> 
> The irq case is much simpler as hardware will serialize/error
> appropriately.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  drivers/cxl/cxlmem.h | 16 +++++++++
>  drivers/cxl/pci.c    | 79 ++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 93 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 8c3302fc7738..17e3ab3c641a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -220,6 +220,18 @@ struct cxl_event_state {
>  	struct mutex log_lock;
>  };
>  
> +/**
> + * struct cxl_security_state - Device security state
> + *
> + * @sanitize_dwork: self-polling work item for sanitation
> + * @sanitize_tmo: self-polling timeout
> + */
> +struct cxl_security_state {
> +	/* below only used if device mbox irqs are not supported */
Call it out by name.  We are almost sure to make a 'below' bit rot
at somepoint :)

> +	struct delayed_work sanitize_dwork;
> +	int sanitize_tmo;
> +};
> +
>  /**
>   * struct cxl_dev_state - The driver device state
>   *
> @@ -256,6 +268,7 @@ struct cxl_event_state {
>   * @serial: PCIe Device Serial Number
>   * @doe_mbs: PCI DOE mailbox array
>   * @event: event log driver state
> + * @sec: device security state
>   * @mbox_send: @dev specific transport for transmitting mailbox commands
>   *
>   * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> @@ -296,6 +309,8 @@ struct cxl_dev_state {
>  
>  	struct cxl_event_state event;
>  
> +	struct cxl_security_state sec;
> +
>  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>  };

...

> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index aa1bb74a52a1..bdee5273af5a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -97,6 +97,8 @@ static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
>  static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>  {
>  	struct cxl_dev_state *cxlds = id;
> +	u64 reg;
> +	u16 opcode;
>  
>  	/* spurious or raced with hw? */
>  	if (!cxl_mbox_background_complete(cxlds)) {
> @@ -107,12 +109,47 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>  		goto done;
>  	}
>  
> -	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> -	wake_up(&mbox_wait);
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
> +
> +	if (opcode == CXL_MBOX_OP_SANITIZE) {
> +		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
> +	} else {
> +		/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> +		wake_up(&mbox_wait);
> +	}
>  done:
>  	return IRQ_HANDLED;
>  }
>  
> +/*
> + * Sanitation operation polling mode.
> + */
> +static void cxl_mbox_sanitize_work(struct work_struct *work)
> +{
> +	struct cxl_dev_state *cxlds;
> +
> +	cxlds = container_of(work, struct cxl_dev_state,
> +			     sec.sanitize_dwork.work);
> +
> +	WARN_ON(cxlds->sec.sanitize_tmo == -1);

Overly paranoid?

> +
> +	mutex_lock(&cxlds->mbox_mutex);
> +	if (cxl_mbox_background_complete(cxlds)) {
> +		cxlds->sec.sanitize_tmo = 0;
> +		put_device(cxlds->dev);
> +
> +		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
> +	} else {
> +		int tmo = cxlds->sec.sanitize_tmo + 10;

Add some units to the naming of variables.

> +
> +		cxlds->sec.sanitize_tmo = min(15 * 60, tmo);

Why? That feels like it needs a comment to me.  Not that expensive
to check this so I'm not sure the ramp up is that logical.

> +		queue_delayed_work(system_wq,
> +				   &cxlds->sec.sanitize_dwork, tmo * HZ);
> +	}
> +	mutex_unlock(&cxlds->mbox_mutex);
> +}
> +
>  /**
>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>   * @cxlds: The device state to communicate with.
> @@ -173,6 +210,16 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>  		return -EBUSY;
>  	}
>  
> +	/*
> +	 * With sanitize polling, hardware might be done and the poller still
> +	 * not be in sync. Ensure no new command comes in until so. Keep the
> +	 * hardware semantics and only allow device health status.
> +	 */
> +	if (unlikely(cxlds->sec.sanitize_tmo > 0)) {
> +		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO)

Doesn't this let the value of mbox_cmd->opcode change to HEALTH_INFO so that
when we get here again we could carry on without other commands though still not in
sync (if things are very weird).

> +			return -EBUSY;
> +	}
> +
>  	cmd_reg = FIELD_PREP(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK,
>  			     mbox_cmd->opcode);
>  	if (mbox_cmd->size_in) {
> @@ -223,6 +270,27 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>  		u64 bg_status_reg;
>  		int i;
>  
> +		/*
> +		 * Sanitation is a special case which monopolizes the device
> +		 * in an uninterruptible state and thus cannot be timesliced.
> +		 * Return immediately instead and allow userspace to poll(2)
> +		 * for completion.
> +		 */
> +		if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) {
> +			if (cxlds->sec.sanitize_tmo != -1) {

As below. Have a self explanatory variable called sec.polling or sec.interrupt

> +				/* give first timeout a second */
> +				cxlds->sec.sanitize_tmo = 1;

If this was named santize_tmo_secs then comment not needed.

> +				/* hold the device throughout */
> +				get_device(cxlds->dev);
> +				queue_delayed_work(system_wq,
> +						   &cxlds->sec.sanitize_dwork,
> +						   cxlds->sec.sanitize_tmo * HZ);
> +			}
> +
> +			dev_dbg(dev, "Sanitation operation started\n");
> +			return 0;
> +		}
> +
>  		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>  			mbox_cmd->opcode);
>  
> @@ -366,6 +434,9 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>  		if (rc)
>  			goto mbox_poll;
>  
> +		/* flag that irqs are enabled */
> +		cxlds->sec.sanitize_tmo = -1;

That's confusing.  I'd add a separate structure element for it instead with
appropriate naming.

> +
>  		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
>  		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
>  
> @@ -373,7 +444,11 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>  	}
>  
>  mbox_poll:
> +	INIT_DELAYED_WORK(&cxlds->sec.sanitize_dwork,
> +			  cxl_mbox_sanitize_work);
> +	cxlds->sec.sanitize_tmo = 0;
>  	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
> +
My favorite moan. Unrelated whitespace change! Push it to patch 2 that introduced
that dev_dbg() I think.

>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-04-21  9:23 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
  2023-04-21 20:04   ` kernel test robot
  2023-04-21 20:24   ` kernel test robot
@ 2023-05-11 15:07   ` Jonathan Cameron
  2023-05-11 17:23     ` Davidlohr Bueso
  2 siblings, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-11 15:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023 02:23:18 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> Implement support for CXL 3.0 8.2.9.8.5.1 Sanitize. This is done by
> adding a security/sanitize' memdev sysfs file, which is poll(2)-capable
> for completion. Unlike all other background commands, this is the
> only operation that is special and monopolizes the device for long
> periods of time.
> 
> In addition to the traditional pmem security requirements, all regions
> must also be offline in order to perform the operation.
> This permits
> avoiding explicit global CPU cache management, relying instead on
> attach_target() setting CXL_REGION_F_INCOHERENT upon reconnect.
> 
> The expectation is that userspace can use it such as:
> 
>     cxl disable-memdev memX
>     echo 1 > /sys/bus/cxl/devices/memX/security/sanitize
>     cxl wait-sanitize memX
>     cxl enable-memdev memX
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 19 ++++++
>  drivers/cxl/core/mbox.c                 | 56 ++++++++++++++++
>  drivers/cxl/core/memdev.c               | 86 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  4 ++
>  drivers/cxl/pci.c                       |  5 ++
>  5 files changed, 170 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 3acf2f17a73f..2e98ec9220ca 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -58,6 +58,25 @@ Description:
>  		affinity for this device.
>  
>  
> +What:           /sys/bus/cxl/devices/memX/security/sanitize
> +Date:           May, 2023
> +KernelVersion:  v6.5
> +Contact:        linux-cxl@vger.kernel.org
> +Description:
> +		(RW) Write a boolean 'true' string value to this attribute to
> +		sanitize the device to securely re-purpose or decommission it.
> +		This is done by ensuring that all user data and meta-data,
> +		whether it resides in persistent capacity, volatile capacity,
> +		or the LSA, is made permanently unavailable by whatever means
> +		is appropriate for the media type. This functionality requires
> +		the device to be not be actively decoding any HPA ranges.
> +
> +		Reading this file shows either "disabled" when not running, or
> +		"sanitize" during the duration of the sanitize operation. This
> +		sysfs entry is select/poll capable from userspace to notify upon
> +		completion.

A sysfs attribute that reads different from what is written is not very intuitive.
The one file one thing rule suggests to me that you should have a separate
santize_status or similar.  Or just have this read true when in progress making
it a self resetting toggle that returns -EBUSY if anyone tries to unset it.

> +
> +
>  What:		/sys/bus/cxl/devices/*/devtype
>  Date:		June, 2021
>  KernelVersion:	v5.14
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index cde7270c6037..28daf7dcdec4 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1021,6 +1021,62 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
>  
> +/**
> + * cxl_mem_sanitize() - Send a sanitation command to the device.
> + * @cxlds: The device data for the operation
> + * @cmd: The specific sanitation command opcode
> + *
> + * Return: 0 if the command was executed successfully, regardless of
> + * whether or not the actual security operation is done in the background,
> + * such as for the Sanitize case.
> + * Error return values can be the result of the mailbox command, -EINVAL
> + * when security requirements are not met or invalid contexts, or -EBUSY
> + * if the device is not offline.

What does offline mean for the device?  Perhaps a tighter definition needed.

> + *
> + * See CXL 3.0 @8.2.9.8.5.1 Sanitize and @8.2.9.8.5.2 Secure Erase.

This @ syntax would  be fine but it's inconsistent with other references in
this file.

> + */
> +int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd)
> +{
> +	int rc;
> +	u32 sec_out = 0;
> +	struct cxl_get_security_output {
> +		__le32 flags;
> +	} out;
> +	struct cxl_mbox_cmd sec_cmd = {
> +		.opcode = CXL_MBOX_OP_GET_SECURITY_STATE,
> +		.payload_out = &out,
> +		.size_out = sizeof(out),
> +	};
> +	struct cxl_mbox_cmd mbox_cmd = { .opcode = cmd };
> +
> +	if (cmd != CXL_MBOX_OP_SANITIZE)
> +		return -EINVAL;
> +
> +	rc = cxl_internal_send_cmd(cxlds, &sec_cmd);
> +	if (rc < 0) {
> +		dev_err(cxlds->dev, "Failed to get security state : %d", rc);
> +		return rc;
> +	}
> +
> +	/*
> +	 * Prior to using these commands, any security applied to
> +	 * the user data areas of the device shall be DISABLED (or
> +	 * UNLOCKED for secure erase case).
> +	 */
> +	sec_out = le32_to_cpu(out.flags);
> +	if (sec_out & CXL_PMEM_SEC_STATE_USER_PASS_SET)
> +		return -EINVAL;
> +
> +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +	if (rc < 0) {
> +		dev_err(cxlds->dev, "Failed to sanitize device : %d", rc);
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_sanitize, CXL);
> +
>  static int add_dpa_res(struct device *dev, struct resource *parent,
>  		       struct resource *res, resource_size_t start,
>  		       resource_size_t size, const char *type)
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 28a05f2fe32d..70e7158826c9 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -89,6 +89,55 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
>  static struct device_attribute dev_attr_pmem_size =
>  	__ATTR(size, 0444, pmem_size_show, NULL);
>  
> +static ssize_t security_sanitize_show(struct device *dev,
> +				      struct device_attribute *attr, char *buf)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	u64 reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	u32 pct = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg);
> +	u16 cmd = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
> +
> +	if (cmd == CXL_MBOX_OP_SANITIZE && pct != 100)
> +		return sysfs_emit(buf, "sanitize\n");
> +	else
> +		return sysfs_emit(buf, "disabled\n");

As above. I don't like inconsistency of read and write values.

> +}
> +
> +static ssize_t security_sanitize_store(struct device *dev,
> +				       struct device_attribute *attr,
> +				       const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	ssize_t rc;
> +	bool sanitize;
> +
> +	rc = kstrtobool(buf, &sanitize);
> +	if (rc)
> +		return rc;
> +
> +	if (sanitize) {

I'd short cut the false case

	if (!sanitize)
		return len;

	...

> +		struct cxl_port *port = dev_get_drvdata(&cxlmd->dev);
> +
> +		if (!port || !is_cxl_endpoint(port))
> +			return -EINVAL;
> +		/* ensure no regions are mapped to this memdev */
> +		if (port->commit_end != -1)
> +			return -EBUSY;
> +
> +		rc = cxl_mem_sanitize(cxlds, CXL_MBOX_OP_SANITIZE);
		if (rc)
			return rc;
	}

	return len;

Simple flow is easier for reviewers to follow.

> +	}
> +
> +	if (rc == 0)
> +		rc = len;
> +	return rc;
> +}
> +
 
> @@ -324,11 +384,19 @@ static const struct file_operations cxl_memdev_fops = {
>  	.llseek = noop_llseek,
>  };
>  
> +static void put_sanitize(void *data)
> +{
> +	struct cxl_dev_state *cxlds = data;
> +
> +	sysfs_put(cxlds->sec.sanitize_state);
> +}
> +
>  struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
>  {
>  	struct cxl_memdev *cxlmd;
>  	struct device *dev;
>  	struct cdev *cdev;
> +	struct kernfs_node *sec;
>  	int rc;
>  
>  	cxlmd = cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
> @@ -355,6 +423,24 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
>  	rc = devm_add_action_or_reset(cxlds->dev, cxl_memdev_unregister, cxlmd);
>  	if (rc)
>  		return ERR_PTR(rc);
> +
> +	sec = sysfs_get_dirent(dev->kobj.sd, "security");
> +	if (!sec) {
> +		dev_err(dev, "sysfs_get_dirent 'security' failed\n");
> +		rc = -ENODEV;
> +		goto err;

At this stage the devm action is registered to unwind anything above here, so just
		return ERR_PTR(-ENODEV);

> +	}
> +	cxlds->sec.sanitize_state = sysfs_get_dirent(sec, "sanitize");
> +	sysfs_put(sec);
> +	if (!cxlds->sec.sanitize_state) {
> +		dev_err(dev, "sysfs_get_dirent 'sanitize' failed\n");
> +		rc = -ENODEV;
> +		goto err;

return ERR_PTR(-ENODDEV);

> +	}
> +	rc = devm_add_action_or_reset(cxlds->dev, put_sanitize, cxlds);
> +	if (rc)
> +		return ERR_PTR(rc);
> +
>  	return cxlmd;
>  
>  err:
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 17e3ab3c641a..9bd33cfdc0ec 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -223,10 +223,12 @@ struct cxl_event_state {
>  /**
>   * struct cxl_security_state - Device security state
>   *
> + * @sanitize_state: sanitation sysfs file to notify
>   * @sanitize_dwork: self-polling work item for sanitation
>   * @sanitize_tmo: self-polling timeout
>   */
>  struct cxl_security_state {
> +	struct kernfs_node *sanitize_state;
>  	/* below only used if device mbox irqs are not supported */
>  	struct delayed_work sanitize_dwork;
>  	int sanitize_tmo;
> @@ -642,6 +644,8 @@ static inline void cxl_mem_active_dec(void)
>  }
>  #endif
>  
> +int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd);
> +
>  struct cxl_hdm {
>  	struct cxl_component_regs regs;
>  	unsigned int decoder_count;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index bdee5273af5a..2bc3b595f270 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -113,6 +113,9 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>  	opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
>  
>  	if (opcode == CXL_MBOX_OP_SANITIZE) {
> +		if (cxlds->sec.sanitize_state)
> +			sysfs_notify_dirent(cxlds->sec.sanitize_state);
> +
>  		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
>  	} else {
>  		/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> @@ -138,6 +141,8 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
>  	if (cxl_mbox_background_complete(cxlds)) {
>  		cxlds->sec.sanitize_tmo = 0;
>  		put_device(cxlds->dev);
> +		if (cxlds->sec.sanitize_state)
> +			sysfs_notify_dirent(cxlds->sec.sanitize_state);
>  
>  		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
>  	} else {


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/7] cxl/test: Add Sanitize opcode support
  2023-04-21  9:23 ` [PATCH 5/7] cxl/test: Add Sanitize opcode support Davidlohr Bueso
@ 2023-05-11 15:09   ` Jonathan Cameron
  2023-05-11 15:13     ` Davidlohr Bueso
  0 siblings, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-11 15:09 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023 02:23:19 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> Add support to emulate the "Sanitize" operation, without
> incurring in the background.
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  tools/testing/cxl/test/mem.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 9263b04d35f7..d4466cb27947 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -497,6 +497,28 @@ static int mock_partition_info(struct cxl_dev_state *cxlds,
>  	return 0;
>  }
>  
> +static int mock_sanitize(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mockmem_data *mdata = dev_get_drvdata(cxlds->dev);
> +
> +	if (cmd->size_in != 0)
> +		return -EINVAL;
> +
> +	if (cmd->size_out != 0)
> +		return -EINVAL;
> +
> +	if (mdata->security_state & CXL_PMEM_SEC_STATE_USER_PASS_SET) {
> +		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
> +		return -ENXIO;
> +	}
> +	if (mdata->security_state & CXL_PMEM_SEC_STATE_LOCKED) {
> +		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
> +		return -ENXIO;
> +	}
> +
> +	return 0; /* assume less than 2 secs, no bg */

Boring ;)

Otherwise this looks fine to me though note I'm far from an expert on the test modules.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> +}
> +
>  static int mock_get_security_state(struct cxl_dev_state *cxlds,
>  				   struct cxl_mbox_cmd *cmd)
>  {
> @@ -924,6 +946,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>  	case CXL_MBOX_OP_GET_HEALTH_INFO:
>  		rc = mock_health_info(cxlds, cmd);
>  		break;
> +	case CXL_MBOX_OP_SANITIZE:
> +		rc = mock_sanitize(cxlds, cmd);
> +		break;
>  	case CXL_MBOX_OP_GET_SECURITY_STATE:
>  		rc = mock_get_security_state(cxlds, cmd);
>  		break;


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/7] cxl/mem: Support Secure Erase
  2023-04-21  9:23 ` [PATCH 6/7] cxl/mem: Support Secure Erase Davidlohr Bueso
@ 2023-05-11 15:10   ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-11 15:10 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023 02:23:20 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> Implement support for the non-pmem exclusive secure erase, per
> CXL specs. Create a write-only 'security/erase' sysfs file to
> perform the requested operation.
> 
> As with the sanitation this requires the device being offline
> and thus no active HPA-DPA decoding.
> 
> The expectation is that userspace can use it such as:
> 
> 	cxl disable-memdev memX
> 	echo 1 > /sys/bus/cxl/devices/memX/security/erase
> 	cxl enable-memdev memX
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 10 ++++++++
>  drivers/cxl/core/mbox.c                 |  6 ++++-
>  drivers/cxl/core/memdev.c               | 34 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  1 +
>  4 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 2e98ec9220ca..af7b603faf77 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -77,6 +77,16 @@ Description:
>  		completion.
>  
>  
> +What            /sys/bus/cxl/devices/memX/security/erase
> +Date:           May, 2023
> +KernelVersion:  v6.5
> +Contact:        linux-cxl@vger.kernel.org
> +Description:
> +		(WO) Write a boolean 'true' string value to this attribute to
> +		secure erase user data by changing the media encryption keys for
> +		all user data areas of the device.
> +
> +
>  What:		/sys/bus/cxl/devices/*/devtype
>  Date:		June, 2021
>  KernelVersion:	v5.14
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 28daf7dcdec4..a2180f3e09eb 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1049,7 +1049,7 @@ int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd)
>  	};
>  	struct cxl_mbox_cmd mbox_cmd = { .opcode = cmd };
>  
> -	if (cmd != CXL_MBOX_OP_SANITIZE)
> +	if (cmd != CXL_MBOX_OP_SANITIZE && cmd != CXL_MBOX_OP_SECURE_ERASE)
>  		return -EINVAL;
>  
>  	rc = cxl_internal_send_cmd(cxlds, &sec_cmd);
> @@ -1067,6 +1067,10 @@ int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd)
>  	if (sec_out & CXL_PMEM_SEC_STATE_USER_PASS_SET)
>  		return -EINVAL;
>  
> +	if (cmd == CXL_MBOX_OP_SECURE_ERASE &&
> +	    sec_out & CXL_PMEM_SEC_STATE_LOCKED)
> +		return -EINVAL;
> +
>  	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
>  	if (rc < 0) {
>  		dev_err(cxlds->dev, "Failed to sanitize device : %d", rc);
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 70e7158826c9..6406e8e47da2 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -138,6 +138,39 @@ static struct device_attribute dev_attr_security_sanitize =
>  	__ATTR(sanitize, 0644,
>  	       security_sanitize_show, security_sanitize_store);
>  
> +static ssize_t security_erase_store(struct device *dev,
> +				    struct device_attribute *attr,
> +				    const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	ssize_t rc;
> +	bool erase;
> +
> +	rc = kstrtobool(buf, &erase);
> +	if (rc)
> +		return rc;
> +
> +	if (erase) {

As with earlier patch, I'd flip the logic.

> +		struct cxl_port *port = dev_get_drvdata(&cxlmd->dev);
> +
> +		if (!port || !is_cxl_endpoint(port))
> +			return -EINVAL;
> +		/* ensure no regions are mapped to this memdev */
> +		if (port->commit_end != -1)
> +			return -EBUSY;
> +
> +		rc = cxl_mem_sanitize(cxlds, CXL_MBOX_OP_SECURE_ERASE);

and use a simple error check here.

> +	}
> +
> +	if (rc == 0)
> +		rc = len;
> +	return rc;
> +}
> +
> +static struct device_attribute dev_attr_security_erase =
> +	__ATTR(erase, 0200, NULL, security_erase_store);
> +
>  static ssize_t serial_show(struct device *dev, struct device_attribute *attr,
>  			   char *buf)
>  {
> @@ -199,6 +232,7 @@ static struct attribute_group cxl_memdev_pmem_attribute_group = {
>  
>  static struct attribute *cxl_memdev_security_attributes[] = {
>  	&dev_attr_security_sanitize.attr,
> +	&dev_attr_security_erase.attr,
>  	NULL,
>  };
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 9bd33cfdc0ec..f8b513e70c21 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -345,6 +345,7 @@ enum cxl_opcode {
>  	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,
>  	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,
>  	CXL_MBOX_OP_SANITIZE		= 0x4400,
> +	CXL_MBOX_OP_SECURE_ERASE		= 0x4401,
>  	CXL_MBOX_OP_GET_SECURITY_STATE	= 0x4500,
>  	CXL_MBOX_OP_SET_PASSPHRASE	= 0x4501,
>  	CXL_MBOX_OP_DISABLE_PASSPHRASE	= 0x4502,


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/7] cxl/test: Add Secure Erase opcode support
  2023-04-21  9:23 ` [PATCH 7/7] cxl/test: Add Secure Erase opcode support Davidlohr Bueso
@ 2023-05-11 15:10   ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-11 15:10 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Fri, 21 Apr 2023 02:23:21 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> Add support to emulate the CXL the "Secure Erase" operation.
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  tools/testing/cxl/test/mem.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index d4466cb27947..8a22a4e592c6 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -519,6 +519,30 @@ static int mock_sanitize(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  	return 0; /* assume less than 2 secs, no bg */
>  }
>  
> +static int mock_secure_erase(struct cxl_dev_state *cxlds,
> +			     struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mockmem_data *mdata = dev_get_drvdata(cxlds->dev);
> +
> +	if (cmd->size_in != 0)
> +		return -EINVAL;
> +
> +	if (cmd->size_out != 0)
> +		return -EINVAL;
> +
> +	if (mdata->security_state & CXL_PMEM_SEC_STATE_USER_PASS_SET) {
> +		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
> +		return -ENXIO;
> +	}
> +
> +	if (mdata->security_state & CXL_PMEM_SEC_STATE_LOCKED) {
> +		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
> +		return -ENXIO;
> +	}
> +
> +	return 0;
> +}
> +
>  static int mock_get_security_state(struct cxl_dev_state *cxlds,
>  				   struct cxl_mbox_cmd *cmd)
>  {
> @@ -949,6 +973,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>  	case CXL_MBOX_OP_SANITIZE:
>  		rc = mock_sanitize(cxlds, cmd);
>  		break;
> +	case CXL_MBOX_OP_SECURE_ERASE:
> +		rc = mock_secure_erase(cxlds, cmd);
> +		break;
>  	case CXL_MBOX_OP_GET_SECURITY_STATE:
>  		rc = mock_get_security_state(cxlds, cmd);
>  		break;


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/7] cxl/test: Add Sanitize opcode support
  2023-05-11 15:09   ` Jonathan Cameron
@ 2023-05-11 15:13     ` Davidlohr Bueso
  0 siblings, 0 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2023-05-11 15:13 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Thu, 11 May 2023, Jonathan Cameron wrote:

>On Fri, 21 Apr 2023 02:23:19 -0700
>Davidlohr Bueso <dave@stgolabs.net> wrote:
>
>> Add support to emulate the "Sanitize" operation, without
>> incurring in the background.
>>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
>> ---
>>  tools/testing/cxl/test/mem.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
>> index 9263b04d35f7..d4466cb27947 100644
>> --- a/tools/testing/cxl/test/mem.c
>> +++ b/tools/testing/cxl/test/mem.c
>> @@ -497,6 +497,28 @@ static int mock_partition_info(struct cxl_dev_state *cxlds,
>>	return 0;
>>  }
>>
>> +static int mock_sanitize(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>> +{
>> +	struct cxl_mockmem_data *mdata = dev_get_drvdata(cxlds->dev);
>> +
>> +	if (cmd->size_in != 0)
>> +		return -EINVAL;
>> +
>> +	if (cmd->size_out != 0)
>> +		return -EINVAL;
>> +
>> +	if (mdata->security_state & CXL_PMEM_SEC_STATE_USER_PASS_SET) {
>> +		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
>> +		return -ENXIO;
>> +	}
>> +	if (mdata->security_state & CXL_PMEM_SEC_STATE_LOCKED) {
>> +		cmd->return_code = CXL_MBOX_CMD_RC_SECURITY;
>> +		return -ENXIO;
>> +	}
>> +
>> +	return 0; /* assume less than 2 secs, no bg */
>
>Boring ;)

Yes, but I made up for it in qemu :)

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-05-11 14:23   ` Jonathan Cameron
@ 2023-05-11 16:04     ` Davidlohr Bueso
  2023-05-12 17:05       ` Jonathan Cameron
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-05-11 16:04 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Thu, 11 May 2023, Jonathan Cameron wrote:

>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 044a92d9813e..72731a896f58 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -176,14 +176,21 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>>  /* CXL 2.0 8.2.8.4 Mailbox Registers */
>>  #define CXLDEV_MBOX_CAPS_OFFSET 0x00
>>  #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
>> +#define   CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
>> +#define   CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)
>
>Numeric order of bits probably makes more sense. So move this up one line.

Sure.

>
>>  #define CXLDEV_MBOX_CTRL_OFFSET 0x04
>>  #define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
>> +#define   CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
>>  #define CXLDEV_MBOX_CMD_OFFSET 0x08
>>  #define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
>>  #define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
>>  #define CXLDEV_MBOX_STATUS_OFFSET 0x10
>> +#define   CXLDEV_MBOX_STATUS_BG_CMD BIT(0)
>
>Hmm. Oddly field is called Background Operation.  Still that is then
>described as a command so I guess this is a reasonable bit of consolidation
>of naming.

Yes, I've found that the spec loosely mixes both terms.

>
>>  #define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
>>  #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
>> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
>
>Obviously there are line length disadvantages to includes the status part
>in the field names and there isn't a BG_CMD_COMMAND register thankfully
>but I still find the absence of status a bit inconsistent / confusing.
>initial instinct is this a field called OP_CODE in a register called
>BG_CMD_COMMAND
>
>> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
>> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)
>
>It might be nice to to do 'something' with the Vendor Specific extended
>status even if it is just put it in a dev_dbg() for anyone who cares?

I had previously considered something like that, I'll add it in the
next iteration.

>
>>  #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
>>
>>  /*
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 090acebba4fa..8c3302fc7738 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -108,6 +108,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>>   *            variable sized output commands, it tells the exact number of bytes
>>   *            written.
>>   * @min_out: (input) internal command output payload size validation
>> + * @poll_count: (input)  Number of timeouts to attempt.
>> + * @poll_interval: (input) Number of ms between mailbox background command
>> + *                 polling intervals timeouts.
>
>name it poll_interval_ms: and the units become obvious everywhere without needing
>comments.  Good for the lazy / forgetful reviewer if nothing else...

Makes sense.

>
>>   * @return_code: (output) Error code returned from hardware.
>>   *
>>   * This is the primary mechanism used to send commands to the hardware.
>> @@ -123,6 +126,8 @@ struct cxl_mbox_cmd {
>>	size_t size_in;
>>	size_t size_out;
>>	size_t min_out;
>> +	int poll_count;
>> +	int poll_interval;
>>	u16 return_code;
>>  };
>>
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 39b829a29f6c..aa1bb74a52a1 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -51,6 +51,7 @@
>>  static unsigned short mbox_ready_timeout = 60;
>>  module_param(mbox_ready_timeout, ushort, 0644);
>>  MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
>> +static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);
>
>I see in discussion you are moving to a per device approach so I won't review
>that bit on this version.

Right, fyi the latest vesion is here:

https://lore.kernel.org/linux-cxl/gtvozgdx2ak7tekc3heczk5g7gj3cwuoptez6tjmkecader4lo@7t2em7rclcxn/

...
>> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>> +{
>> +	struct cxl_dev_state *cxlds = id;
>> +
>> +	/* spurious or raced with hw? */
>
>If talking about a race, I'd like a comment that gives more info.
>What is the potential hardware race?
>
>I can sort of see polling might have noticed completed command and
>launched another one, all before the interrupt actually got handled.
>If that's what you were thinking then eat the interrupt without the
>scary message.

Yes that's the raced with hw motivation, and the warning below can provide
insightful debug info. That said, for spurious irqs (which the kernel/core
considers this a reality, so yeah we should not be printing any message.

I'll get rid of it.

>
>> +	if (!cxl_mbox_background_complete(cxlds)) {
>> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>> +
>> +		dev_warn(&pdev->dev,
>> +			 "Mailbox background operation IRQ but incomplete\n");
>> +		goto done;
>> +	}
>> +
>> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
>> +	wake_up(&mbox_wait);
>> +done:
>> +	return IRQ_HANDLED;
>> +}
>> +
>>  /**
>>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>>   * @cxlds: The device state to communicate with.
>> @@ -178,7 +206,59 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>>	mbox_cmd->return_code =
>>		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>>
>> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
>> +	/*
>> +	 * Handle the background command in a synchronous manner.
>> +	 *
>> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
>> +	 * which we currently hold. Furthermore this also guarantees that
>> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
>> +	 * in that no new bg operation can occur in between.
>> +	 *
>> +	 * Background operations are timesliced in accordance with the nature
>> +	 * of the command. In the event of timeout, the mailbox state is
>> +	 * indeterminate until the next successful command submission and the
>> +	 * driver can get back in sync with the hardware state.
>> +	 */
>> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
>> +		u64 bg_status_reg;
>> +		int i;
>> +
>> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>> +			mbox_cmd->opcode);
>> +
>> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
>> +			int ret = wait_event_interruptible_timeout(
>
>We already have an rc floating around in here.  Having ret as well with more limited
>scope isn't great for readability.  I think you can just use rc here.

I was thinking the same, I can rename it 'wait_ret' (it also ends up being a long in
the latest version).

>
>> +				mbox_wait, cxl_mbox_background_complete(cxlds),
>> +				msecs_to_jiffies(mbox_cmd->poll_interval));
>> +			if (ret > 0)
>> +				break;
>> +
>I'd drop this blank line to keep all the handling of ret in one block of code.
>It looks a bit too separate to me otherwise.
>
>> +			/* interrupted by a signal */
>> +			if (ret < 0)
>> +				return ret;
>> +		}
>> +
>> +		if (!cxl_mbox_background_complete(cxlds)) {
>> +			u64 md_status =
>> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
>> +
>> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
>> +				    "background timeout");
>
>Why are we interested in the memory device status at this point? A timeout on background
>command isn't really a cxl_cmd_err() in my head at least.

Will revisit.

>
>> +			return -ETIMEDOUT;
>> +		}
>> +
>> +		bg_status_reg = readq(cxlds->regs.mbox +
>> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>> +		mbox_cmd->return_code =
>> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
>> +				  bg_status_reg);
>> +		dev_dbg(dev,
>> +			"Mailbox background operation (0x%04x) completed\n",
>> +			mbox_cmd->opcode);
>
>Here is where I'd like to also log the vendor specific extended status.

ok

>
>> +	}
>> +
>> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
>> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND) {
>
>You overrode the original return code with the background command return code. So if you get
>here and it's still RC_BACKGROUND I think something went wrong.

Yes, Ming had previously brought this up and is addressed in the last version.

>
>>		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>>			cxl_mbox_cmd_rc2str(mbox_cmd));
>>		return 0; /* completed but caller must check return_code */
>> @@ -224,6 +304,7 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>>	const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
>>	unsigned long timeout;
>>	u64 md_status;
>> +	int rc, irq;
>>
>>	timeout = jiffies + mbox_ready_timeout * HZ;
>>	do {
>> @@ -272,6 +353,27 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>>	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
>>		cxlds->payload_size);
>>
>> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
>> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>> +
>> +		irq = pci_irq_vector(pdev,
>> +			     FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap));
>> +		if (irq < 0)
>> +			goto mbox_poll;
>> +
>> +		rc = devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
>> +				      IRQF_SHARED, "mailbox", cxlds);
>> +		if (rc)
>> +			goto mbox_poll;
>
>Hmm. The old argument of whether to carry on when something unexpected happens.

Well yes and no. The reason I am very tolerant upon errors here is that the
background cmd polling will be done regardless of the device's interrupt
capability. So I find it way too harsh to just fail the probe altogether
when effectively no harm is done.

>I'd argue in this case at least and possibly the one above we should fail
>hard as we really want to know if interrupt allocations are failing, not just
>fall back quietly to polling.  I'd rather fail to probe the driver in a fashion
>that lets us figure out what broke.
>
>> +
>> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
>> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
>> +
>> +		return 0;
>> +	}
>> +
>> +mbox_poll:
>> +	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
>>	return 0;
>>  }
>>

Thanks for reviewing!

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] cxl/mbox: Add sanitation handling machinery
  2023-05-11 14:45   ` Jonathan Cameron
@ 2023-05-11 16:48     ` Davidlohr Bueso
  2023-05-12 17:02       ` Jonathan Cameron
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-05-11 16:48 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Thu, 11 May 2023, Jonathan Cameron wrote:

>> +/*
>> + * Sanitation operation polling mode.
>> + */
>> +static void cxl_mbox_sanitize_work(struct work_struct *work)
>> +{
>> +	struct cxl_dev_state *cxlds;
>> +
>> +	cxlds = container_of(work, struct cxl_dev_state,
>> +			     sec.sanitize_dwork.work);
>> +
>> +	WARN_ON(cxlds->sec.sanitize_tmo == -1);
>
>Overly paranoid?

I don't see the harm, but regardless, it's racy - needs to be done
under the mbox_mutex.

>> +
>> +	mutex_lock(&cxlds->mbox_mutex);
>> +	if (cxl_mbox_background_complete(cxlds)) {
>> +		cxlds->sec.sanitize_tmo = 0;
>> +		put_device(cxlds->dev);
>> +
>> +		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
>> +	} else {
>> +		int tmo = cxlds->sec.sanitize_tmo + 10;
>
>Add some units to the naming of variables.

ok

>> +
>> +		cxlds->sec.sanitize_tmo = min(15 * 60, tmo);
>
>Why? That feels like it needs a comment to me.  Not that expensive
>to check this so I'm not sure the ramp up is that logical.

Right, this came from a comment from Dave:
https://lore.kernel.org/linux-cxl/bcbe1db2-cb8e-1889-2888-f4618d749bd4@intel.com/

>
>> +		queue_delayed_work(system_wq,
>> +				   &cxlds->sec.sanitize_dwork, tmo * HZ);
>> +	}
>> +	mutex_unlock(&cxlds->mbox_mutex);
>> +}
>> +
>>  /**
>>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>>   * @cxlds: The device state to communicate with.
>> @@ -173,6 +210,16 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>>		return -EBUSY;
>>	}
>>
>> +	/*
>> +	 * With sanitize polling, hardware might be done and the poller still
>> +	 * not be in sync. Ensure no new command comes in until so. Keep the
>> +	 * hardware semantics and only allow device health status.
>> +	 */
>> +	if (unlikely(cxlds->sec.sanitize_tmo > 0)) {
>> +		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO)
>
>Doesn't this let the value of mbox_cmd->opcode change to HEALTH_INFO so that
>when we get here again we could carry on without other commands though still not in
>sync (if things are very weird).

I don't quite follow, mbox_cmd is local to each caller. Below I touch on this.

>> +			return -EBUSY;
>> +	}
>> +
>>	cmd_reg = FIELD_PREP(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK,
>>			     mbox_cmd->opcode);
>>	if (mbox_cmd->size_in) {
>> @@ -223,6 +270,27 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>>		u64 bg_status_reg;
>>		int i;
>>
>> +		/*
>> +		 * Sanitation is a special case which monopolizes the device
>> +		 * in an uninterruptible state and thus cannot be timesliced.
>> +		 * Return immediately instead and allow userspace to poll(2)
>> +		 * for completion.
>> +		 */
>> +		if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) {
>> +			if (cxlds->sec.sanitize_tmo != -1) {
>
>As below. Have a self explanatory variable called sec.polling or sec.interrupt
>
>> +				/* give first timeout a second */
>> +				cxlds->sec.sanitize_tmo = 1;
>
>If this was named santize_tmo_secs then comment not needed.
>
>> +				/* hold the device throughout */
>> +				get_device(cxlds->dev);
>> +				queue_delayed_work(system_wq,
>> +						   &cxlds->sec.sanitize_dwork,
>> +						   cxlds->sec.sanitize_tmo * HZ);
>> +			}
>> +
>> +			dev_dbg(dev, "Sanitation operation started\n");
>> +			return 0;
>> +		}
>> +
>>		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>>			mbox_cmd->opcode);
>>
>> @@ -366,6 +434,9 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>>		if (rc)
>>			goto mbox_poll;
>>
>> +		/* flag that irqs are enabled */
>> +		cxlds->sec.sanitize_tmo = -1;
>
>That's confusing.  I'd add a separate structure element for it instead with
>appropriate naming.

Agreed, can be nicer. Another alternative is doing away with it altogether and only
allow sanitation if interrupts are supported/enabled. Considering the potential runtimes,
it's not a crazy ask to the hw to at least give some notification mechanism instead
of having sw trying to stay up to date.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-05-11 15:07   ` Jonathan Cameron
@ 2023-05-11 17:23     ` Davidlohr Bueso
  2023-05-12 17:00       ` Jonathan Cameron
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-05-11 17:23 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Thu, 11 May 2023, Jonathan Cameron wrote:

>> +What:           /sys/bus/cxl/devices/memX/security/sanitize
>> +Date:           May, 2023
>> +KernelVersion:  v6.5
>> +Contact:        linux-cxl@vger.kernel.org
>> +Description:
>> +		(RW) Write a boolean 'true' string value to this attribute to
>> +		sanitize the device to securely re-purpose or decommission it.
>> +		This is done by ensuring that all user data and meta-data,
>> +		whether it resides in persistent capacity, volatile capacity,
>> +		or the LSA, is made permanently unavailable by whatever means
>> +		is appropriate for the media type. This functionality requires
>> +		the device to be not be actively decoding any HPA ranges.
>> +
>> +		Reading this file shows either "disabled" when not running, or
>> +		"sanitize" during the duration of the sanitize operation. This
>> +		sysfs entry is select/poll capable from userspace to notify upon
>> +		completion.
>
>A sysfs attribute that reads different from what is written is not very intuitive.
>The one file one thing rule suggests to me that you should have a separate
>santize_status or similar.  Or just have this read true when in progress making
>it a self resetting toggle that returns -EBUSY if anyone tries to unset it.

So the plan is to also to have the (cached) pmem security status (read-only):
     /sys/bus/cxl/devices/memX/security/status

sanitize could nicely be incorporated there and just read/poll that file for all
things security. So security/sanitize file goes to being write-only, just like
its secure erase counter part.

>> +
>> +
>>  What:		/sys/bus/cxl/devices/*/devtype
>>  Date:		June, 2021
>>  KernelVersion:	v5.14
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index cde7270c6037..28daf7dcdec4 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1021,6 +1021,62 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
>>
>> +/**
>> + * cxl_mem_sanitize() - Send a sanitation command to the device.
>> + * @cxlds: The device data for the operation
>> + * @cmd: The specific sanitation command opcode
>> + *
>> + * Return: 0 if the command was executed successfully, regardless of
>> + * whether or not the actual security operation is done in the background,
>> + * such as for the Sanitize case.
>> + * Error return values can be the result of the mailbox command, -EINVAL
>> + * when security requirements are not met or invalid contexts, or -EBUSY
>> + * if the device is not offline.
>
>What does offline mean for the device?  Perhaps a tighter definition needed.

I can expand. But overall, with Alison's poison work being picked up, now we
can add a cxl_memdev_active() helper to ensure no regions are mapped to this
memdev.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-05-11 17:23     ` Davidlohr Bueso
@ 2023-05-12 17:00       ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-12 17:00 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

On Thu, 11 May 2023 10:23:31 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> On Thu, 11 May 2023, Jonathan Cameron wrote:
> 
> >> +What:           /sys/bus/cxl/devices/memX/security/sanitize
> >> +Date:           May, 2023
> >> +KernelVersion:  v6.5
> >> +Contact:        linux-cxl@vger.kernel.org
> >> +Description:
> >> +		(RW) Write a boolean 'true' string value to this attribute to
> >> +		sanitize the device to securely re-purpose or decommission it.
> >> +		This is done by ensuring that all user data and meta-data,
> >> +		whether it resides in persistent capacity, volatile capacity,
> >> +		or the LSA, is made permanently unavailable by whatever means
> >> +		is appropriate for the media type. This functionality requires
> >> +		the device to be not be actively decoding any HPA ranges.
> >> +
> >> +		Reading this file shows either "disabled" when not running, or
> >> +		"sanitize" during the duration of the sanitize operation. This
> >> +		sysfs entry is select/poll capable from userspace to notify upon
> >> +		completion.  
> >
> >A sysfs attribute that reads different from what is written is not very intuitive.
> >The one file one thing rule suggests to me that you should have a separate
> >santize_status or similar.  Or just have this read true when in progress making
> >it a self resetting toggle that returns -EBUSY if anyone tries to unset it.  
> 
> So the plan is to also to have the (cached) pmem security status (read-only):
>      /sys/bus/cxl/devices/memX/security/status
> 
> sanitize could nicely be incorporated there and just read/poll that file for all
> things security. So security/sanitize file goes to being write-only, just like
> its secure erase counter part.

That works nicely. Good plan.

> 
> >> +
> >> +
> >>  What:		/sys/bus/cxl/devices/*/devtype
> >>  Date:		June, 2021
> >>  KernelVersion:	v5.14
> >> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> >> index cde7270c6037..28daf7dcdec4 100644
> >> --- a/drivers/cxl/core/mbox.c
> >> +++ b/drivers/cxl/core/mbox.c
> >> @@ -1021,6 +1021,62 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
> >>  }
> >>  EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
> >>
> >> +/**
> >> + * cxl_mem_sanitize() - Send a sanitation command to the device.
> >> + * @cxlds: The device data for the operation
> >> + * @cmd: The specific sanitation command opcode
> >> + *
> >> + * Return: 0 if the command was executed successfully, regardless of
> >> + * whether or not the actual security operation is done in the background,
> >> + * such as for the Sanitize case.
> >> + * Error return values can be the result of the mailbox command, -EINVAL
> >> + * when security requirements are not met or invalid contexts, or -EBUSY
> >> + * if the device is not offline.  
> >
> >What does offline mean for the device?  Perhaps a tighter definition needed.  
> 
> I can expand. But overall, with Alison's poison work being picked up, now we
> can add a cxl_memdev_active() helper to ensure no regions are mapped to this
> memdev.

Ok.
> 
> Thanks,
> Davidlohr


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] cxl/mbox: Add sanitation handling machinery
  2023-05-11 16:48     ` Davidlohr Bueso
@ 2023-05-12 17:02       ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-12 17:02 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl

 
> >  
> >> +		queue_delayed_work(system_wq,
> >> +				   &cxlds->sec.sanitize_dwork, tmo * HZ);
> >> +	}
> >> +	mutex_unlock(&cxlds->mbox_mutex);
> >> +}
> >> +
> >>  /**
> >>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
> >>   * @cxlds: The device state to communicate with.
> >> @@ -173,6 +210,16 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
> >>		return -EBUSY;
> >>	}
> >>
> >> +	/*
> >> +	 * With sanitize polling, hardware might be done and the poller still
> >> +	 * not be in sync. Ensure no new command comes in until so. Keep the
> >> +	 * hardware semantics and only allow device health status.
> >> +	 */
> >> +	if (unlikely(cxlds->sec.sanitize_tmo > 0)) {
> >> +		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO)  
> >
> >Doesn't this let the value of mbox_cmd->opcode change to HEALTH_INFO so that
> >when we get here again we could carry on without other commands though still not in
> >sync (if things are very weird).  
> 
> I don't quite follow, mbox_cmd is local to each caller. Below I touch on this.

Indeed. Comment was result of a misread.

> >
> >That's confusing.  I'd add a separate structure element for it instead with
> >appropriate naming.  
> 
> Agreed, can be nicer. Another alternative is doing away with it altogether and only
> allow sanitation if interrupts are supported/enabled. Considering the potential runtimes,
> it's not a crazy ask to the hw to at least give some notification mechanism instead
> of having sw trying to stay up to date.

Whilst I fear someone will build it, we can be mean to them and make them add the support ;)
I don't mind either way.

Jonathan

> 
> Thanks,
> Davidlohr


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] cxl/mbox: Add background cmd handling machinery
  2023-05-11 16:04     ` Davidlohr Bueso
@ 2023-05-12 17:05       ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2023-05-12 17:05 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, ira.weiny,
	vishal.l.verma, fan.ni, a.manzanares, linux-cxl


> >  
> >>   * @return_code: (output) Error code returned from hardware.
> >>   *
> >>   * This is the primary mechanism used to send commands to the hardware.
> >> @@ -123,6 +126,8 @@ struct cxl_mbox_cmd {
> >>	size_t size_in;
> >>	size_t size_out;
> >>	size_t min_out;
> >> +	int poll_count;
> >> +	int poll_interval;
> >>	u16 return_code;
> >>  };
> >>
> >> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> >> index 39b829a29f6c..aa1bb74a52a1 100644
> >> --- a/drivers/cxl/pci.c
> >> +++ b/drivers/cxl/pci.c
> >> @@ -51,6 +51,7 @@
> >>  static unsigned short mbox_ready_timeout = 60;
> >>  module_param(mbox_ready_timeout, ushort, 0644);
> >>  MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
> >> +static DECLARE_WAIT_QUEUE_HEAD(mbox_wait);  
> >
> >I see in discussion you are moving to a per device approach so I won't review
> >that bit on this version.  
> 
> Right, fyi the latest vesion is here:
> 

> https://lore.kernel.org/linux-cxl/gtvozgdx2ak7tekc3heczk5g7gj3cwuoptez6tjmkecader4lo@7t2em7rclcxn/
I'll pretend I'll look at that :)
(81 more emails to read on CXL alone.. sigh


> >  
> >>		dev_dbg(dev, "Mailbox operation had an error: %s\n",
> >>			cxl_mbox_cmd_rc2str(mbox_cmd));
> >>		return 0; /* completed but caller must check return_code */
> >> @@ -224,6 +304,7 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> >>	const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
> >>	unsigned long timeout;
> >>	u64 md_status;
> >> +	int rc, irq;
> >>
> >>	timeout = jiffies + mbox_ready_timeout * HZ;
> >>	do {
> >> @@ -272,6 +353,27 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> >>	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
> >>		cxlds->payload_size);
> >>
> >> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> >> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> >> +
> >> +		irq = pci_irq_vector(pdev,
> >> +			     FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap));
> >> +		if (irq < 0)
> >> +			goto mbox_poll;
> >> +
> >> +		rc = devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> >> +				      IRQF_SHARED, "mailbox", cxlds);
> >> +		if (rc)
> >> +			goto mbox_poll;  
> >
> >Hmm. The old argument of whether to carry on when something unexpected happens.  
> 
> Well yes and no. The reason I am very tolerant upon errors here is that the
> background cmd polling will be done regardless of the device's interrupt
> capability. So I find it way too harsh to just fail the probe altogether
> when effectively no harm is done.

We'll never teach people to do things right if their broken config works anyway! :)


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-06-12 18:10 [PATCH v6 0/7] cxl: Support " Davidlohr Bueso
@ 2023-06-12 18:10 ` Davidlohr Bueso
  2023-06-25 22:34   ` Dan Williams
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2023-06-12 18:10 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, vishal.l.verma, Jonathan.Cameron, fan.ni,
	a.manzanares, dave, linux-cxl

Implement support for CXL 3.0 8.2.9.8.5.1 Sanitize. This is done by
adding a security/sanitize' memdev sysfs file to trigger the operation
and extend the status file to make it poll(2)-capable for completion.
Unlike all other background commands, this is the only operation that
is special and monopolizes the device for long periods of time.

In addition to the traditional pmem security requirements, all regions
must also be offline in order to perform the operation. This permits
avoiding explicit global CPU cache management, relying instead on
attach_target() setting CXL_REGION_F_INCOHERENT upon reconnect.

The expectation is that userspace can use it such as:

    cxl disable-memdev memX
    echo 1 > /sys/bus/cxl/devices/memX/security/sanitize
    cxl wait-sanitize memX
    cxl enable-memdev memX

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 21 +++++++-
 drivers/cxl/core/mbox.c                 | 55 ++++++++++++++++++++
 drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  4 ++
 drivers/cxl/pci.c                       |  6 +++
 5 files changed, 151 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 721a44d8a482..5753cba98692 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -64,8 +64,25 @@ KernelVersion:	v6.5
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RO) Reading this file will display the CXL security state for
-		that device. Such states can be: 'disabled', or those available
-		only for persistent memory: 'locked', 'unlocked' or 'frozen'.
+		that device. Such states can be: 'disabled', 'sanitize', when
+		a sanitation is currently underway; or those available only
+		for persistent memory: 'locked', 'unlocked' or 'frozen'. This
+		sysfs entry is select/poll capable from userspace to notify
+		upon completion of a sanitize operation.
+
+
+What:           /sys/bus/cxl/devices/memX/security/sanitize
+Date:           June, 2023
+KernelVersion:  v6.5
+Contact:        linux-cxl@vger.kernel.org
+Description:
+		(WO) Write a boolean 'true' string value to this attribute to
+		sanitize the device to securely re-purpose or decommission it.
+		This is done by ensuring that all user data and meta-data,
+		whether it resides in persistent capacity, volatile capacity,
+		or the LSA, is made permanently unavailable by whatever means
+		is appropriate for the media type. This functionality requires
+		the device to be not be actively decoding any HPA ranges.
 
 
 What:		/sys/bus/cxl/devices/*/devtype
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 5993261e3e08..51c64829f20a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1075,6 +1075,61 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
 
+/**
+ * cxl_mem_sanitize() - Send a sanitation command to the device.
+ * @cxlds: The device data for the operation
+ * @cmd: The specific sanitation command opcode
+ *
+ * Return: 0 if the command was executed successfully, regardless of
+ * whether or not the actual security operation is done in the background,
+ * such as for the Sanitize case.
+ * Error return values can be the result of the mailbox command, -EINVAL
+ * when security requirements are not met or invalid contexts.
+ *
+ * See CXL 3.0 @8.2.9.8.5.1 Sanitize and @8.2.9.8.5.2 Secure Erase.
+ */
+int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd)
+{
+	int rc;
+	u32 sec_out = 0;
+	struct cxl_get_security_output {
+		__le32 flags;
+	} out;
+	struct cxl_mbox_cmd sec_cmd = {
+		.opcode = CXL_MBOX_OP_GET_SECURITY_STATE,
+		.payload_out = &out,
+		.size_out = sizeof(out),
+	};
+	struct cxl_mbox_cmd mbox_cmd = { .opcode = cmd };
+
+	if (cmd != CXL_MBOX_OP_SANITIZE)
+		return -EINVAL;
+
+	rc = cxl_internal_send_cmd(cxlds, &sec_cmd);
+	if (rc < 0) {
+		dev_err(cxlds->dev, "Failed to get security state : %d", rc);
+		return rc;
+	}
+
+	/*
+	 * Prior to using these commands, any security applied to
+	 * the user data areas of the device shall be DISABLED (or
+	 * UNLOCKED for secure erase case).
+	 */
+	sec_out = le32_to_cpu(out.flags);
+	if (sec_out & CXL_PMEM_SEC_STATE_USER_PASS_SET)
+		return -EINVAL;
+
+	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
+	if (rc < 0) {
+		dev_err(cxlds->dev, "Failed to sanitize device : %d", rc);
+		return rc;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_sanitize, CXL);
+
 static int add_dpa_res(struct device *dev, struct resource *parent,
 		       struct resource *res, resource_size_t start,
 		       resource_size_t size, const char *type)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 834f418b6bcb..bdd1edfd62e8 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2020 Intel Corporation. */
 
+#include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/device.h>
 #include <linux/slab.h>
 #include <linux/idr.h>
@@ -114,6 +115,12 @@ static ssize_t security_state_show(struct device *dev,
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	unsigned long state = cxlds->security.state;
+	u64 reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+	u32 pct = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg);
+	u16 cmd = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
+
+	if (cmd == CXL_MBOX_OP_SANITIZE && pct != 100)
+		return sysfs_emit(buf, "sanitize\n");
 
 	if (!(state & CXL_PMEM_SEC_STATE_USER_PASS_SET))
 		return sysfs_emit(buf, "disabled\n");
@@ -129,6 +136,33 @@ static ssize_t security_state_show(struct device *dev,
 static struct device_attribute dev_attr_security_state =
 	__ATTR(state, 0444, security_state_show, NULL);
 
+static ssize_t security_sanitize_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_port *port = dev_get_drvdata(&cxlmd->dev);
+	ssize_t rc;
+	bool sanitize;
+
+	if (kstrtobool(buf, &sanitize) || !sanitize)
+		return -EINVAL;
+
+	if (!port || !is_cxl_endpoint(port))
+		return -EINVAL;
+
+	/* ensure no regions are mapped to this memdev */
+	if (port->commit_end != -1)
+		return -EBUSY;
+
+	rc = cxl_mem_sanitize(cxlds, CXL_MBOX_OP_SANITIZE);
+
+	return rc ? rc : len;
+}
+static struct device_attribute dev_attr_security_sanitize =
+	__ATTR(sanitize, 0200, NULL, security_sanitize_store);
+
 static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
 {
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
@@ -376,6 +410,7 @@ static struct attribute *cxl_memdev_ram_attributes[] = {
 
 static struct attribute *cxl_memdev_security_attributes[] = {
 	&dev_attr_security_state.attr,
+	&dev_attr_security_sanitize.attr,
 	NULL,
 };
 
@@ -594,6 +629,34 @@ static const struct file_operations cxl_memdev_fops = {
 	.llseek = noop_llseek,
 };
 
+static void put_sanitize(void *data)
+{
+	struct cxl_dev_state *cxlds = data;
+
+	sysfs_put(cxlds->security.sanitize_node);
+}
+
+static int cxl_memdev_security_init(struct cxl_memdev *cxlmd)
+{
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct device *dev = &cxlmd->dev;
+	struct kernfs_node *sec;
+
+	sec = sysfs_get_dirent(dev->kobj.sd, "security");
+	if (!sec) {
+		dev_err(dev, "sysfs_get_dirent 'security' failed\n");
+		return -ENODEV;
+	}
+	cxlds->security.sanitize_node = sysfs_get_dirent(sec, "state");
+	sysfs_put(sec);
+	if (!cxlds->security.sanitize_node) {
+		dev_err(dev, "sysfs_get_dirent 'state' failed\n");
+		return -ENODEV;
+	}
+
+	return devm_add_action_or_reset(cxlds->dev, put_sanitize, cxlds);
+ }
+
 struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 {
 	struct cxl_memdev *cxlmd;
@@ -622,6 +685,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 	if (rc)
 		goto err;
 
+	rc = cxl_memdev_security_init(cxlmd);
+	if (rc)
+		goto err;
+
 	rc = devm_add_action_or_reset(cxlds->dev, cxl_memdev_unregister, cxlmd);
 	if (rc)
 		return ERR_PTR(rc);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3a9df1044144..177a76578a94 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -267,12 +267,14 @@ struct cxl_poison_state {
  * @poll: polling for sanitation is enabled, device has no mbox irq support
  * @poll_tmo_secs: polling timeout
  * @poll_dwork: polling work item
+ * @sanitize_node: sanitation sysfs file to notify
  */
 struct cxl_security_state {
 	unsigned long state;
 	bool poll;
 	int poll_tmo_secs;
 	struct delayed_work poll_dwork;
+	struct kernfs_node *sanitize_node;
 };
 
 /**
@@ -746,6 +748,8 @@ static inline void cxl_mem_active_dec(void)
 }
 #endif
 
+int cxl_mem_sanitize(struct cxl_dev_state *cxlds, u16 cmd);
+
 struct cxl_hdm {
 	struct cxl_component_regs regs;
 	unsigned int decoder_count;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index c92eab55a5a7..d1df23c19245 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -126,6 +126,9 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
 	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
 	opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
 	if (opcode == CXL_MBOX_OP_SANITIZE) {
+		if (cxlds->security.sanitize_node)
+			sysfs_notify_dirent(cxlds->security.sanitize_node);
+
 		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
 	} else {
 		/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
@@ -150,6 +153,9 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
 		cxlds->security.poll_tmo_secs = 0;
 		put_device(cxlds->dev);
 
+		if (cxlds->security.sanitize_node)
+			sysfs_notify_dirent(cxlds->security.sanitize_node);
+
 		dev_dbg(cxlds->dev, "Sanitation operation ended\n");
 	} else {
 		int timeout = cxlds->security.poll_tmo_secs + 10;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* RE: [PATCH 4/7] cxl/mem: Wire up Sanitation support
  2023-06-12 18:10 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
@ 2023-06-25 22:34   ` Dan Williams
  0 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2023-06-25 22:34 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: dave.jiang, vishal.l.verma, Jonathan.Cameron, fan.ni,
	a.manzanares, dave, linux-cxl

Davidlohr Bueso wrote:
> Implement support for CXL 3.0 8.2.9.8.5.1 Sanitize. This is done by
> adding a security/sanitize' memdev sysfs file to trigger the operation
> and extend the status file to make it poll(2)-capable for completion.
> Unlike all other background commands, this is the only operation that
> is special and monopolizes the device for long periods of time.
> 
> In addition to the traditional pmem security requirements, all regions
> must also be offline in order to perform the operation. This permits
> avoiding explicit global CPU cache management, relying instead on
> attach_target() setting CXL_REGION_F_INCOHERENT upon reconnect.

CXL_REGION_F_INCOHERENT is going away, but the sentiment still holds. I
will update this to:

"This permits avoiding explicit global CPU cache management, relying
instead on the implict cache management when a region transitions
between CXL_CONFIG_ACTIVE and CXL_CONFIG_COMMIT."

> 
> The expectation is that userspace can use it such as:
> 
>     cxl disable-memdev memX
>     echo 1 > /sys/bus/cxl/devices/memX/security/sanitize

I assume this will become 'cxl sanitize-memdev' and handle all the busy
reporting etc for the user?

>     cxl wait-sanitize memX
>     cxl enable-memdev memX
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 21 +++++++-
>  drivers/cxl/core/mbox.c                 | 55 ++++++++++++++++++++
>  drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  4 ++
>  drivers/cxl/pci.c                       |  6 +++
>  5 files changed, 151 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 721a44d8a482..5753cba98692 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -64,8 +64,25 @@ KernelVersion:	v6.5
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) Reading this file will display the CXL security state for
> -		that device. Such states can be: 'disabled', or those available
> -		only for persistent memory: 'locked', 'unlocked' or 'frozen'.
> +		that device. Such states can be: 'disabled', 'sanitize', when
> +		a sanitation is currently underway; or those available only
> +		for persistent memory: 'locked', 'unlocked' or 'frozen'. This
> +		sysfs entry is select/poll capable from userspace to notify
> +		upon completion of a sanitize operation.
> +
> +
> +What:           /sys/bus/cxl/devices/memX/security/sanitize
> +Date:           June, 2023
> +KernelVersion:  v6.5
> +Contact:        linux-cxl@vger.kernel.org
> +Description:
> +		(WO) Write a boolean 'true' string value to this attribute to
> +		sanitize the device to securely re-purpose or decommission it.
> +		This is done by ensuring that all user data and meta-data,
> +		whether it resides in persistent capacity, volatile capacity,
> +		or the LSA, is made permanently unavailable by whatever means
> +		is appropriate for the media type. This functionality requires
> +		the device to be not be actively decoding any HPA ranges.

I notice this attribute is unconditionally available. It would be nice
to hide it on devices that do not support the optional sanitize command.

This is a minor fixup that just needs to be in place before v6.5-final.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2023-06-25 22:34 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-21  9:23 [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
2023-04-21  9:23 ` [PATCH 1/7] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
2023-04-28 16:09   ` Dave Jiang
2023-05-11 13:55   ` Jonathan Cameron
2023-04-21  9:23 ` [PATCH 2/7] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
2023-04-23  7:54   ` Li, Ming
2023-04-23 20:51     ` Davidlohr Bueso
2023-04-28 16:21   ` Dave Jiang
2023-04-28 17:18     ` Davidlohr Bueso
2023-04-28 21:04       ` Dave Jiang
2023-04-28 22:03         ` Davidlohr Bueso
2023-05-01 15:56           ` Davidlohr Bueso
2023-05-11 14:23   ` Jonathan Cameron
2023-05-11 16:04     ` Davidlohr Bueso
2023-05-12 17:05       ` Jonathan Cameron
2023-04-21  9:23 ` [PATCH 3/7] cxl/mbox: Add sanitation " Davidlohr Bueso
2023-04-28 16:43   ` Dave Jiang
2023-04-28 16:46     ` Davidlohr Bueso
2023-04-28 17:37       ` Dave Jiang
2023-05-11 14:45   ` Jonathan Cameron
2023-05-11 16:48     ` Davidlohr Bueso
2023-05-12 17:02       ` Jonathan Cameron
2023-04-21  9:23 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
2023-04-21 20:04   ` kernel test robot
2023-04-21 20:24   ` kernel test robot
2023-05-11 15:07   ` Jonathan Cameron
2023-05-11 17:23     ` Davidlohr Bueso
2023-05-12 17:00       ` Jonathan Cameron
2023-04-21  9:23 ` [PATCH 5/7] cxl/test: Add Sanitize opcode support Davidlohr Bueso
2023-05-11 15:09   ` Jonathan Cameron
2023-05-11 15:13     ` Davidlohr Bueso
2023-04-21  9:23 ` [PATCH 6/7] cxl/mem: Support Secure Erase Davidlohr Bueso
2023-05-11 15:10   ` Jonathan Cameron
2023-04-21  9:23 ` [PATCH 7/7] cxl/test: Add Secure Erase opcode support Davidlohr Bueso
2023-05-11 15:10   ` Jonathan Cameron
2023-04-23  2:05 ` [PATCH v4 0/7] cxl: Background cmds and device sanitation Davidlohr Bueso
  -- strict thread matches above, loose matches on Subject: below --
2023-06-12 18:10 [PATCH v6 0/7] cxl: Support " Davidlohr Bueso
2023-06-12 18:10 ` [PATCH 4/7] cxl/mem: Wire up Sanitation support Davidlohr Bueso
2023-06-25 22:34   ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox