[PATCH 0/3] cxl/mbox: support background operation abort requests

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] cxl/mbox: support background operation abort requests
@ 2024-10-22  3:18 Davidlohr Bueso
  2024-10-22  3:18 ` [PATCH 1/3] cxl/pci: lockless background synchronous polling Davidlohr Bueso
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Davidlohr Bueso @ 2024-10-22  3:18 UTC (permalink / raw)
  To: dan.j.williams, dave.jiang
  Cc: jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
	fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur, ajayjoshi,
	Ravis.OpenSrc, sthanneeru, dave, linux-cxl, linux-kernel

Hello,

This series implements the abortable capabilities for mailbox background
operations, introduced in CXL 3.1.

Details in each patch, but patch 1 moves non-sanitize background command
polling out of the mbox_mutex, such that patch 2 can add the machinery
to request aborting the on going command. Patch 3 simply renames a call.

While an overal policy could include command-priorities (which could
be trivially added); the current policy is that, if there is an on going
background operation, a new incoming bg-capable command will blindly
request it to be cancelled, if capable.

Considering the interest of this for Logs[0], perhaps Micron folks could
this? Does this work along the lines of what that discussion concluded?

Applies against Linus' latest tree.

[0]: https://lore.kernel.org/all/20240313071218.729-1-sthanneeru.opensrc@micron.com/

Testing:

lockdep enabled kernel + the qemu counterpart patches:
	https://lore.kernel.org/linux-cxl/20240813221255.179200-1-dave@stgolabs.net/

#> echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize && sleep 3 && echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
[  185.928276] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0d:00.0: Sending command: 0x4400cxl/devices/mem1/security/sanitize
[  185.930024] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
[  185.931608] cxl_pci:__cxl_pci_mbox_send_cmd:306: cxl_pci 0000:0d:00.0: Sanitization operation started
[  188.936583] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0d:00.0: Sending command: 0x0005
[  188.943956] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
[  188.951786] cxl_pci:cxl_try_to_cancel_background:376: cxl_pci 0000:0d:00.0: Sanitization operation aborted
[  188.957762] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0d:00.0: Sending command: 0x4400
[  188.959886] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
[  188.962325] cxl_pci:__cxl_pci_mbox_send_cmd:306: cxl_pci 0000:0d:00.0: Sanitization operation started
[  197.034644] cxl_pci:cxl_mbox_sanitize_work:164: cxl_pci 0000:0d:00.0: Sanitization operation ended

#> cxl update-firmware -F img.fw mem1 && sleep 3 && echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
[   14.443484] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0200
[   14.445884] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
[   14.448744] cxl_core:cxl_query_cmd:539: cxl_mem mem1: Query IOCTL
[   14.450458] cxl_core:cxl_query_cmd:539: cxl_mem mem1: Query IOCTL
[   14.452307] cxl_core:cxl_send_cmd:644: cxl_mem mem1: Send IOCTL
[   14.453686] cxl_core:handle_mailbox_cmd_from_user:602: cxl_pci 0000:0e:00.0: Submitting Get FW Info command for user
[   14.453686] 	opcode: 200
[   14.453686] 	size: 0
[   14.460453] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0201
[   14.462313] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
[   14.464166] cxl_pci:__cxl_pci_mbox_send_cmd:310: cxl_pci 0000:0e:00.0: Mailbox background operation (0x0201) started
[   14.466536] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0200
[   14.468380] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
{
  "memdev":"mem1",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0",
  "host":"0000:0e:00.0",
  "firmware":{
    "num_slots":2,
    "active_slot":1,
    "online_activate_capable":true,
    "slot_1_version":"BWFW VERSION 0",
    "fw_update_in_progress":true,
    "remaining_size":"48.83 MiB (51.20 MB)"
  }
}
cxl memdev: cmd_update_fw: firmware update started on 1 mem device
[   17.484680] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0005
[   17.486993] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
[   17.489510] cxl_pci:cxl_pci_mbox_send:451: cxl_pci 0000:0e:00.0: Mailbox background operation (0x0201) aborted
[   17.492476] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x4400
[   17.494937] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
[   17.497598] cxl_pci:__cxl_pci_mbox_send_cmd:306: cxl_pci 0000:0e:00.0: Sanitization operation started
[   25.682631] cxl_pci:cxl_mbox_sanitize_work:164: cxl_pci 0000:0e:00.0: Sanitization operation ended

Thanks!

Davidlohr Bueso (3):
  cxl/pci: lockless background synchronous polling
  cxl/mbox: support aborting the current background operation
  cxl/pci: rename cxl_mbox_background_complete()

 drivers/cxl/core/mbox.c |   1 +
 drivers/cxl/cxlmem.h    |  14 +++
 drivers/cxl/pci.c       | 189 ++++++++++++++++++++++++++++------------
 include/cxl/mailbox.h   |   2 +
 4 files changed, 152 insertions(+), 54 deletions(-)

--
2.46.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] cxl/pci: lockless background synchronous polling
  2024-10-22  3:18 [PATCH 0/3] cxl/mbox: support background operation abort requests Davidlohr Bueso
@ 2024-10-22  3:18 ` Davidlohr Bueso
  2025-01-14 16:12   ` Dave Jiang
  2024-10-22  3:18 ` [PATCH 2/3] cxl/mbox: support aborting the current background operation Davidlohr Bueso
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Davidlohr Bueso @ 2024-10-22  3:18 UTC (permalink / raw)
  To: dan.j.williams, dave.jiang
  Cc: jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
	fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur, ajayjoshi,
	Ravis.OpenSrc, sthanneeru, dave, linux-cxl, linux-kernel

For non-sanitize background commands we rely on holding the mbox_mutex
throughout the duration of the operation. This causes other incoming
commands to queue up behind, and interleaving executions while the
background command is timesliced by the user.

However, in order to support the mbox request cancel background
operation command, the lock will need to be available to actually
perform the request. Semantically this allows other commands to
many times be at the mercy of hardware returning the respective error.
Potentially users would be exposed to changes in the form of errors
instead of commands taking longer to run - but this is not forseen
as a problem.

In order to not loose sync with the hardware, introduce a mailbox
atomic that blocks any other incoming bg operations while the driver
is still polling (synchronously) on the current one.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/core/mbox.c |   1 +
 drivers/cxl/cxlmem.h    |  13 +++++
 drivers/cxl/pci.c       | 114 +++++++++++++++++++++++-----------------
 include/cxl/mailbox.h   |   2 +
 4 files changed, 82 insertions(+), 48 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 5175138c4fb7..9af3cd16d23d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1429,6 +1429,7 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
 
 	cxl_mbox->host = host;
 	mutex_init(&cxl_mbox->mbox_mutex);
+	atomic_set(&cxl_mbox->poll_bgop, 0);
 	rcuwait_init(&cxl_mbox->mbox_wait);
 
 	return 0;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2a25d1957ddb..b933fb73ef8a 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -557,6 +557,19 @@ enum cxl_opcode {
 	CXL_MBOX_OP_MAX			= 0x10000
 };
 
+static inline bool cxl_is_background_cmd(u16 opcode)
+{
+	switch (opcode) {
+	case CXL_MBOX_OP_TRANSFER_FW:
+	case CXL_MBOX_OP_ACTIVATE_FW:
+	case CXL_MBOX_OP_SCAN_MEDIA:
+	case CXL_MBOX_OP_SANITIZE:
+		return true;
+	default:
+		return false;
+	}
+}
+
 #define DEFINE_CXL_CEL_UUID                                                    \
 	UUID_INIT(0xda9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79, 0x96, 0xb1, 0x62,     \
 		  0x3b, 0x3f, 0x17)
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 188412d45e0d..f2378604669b 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -278,29 +278,15 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
 	mbox_cmd->return_code =
 		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
 
-	/*
-	 * Handle the background command in a synchronous manner.
-	 *
-	 * All other mailbox commands will serialize/queue on the mbox_mutex,
-	 * which we currently hold. Furthermore this also guarantees that
-	 * cxl_mbox_background_complete() checks are safe amongst each other,
-	 * in that no new bg operation can occur in between.
-	 *
-	 * Background operations are timesliced in accordance with the nature
-	 * of the command. In the event of timeout, the mailbox state is
-	 * indeterminate until the next successful command submission and the
-	 * driver can get back in sync with the hardware state.
-	 */
 	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
-		u64 bg_status_reg;
-		int i, timeout;
-
 		/*
 		 * Sanitization is a special case which monopolizes the device
 		 * and cannot be timesliced. Handle asynchronously instead,
 		 * and allow userspace to poll(2) for completion.
 		 */
 		if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) {
+			int timeout;
+
 			if (mds->security.sanitize_active)
 				return -EBUSY;
 
@@ -311,44 +297,19 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
 			schedule_delayed_work(&mds->security.poll_dwork,
 					      timeout * HZ);
 			dev_dbg(dev, "Sanitization operation started\n");
-			goto success;
-		}
-
-		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
-			mbox_cmd->opcode);
-
-		timeout = mbox_cmd->poll_interval_ms;
-		for (i = 0; i < mbox_cmd->poll_count; i++) {
-			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
-						       cxl_mbox_background_complete(cxlds),
-						       TASK_UNINTERRUPTIBLE,
-						       msecs_to_jiffies(timeout)) > 0)
-				break;
-		}
-
-		if (!cxl_mbox_background_complete(cxlds)) {
-			dev_err(dev, "timeout waiting for background (%d ms)\n",
-				timeout * mbox_cmd->poll_count);
-			return -ETIMEDOUT;
+		} else {
+			/* pairs with release/acquire semantics */
+			WARN_ON_ONCE(atomic_xchg(&cxl_mbox->poll_bgop,
+						 mbox_cmd->opcode));
+			dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
+				mbox_cmd->opcode);
 		}
-
-		bg_status_reg = readq(cxlds->regs.mbox +
-				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
-		mbox_cmd->return_code =
-			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
-				  bg_status_reg);
-		dev_dbg(dev,
-			"Mailbox background operation (0x%04x) completed\n",
-			mbox_cmd->opcode);
-	}
-
-	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
+	} else if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
 		dev_dbg(dev, "Mailbox operation had an error: %s\n",
 			cxl_mbox_cmd_rc2str(mbox_cmd));
 		return 0; /* completed but caller must check return_code */
 	}
 
-success:
 	/* #7 */
 	cmd_reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_CMD_OFFSET);
 	out_len = FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, cmd_reg);
@@ -378,11 +339,68 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
 			     struct cxl_mbox_cmd *cmd)
 {
 	int rc;
+	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
+	struct device *dev = cxlds->dev;
 
 	mutex_lock_io(&cxl_mbox->mbox_mutex);
+	/*
+	 * Ensure cxl_mbox_background_complete() checks are safe amongst
+	 * each other: no new bg operation can occur in between while polling.
+	 */
+	if (cxl_is_background_cmd(cmd->opcode)) {
+		if (atomic_read_acquire(&cxl_mbox->poll_bgop)) {
+			mutex_unlock(&cxl_mbox->mbox_mutex);
+			return -EBUSY;
+		}
+	}
+
 	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
 	mutex_unlock(&cxl_mbox->mbox_mutex);
 
+	if (cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
+		return rc;
+
+	/*
+	 * Handle the background command in a synchronous manner. Background
+	 * operations are timesliced in accordance with the nature of the
+	 * command.
+	 */
+	if (cmd->opcode != CXL_MBOX_OP_SANITIZE) {
+		int i, timeout;
+		u64 bg_status_reg;
+
+		timeout = cmd->poll_interval_ms;
+		for (i = 0; i < cmd->poll_count; i++) {
+			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
+				       cxl_mbox_background_complete(cxlds),
+				       TASK_UNINTERRUPTIBLE,
+				       msecs_to_jiffies(timeout)) > 0)
+				break;
+		}
+
+		/*
+		 * In the event of timeout, the mailbox state is indeterminate
+		 * until the next successful command submission and the driver
+		 * can get back in sync with the hardware state.
+		 */
+		if (!cxl_mbox_background_complete(cxlds)) {
+			dev_err(dev, "timeout waiting for background (%d ms)\n",
+				timeout * cmd->poll_count);
+			rc = -ETIMEDOUT;
+			goto done;
+		}
+
+		bg_status_reg = readq(cxlds->regs.mbox +
+				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+		cmd->return_code = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
+					     bg_status_reg);
+		dev_dbg(dev,
+			"Mailbox background operation (0x%04x) completed\n",
+			cmd->opcode);
+done:
+		atomic_set_release(&cxl_mbox->poll_bgop, 0);
+	}
+
 	return rc;
 }
 
diff --git a/include/cxl/mailbox.h b/include/cxl/mailbox.h
index bacd111e75f1..23282a3c44f1 100644
--- a/include/cxl/mailbox.h
+++ b/include/cxl/mailbox.h
@@ -13,6 +13,7 @@ struct cxl_mbox_cmd;
  *                (CXL 3.1 8.2.8.4.3 Mailbox Capabilities Register)
  * @mbox_mutex: mutex protects device mailbox and firmware
  * @mbox_wait: rcuwait for mailbox
+ * @poll_bgop: current background operation being polled on
  * @mbox_send: @dev specific transport for transmitting mailbox commands
  */
 struct cxl_mailbox {
@@ -20,6 +21,7 @@ struct cxl_mailbox {
 	size_t payload_size;
 	struct mutex mbox_mutex; /* lock to protect mailbox context */
 	struct rcuwait mbox_wait;
+	atomic_t poll_bgop;
 	int (*mbox_send)(struct cxl_mailbox *cxl_mbox, struct cxl_mbox_cmd *cmd);
 };
 
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] cxl/mbox: support aborting the current background operation
  2024-10-22  3:18 [PATCH 0/3] cxl/mbox: support background operation abort requests Davidlohr Bueso
  2024-10-22  3:18 ` [PATCH 1/3] cxl/pci: lockless background synchronous polling Davidlohr Bueso
@ 2024-10-22  3:18 ` Davidlohr Bueso
  2025-01-14 17:00   ` Dave Jiang
  2024-10-22  3:18 ` [PATCH 3/3] cxl/pci: rename cxl_mbox_background_complete() Davidlohr Bueso
  2024-10-23  5:32 ` [PATCH 0/3] cxl/mbox: support background operation abort requests Ravis OpenSrc
  3 siblings, 1 reply; 16+ messages in thread
From: Davidlohr Bueso @ 2024-10-22  3:18 UTC (permalink / raw)
  To: dan.j.williams, dave.jiang
  Cc: jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
	fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur, ajayjoshi,
	Ravis.OpenSrc, sthanneeru, dave, linux-cxl, linux-kernel

CXL 3.1 introduced the ability to request that the current on-going
background command be aborted. Add support for this, where the current
policy is for the request to occur whenever a new incoming bg command
wants to run. As such everything is left to user discretion and it
becomes impossible to hog the device/mailbox.

The context of doing the cancellation request is the same as the new
incoming command, and will always hold the mbox_mutex, guaranteeing
that any successful cancel does not race with a third thread coming
in and stealing the effort.

- For Sanitize, the thread doing the will cancel the work, and clean
on behalf of the respective wq callback that will never come.

- For the other bg commands, the sleeping thread is kicked and we
busy-wait until the polling flag is cleared.

In both scenarios, we guarantee that the aborted op's thread is no
longer around, giving the new bg op full authority to submit the command.

Semantics for devices that do not support such functionality are left
unchanged, and hence, with this, the driver benefits in both scenarios.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/cxlmem.h |  1 +
 drivers/cxl/pci.c    | 81 +++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index b933fb73ef8a..e843ffc3c23a 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -516,6 +516,7 @@ to_cxl_memdev_state(struct cxl_dev_state *cxlds)
 enum cxl_opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
+	CXL_MBOX_OP_REQUEST_ABORT_BG_OP = 0x0005,
 	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
 	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
 	CXL_MBOX_OP_GET_EVT_INT_POLICY	= 0x0102,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index f2378604669b..5da50e26e4c4 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -115,8 +115,8 @@ static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
 {
 	u64 reg;
 
-	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
-	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
+	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_STATUS_OFFSET);
+	return FIELD_GET(CXLDEV_MBOX_STATUS_BG_CMD, reg) == 0;
 }
 
 static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
@@ -241,7 +241,8 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
 	 * hardware semantics and only allow device health status.
 	 */
 	if (mds->security.poll_tmo_secs > 0) {
-		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO)
+		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO &&
+		    mbox_cmd->opcode != CXL_MBOX_OP_REQUEST_ABORT_BG_OP)
 			return -EBUSY;
 	}
 
@@ -335,11 +336,64 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
 	return 0;
 }
 
+/*
+ * Return true implies that the request was successful and the on-going
+ * background operation was in fact aborted. This also guarantees that
+ * the respective thread is done.
+ */
+static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
+{
+	int rc;
+	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+	struct device *dev = cxlds->dev;
+	struct cxl_mbox_cmd cmd = {
+		.opcode = CXL_MBOX_OP_REQUEST_ABORT_BG_OP
+	};
+
+	lockdep_assert_held(&cxl_mbox->mbox_mutex);
+
+	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, &cmd);
+	if (rc) {
+		dev_dbg(dev, "Failed to send abort request : %d\n", rc);
+		return false;
+	}
+
+	if (!cxl_mbox_background_complete(cxlds))
+		return false;
+
+	if (mds->security.sanitize_active) {
+		/*
+		 * Cancel the work and cleanup on its behalf - we hold
+		 * the mbox_mutex, cannot race with cxl_mbox_sanitize_work().
+		 */
+		cancel_delayed_work_sync(&mds->security.poll_dwork);
+		mds->security.poll_tmo_secs = 0;
+		if (mds->security.sanitize_node)
+			sysfs_notify_dirent(mds->security.sanitize_node);
+		mds->security.sanitize_active = false;
+
+		dev_dbg(cxlds->dev, "Sanitization operation aborted\n");
+	} else {
+		/*
+		 * Kick the poller and wait for it to be done - no one else
+		 * can touch mbox regs. rcuwait_wake_up() provides full
+		 * barriers such that wake up occurs before waiting on the
+		 * bgpoll atomic to be cleared.
+		 */
+		rcuwait_wake_up(&cxl_mbox->mbox_wait);
+		atomic_cond_read_acquire(&cxl_mbox->poll_bgop, !VAL);
+	}
+
+	return true;
+}
+
 static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
 			     struct cxl_mbox_cmd *cmd)
 {
 	int rc;
 	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
 	struct device *dev = cxlds->dev;
 
 	mutex_lock_io(&cxl_mbox->mbox_mutex);
@@ -348,10 +402,18 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
 	 * each other: no new bg operation can occur in between while polling.
 	 */
 	if (cxl_is_background_cmd(cmd->opcode)) {
-		if (atomic_read_acquire(&cxl_mbox->poll_bgop)) {
-			mutex_unlock(&cxl_mbox->mbox_mutex);
-			return -EBUSY;
+		if (mds->security.sanitize_active ||
+		    atomic_read_acquire(&cxl_mbox->poll_bgop)) {
+			if (!cxl_try_to_cancel_background(cxl_mbox)) {
+				mutex_unlock(&cxl_mbox->mbox_mutex);
+				return -EBUSY;
+			}
 		}
+		/*
+		 * ... at this point we know that the canceled
+		 * bgop context is gone, and we are the _only_
+		 * background command in town. Proceed to send it.
+		 */
 	}
 
 	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
@@ -394,10 +456,11 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
 				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
 		cmd->return_code = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
 					     bg_status_reg);
-		dev_dbg(dev,
-			"Mailbox background operation (0x%04x) completed\n",
-			cmd->opcode);
+
+		dev_dbg(dev, "Mailbox background operation (0x%04x) %s\n",
+			cmd->opcode, !cmd->return_code ? "completed":"aborted");
 done:
+		/* ensure clearing poll_bop is the last operation */
 		atomic_set_release(&cxl_mbox->poll_bgop, 0);
 	}
 
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] cxl/pci: rename cxl_mbox_background_complete()
  2024-10-22  3:18 [PATCH 0/3] cxl/mbox: support background operation abort requests Davidlohr Bueso
  2024-10-22  3:18 ` [PATCH 1/3] cxl/pci: lockless background synchronous polling Davidlohr Bueso
  2024-10-22  3:18 ` [PATCH 2/3] cxl/mbox: support aborting the current background operation Davidlohr Bueso
@ 2024-10-22  3:18 ` Davidlohr Bueso
  2025-01-14 17:26   ` Dave Jiang
  2024-10-23  5:32 ` [PATCH 0/3] cxl/mbox: support background operation abort requests Ravis OpenSrc
  3 siblings, 1 reply; 16+ messages in thread
From: Davidlohr Bueso @ 2024-10-22  3:18 UTC (permalink / raw)
  To: dan.j.williams, dave.jiang
  Cc: jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
	fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur, ajayjoshi,
	Ravis.OpenSrc, sthanneeru, dave, linux-cxl, linux-kernel

With the abort functionality, this is a misleading name,
rename to cxl_mbox_background_done() instead.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/pci.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 5da50e26e4c4..7b0fad7f6c4d 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -111,7 +111,7 @@ static int cxl_request_irq(struct cxl_dev_state *cxlds, int irq,
 					 dev_id);
 }
 
-static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
+static bool cxl_mbox_background_done(struct cxl_dev_state *cxlds)
 {
 	u64 reg;
 
@@ -128,7 +128,7 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
 	struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
 	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
 
-	if (!cxl_mbox_background_complete(cxlds))
+	if (!cxl_mbox_background_done(cxlds))
 		return IRQ_NONE;
 
 	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
@@ -157,7 +157,7 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
 	struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
 
 	mutex_lock(&cxl_mbox->mbox_mutex);
-	if (cxl_mbox_background_complete(cxlds)) {
+	if (cxl_mbox_background_done(cxlds)) {
 		mds->security.poll_tmo_secs = 0;
 		if (mds->security.sanitize_node)
 			sysfs_notify_dirent(mds->security.sanitize_node);
@@ -359,7 +359,7 @@ static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
 		return false;
 	}
 
-	if (!cxl_mbox_background_complete(cxlds))
+	if (!cxl_mbox_background_done(cxlds))
 		return false;
 
 	if (mds->security.sanitize_active) {
@@ -398,7 +398,7 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
 
 	mutex_lock_io(&cxl_mbox->mbox_mutex);
 	/*
-	 * Ensure cxl_mbox_background_complete() checks are safe amongst
+	 * Ensure cxl_mbox_background_done() checks are safe amongst
 	 * each other: no new bg operation can occur in between while polling.
 	 */
 	if (cxl_is_background_cmd(cmd->opcode)) {
@@ -434,7 +434,7 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
 		timeout = cmd->poll_interval_ms;
 		for (i = 0; i < cmd->poll_count; i++) {
 			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
-				       cxl_mbox_background_complete(cxlds),
+				       cxl_mbox_background_done(cxlds),
 				       TASK_UNINTERRUPTIBLE,
 				       msecs_to_jiffies(timeout)) > 0)
 				break;
@@ -445,7 +445,7 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
 		 * until the next successful command submission and the driver
 		 * can get back in sync with the hardware state.
 		 */
-		if (!cxl_mbox_background_complete(cxlds)) {
+		if (!cxl_mbox_background_done(cxlds)) {
 			dev_err(dev, "timeout waiting for background (%d ms)\n",
 				timeout * cmd->poll_count);
 			rc = -ETIMEDOUT;
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] cxl/mbox: support background operation abort requests
  2024-10-22  3:18 [PATCH 0/3] cxl/mbox: support background operation abort requests Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2024-10-22  3:18 ` [PATCH 3/3] cxl/pci: rename cxl_mbox_background_complete() Davidlohr Bueso
@ 2024-10-23  5:32 ` Ravis OpenSrc
  2024-10-23 14:29   ` Davidlohr Bueso
  3 siblings, 1 reply; 16+ messages in thread
From: Ravis OpenSrc @ 2024-10-23  5:32 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams@intel.com, dave.jiang@intel.com
  Cc: jonathan.cameron@huawei.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, fan.ni@samsung.com,
	a.manzanares@samsung.com, Srinivasulu Opensrc, Eishan Mirakhur,
	Ajay Joshi, Srinivasulu Thanneeru, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

Hi Davidlohr,

   We have recently submitted an RFC to implement Request Background Abort
in case a timeout is encountered while executing a background command.
The patch series disables those background commands which do not support abort
based on their individual CEL properties.

https://lore.kernel.org/linux-cxl/20241017163215.00000547@Huawei.com/T/#ma6710971908b85a5f8c5da2e23b8102b5a6e277c

This implementation is based on Dan's suggestion in a previous thread.
https://lore.kernel.org/linux-cxl/66035c2e8ba17_770232948b@dwillia2-xfh.jf.intel.com.notmuch/

We can discuss more on how we can potentially merge and arrive at a common ground.

--Ravi.
>On Mon, 21 Oct , Davidlohr Bueso wrote:
>Hello,
>
>This series implements the abortable capabilities for mailbox background
>operations, introduced in CXL 3.1.
>
>Details in each patch, but patch 1 moves non-sanitize background command
>polling out of the mbox_mutex, such that patch 2 can add the machinery
>to request aborting the on going command. Patch 3 simply renames a call.
>
>While an overal policy could include command-priorities (which could
>be trivially added); the current policy is that, if there is an on going
>background operation, a new incoming bg-capable command will blindly
>request it to be cancelled, if capable.
>
>Considering the interest of this for Logs[0], perhaps Micron folks could
>this? Does this work along the lines of what that discussion concluded?
>
>Applies against Linus' latest tree.
>
>[0]: https://lore.kernel.org/all/20240313071218.729-1-sthanneeru.opensrc@micron.com/
>
>Testing:
>
>lockdep enabled kernel + the qemu counterpart patches:
>    https://lore.kernel.org/linux-cxl/20240813221255.179200-1-dave@stgolabs.net/ 
>
>#> echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize && sleep 3 && echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
>[  185.928276] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0d:00.0: Sending command: 0x4400cxl/devices/mem1/security/sanitize
>[  185.930024] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
>[  185.931608] cxl_pci:__cxl_pci_mbox_send_cmd:306: cxl_pci 0000:0d:00.0: Sanitization operation started
>[  188.936583] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0d:00.0: Sending command: 0x0005
>[  188.943956] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
>[  188.951786] cxl_pci:cxl_try_to_cancel_background:376: cxl_pci 0000:0d:00.0: Sanitization operation aborted
>[  188.957762] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0d:00.0: Sending command: 0x4400
>[  188.959886] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
>[  188.962325] cxl_pci:__cxl_pci_mbox_send_cmd:306: cxl_pci 0000:0d:00.0: Sanitization operation started
>[  197.034644] cxl_pci:cxl_mbox_sanitize_work:164: cxl_pci 0000:0d:00.0: Sanitization operation ended
>
>#> cxl update-firmware -F img.fw mem1 && sleep 3 && echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize
>[   14.443484] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0200
>[   14.445884] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
>[   14.448744] cxl_core:cxl_query_cmd:539: cxl_mem mem1: Query IOCTL
>[   14.450458] cxl_core:cxl_query_cmd:539: cxl_mem mem1: Query IOCTL
>[   14.452307] cxl_core:cxl_send_cmd:644: cxl_mem mem1: Send IOCTL
>[   14.453686] cxl_core:handle_mailbox_cmd_from_user:602: cxl_pci 0000:0e:00.0: Submitting Get FW Info command for user
>[   14.453686]  opcode: 200
>[   14.453686]  size: 0
>[   14.460453] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0201
>[   14.462313] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
>[   14.464166] cxl_pci:__cxl_pci_mbox_send_cmd:310: cxl_pci 0000:0e:00.0: Mailbox background operation (0x0201) started
>[   14.466536] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0200
>[   14.468380] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
>{
>  "memdev":"mem1",
>  "pmem_size":"1024.00 MiB (1073.74 MB)",
>  "serial":"0",
>  "host":"0000:0e:00.0",
>  "firmware":{
>    "num_slots":2,
>    "active_slot":1,
>    "online_activate_capable":true,
>    "slot_1_version":"BWFW VERSION 0",
>    "fw_update_in_progress":true,
>    "remaining_size":"48.83 MiB (51.20 MB)"
>  }
>}
>cxl memdev: cmd_update_fw: firmware update started on 1 mem device
>[   17.484680] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x0005
>[   17.486993] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
>[   17.489510] cxl_pci:cxl_pci_mbox_send:451: cxl_pci 0000:0e:00.0: Mailbox background operation (0x0201) aborted
>[   17.492476] cxl_pci:__cxl_pci_mbox_send_cmd:262: cxl_pci 0000:0e:00.0: Sending command: 0x4400
>[   17.494937] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0e:00.0: Doorbell wait took 0ms
>[   17.497598] cxl_pci:__cxl_pci_mbox_send_cmd:306: cxl_pci 0000:0e:00.0: Sanitization operation started
>[   25.682631] cxl_pci:cxl_mbox_sanitize_work:164: cxl_pci 0000:0e:00.0: Sanitization operation ended
>
>Thanks!
>
>Davidlohr Bueso (3):
>  cxl/pci: lockless background synchronous polling
>  cxl/mbox: support aborting the current background operation
>  cxl/pci: rename cxl_mbox_background_complete()
>
> drivers/cxl/core/mbox.c |   1 +
> drivers/cxl/cxlmem.h    |  14 +++
> drivers/cxl/pci.c       | 189 ++++++++++++++++++++++++++++------------
> include/cxl/mailbox.h   |   2 +
> 4 files changed, 152 insertions(+), 54 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] cxl/mbox: support background operation abort requests
  2024-10-23  5:32 ` [PATCH 0/3] cxl/mbox: support background operation abort requests Ravis OpenSrc
@ 2024-10-23 14:29   ` Davidlohr Bueso
  2024-10-23 21:17     ` Ravis OpenSrc
  0 siblings, 1 reply; 16+ messages in thread
From: Davidlohr Bueso @ 2024-10-23 14:29 UTC (permalink / raw)
  To: Ravis OpenSrc
  Cc: dan.j.williams@intel.com, dave.jiang@intel.com,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, fan.ni@samsung.com,
	a.manzanares@samsung.com, Srinivasulu Opensrc, Eishan Mirakhur,
	Ajay Joshi, Srinivasulu Thanneeru, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, 23 Oct 2024, Ravis OpenSrc wrote:\n
>Hi Davidlohr,
>
>   We have recently submitted an RFC to implement Request Background Abort
>in case a timeout is encountered while executing a background command.
>The patch series disables those background commands which do not support abort
>based on their individual CEL properties.
>
>https://lore.kernel.org/linux-cxl/20241017163215.00000547@Huawei.com/T/#ma6710971908b85a5f8c5da2e23b8102b5a6e277c

*sigh* it would have been nice to be Cc'ed.

>
>This implementation is based on Dan's suggestion in a previous thread.
>https://lore.kernel.org/linux-cxl/66035c2e8ba17_770232948b@dwillia2-xfh.jf.intel.com.notmuch/
>
>We can discuss more on how we can potentially merge and arrive at a common ground.

I think that my approach is way more robust then defining a timeout as in that series.
It also aligns more with Jonathan's comments about the abort being on demand.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] cxl/mbox: support background operation abort requests
  2024-10-23 14:29   ` Davidlohr Bueso
@ 2024-10-23 21:17     ` Ravis OpenSrc
  2024-10-23 23:35       ` Davidlohr Bueso
  0 siblings, 1 reply; 16+ messages in thread
From: Ravis OpenSrc @ 2024-10-23 21:17 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams@intel.com, dave.jiang@intel.com,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, fan.ni@samsung.com,
	a.manzanares@samsung.com, Srinivasulu Opensrc, Eishan Mirakhur,
	Ajay Joshi, Srinivasulu Thanneeru, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

>>On Wed, 23 Oct 2024, Ravis OpenSrc wrote:\n
>>Hi Davidlohr,
>>
>>   We have recently submitted an RFC to implement Request Background Abort
>>in case a timeout is encountered while executing a background command.
>>The patch series disables those background commands which do not support abort
>>based on their individual CEL properties.
>>
>>https://lore.kernel.org/linux-cxl/20241017163215.00000547@Huawei.com/T/#ma6710971908b85a5f8c5da2e23b8102b5a6e277c
>
>*sigh* it would have been nice to be Cc'ed.

Noted. Will take care of this in future.

>
>>
>>This implementation is based on Dan's suggestion in a previous thread.
>>https://lore.kernel.org/linux-cxl/66035c2e8ba17_770232948b@dwillia2-xfh.jf.intel.com.notmuch/
>>
>>We can discuss more on how we can potentially merge and arrive at a common ground.
>
>I think that my approach is way more robust then defining a timeout as in that series.
>It also aligns more with Jonathan's comments about the abort being on demand.

The one major functionality in our original proposal apart from implementing abort is

Allowing background commands to be invoked from user space through the primary mailbox
by ensuring only those background commands are enabled which also support request abort operation
by checking their individual CEL details.

Few questions on abort implementation are:
1. Would abort be required only when a new bg command is issued?
2. What is the path for supporting background commands from primary mailbox if not done through
CEL check?
3. Should all future background commands follow sysfs path?

--Ravi.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] cxl/mbox: support background operation abort requests
  2024-10-23 21:17     ` Ravis OpenSrc
@ 2024-10-23 23:35       ` Davidlohr Bueso
  2024-10-24  6:30         ` Ravis OpenSrc
  0 siblings, 1 reply; 16+ messages in thread
From: Davidlohr Bueso @ 2024-10-23 23:35 UTC (permalink / raw)
  To: Ravis OpenSrc
  Cc: dan.j.williams@intel.com, dave.jiang@intel.com,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, fan.ni@samsung.com,
	a.manzanares@samsung.com, Srinivasulu Opensrc, Eishan Mirakhur,
	Ajay Joshi, Srinivasulu Thanneeru, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, 23 Oct 2024, Ravis OpenSrc wrote:
>The one major functionality in our original proposal apart from implementing abort is
>
>Allowing background commands to be invoked from user space through the primary mailbox
>by ensuring only those background commands are enabled which also support request abort operation
>by checking their individual CEL details.

Is vendor-specific logs not something that can be done OoB?

If we are strictly talking about CEL details, yes this series could use that, and was
planning on adding it for an eventual v2 -- ie: why send the abort cmd at all if we
know the current one doesn't support it. But that is minutiae, for kernel bg commands.

But yeah, for your needs, the enabled cmds would need that filter.

Then with that, would adding something like the below address your needs and below
questions? Basically, if userspace is running a command, then the kernel comes in
and wants to run its own bg command, it will simply abort *anything* ongoing as a
last resort. And since we have no kernel-context about whatever is running at that
point, is *think* it is safe to assume it was user-initiated.

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 7b0fad7f6c4d..bf0742546ea8 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -374,7 +374,7 @@ static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
		mds->security.sanitize_active = false;

		dev_dbg(cxlds->dev, "Sanitization operation aborted\n");
-	} else {
+	} else if (atomic_read_acquire(&cxl_mbox->poll_bgop)){
		/*
		 * Kick the poller and wait for it to be done - no one else
		 * can touch mbox regs. rcuwait_wake_up() provides full
@@ -403,7 +403,9 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
	 */
	if (cxl_is_background_cmd(cmd->opcode)) {
		if (mds->security.sanitize_active ||
-		    atomic_read_acquire(&cxl_mbox->poll_bgop)) {
+		    atomic_read_acquire(&cxl_mbox->poll_bgop) ||
+		    /* userspace-initiated ? */
+		    !cxl_mbox_background_done(cxlds)) {
			if (!cxl_try_to_cancel_background(cxl_mbox)) {
				mutex_unlock(&cxl_mbox->mbox_mutex);
				return -EBUSY;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] cxl/mbox: support background operation abort requests
  2024-10-23 23:35       ` Davidlohr Bueso
@ 2024-10-24  6:30         ` Ravis OpenSrc
  0 siblings, 0 replies; 16+ messages in thread
From: Ravis OpenSrc @ 2024-10-24  6:30 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams@intel.com, dave.jiang@intel.com,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, fan.ni@samsung.com,
	a.manzanares@samsung.com, Srinivasulu Opensrc, Eishan Mirakhur,
	Ajay Joshi, Srinivasulu Thanneeru, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

>On Wed, 23 Oct 2024, Davidlohr Bueso wrote:
>>The one major functionality in our original proposal apart from implementing abort is
>>
>>Allowing background commands to be invoked from user space through the primary mailbox
>>by ensuring only those background commands are enabled which also support request abort operation
>>by checking their individual CEL details.
>
>Is vendor-specific logs not something that can be done OoB?
>
>If we are strictly talking about CEL details, yes this series could use that, and was
>planning on adding it for an eventual v2 -- ie: why send the abort cmd at all if we
>know the current one doesn't support it. But that is minutiae, for kernel bg commands.
>
>But yeah, for your needs, the enabled cmds would need that filter.

Ok. we can merge changes related to CEL checking and filtering of enabled commands.

>
>Then with that, would adding something like the below address your needs and below
>questions? Basically, if userspace is running a command, then the kernel comes in
>and wants to run its own bg command, it will simply abort *anything* ongoing as a
>last resort. And since we have no kernel-context about whatever is running at that
>point, is *think* it is safe to assume it was user-initiated.

Yes, this seems a reasonable assumption.

Currently user space doesn't have a way to initiate background commands
through primary mailbox as there is no provision to provide timeout value.

By filtering out user-initiated background commands which do not support abort,
we can potentially have a default timeout or allow user space to provide a custom timeout value. 
 
Overall, the approach to cancel current background operation when new bg operation
is initiated seems elegant. 
>
>diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>index 7b0fad7f6c4d..bf0742546ea8 100644
>--- a/drivers/cxl/pci.c
>+++ b/drivers/cxl/pci.c
>@@ -374,7 +374,7 @@ static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
>		mds->security.sanitize_active = false;
>
>		dev_dbg(cxlds->dev, "Sanitization operation aborted\n");
>-	} else {
>+	} else if (atomic_read_acquire(&cxl_mbox->poll_bgop)){
>		/*
>		 * Kick the poller and wait for it to be done - no one else
>		 * can touch mbox regs. rcuwait_wake_up() provides full
>@@ -403,7 +403,9 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>	 */
>	if (cxl_is_background_cmd(cmd->opcode)) {
>		if (mds->security.sanitize_active ||
>-		    atomic_read_acquire(&cxl_mbox->poll_bgop)) {
>+		    atomic_read_acquire(&cxl_mbox->poll_bgop) ||
>+		    /* userspace-initiated ? */
>+		    !cxl_mbox_background_done(cxlds)) {
>			if (!cxl_try_to_cancel_background(cxl_mbox)) {
>				mutex_unlock(&cxl_mbox->mbox_mutex);
>
--Ravi.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] cxl/pci: lockless background synchronous polling
  2024-10-22  3:18 ` [PATCH 1/3] cxl/pci: lockless background synchronous polling Davidlohr Bueso
@ 2025-01-14 16:12   ` Dave Jiang
  2025-01-14 18:40     ` Jonathan Cameron
  2025-01-14 19:30     ` Davidlohr Bueso
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Jiang @ 2025-01-14 16:12 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
	fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur, ajayjoshi,
	Ravis.OpenSrc, sthanneeru, linux-cxl, linux-kernel



On 10/21/24 8:18 PM, Davidlohr Bueso wrote:
> For non-sanitize background commands we rely on holding the mbox_mutex
> throughout the duration of the operation. This causes other incoming
> commands to queue up behind, and interleaving executions while the
> background command is timesliced by the user.
> 
> However, in order to support the mbox request cancel background
> operation command, the lock will need to be available to actually
> perform the request. Semantically this allows other commands to
> many times be at the mercy of hardware returning the respective error.
> Potentially users would be exposed to changes in the form of errors
> instead of commands taking longer to run - but this is not forseen
> as a problem.
> 
> In order to not loose sync with the hardware, introduce a mailbox
> atomic that blocks any other incoming bg operations while the driver
> is still polling (synchronously) on the current one.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  drivers/cxl/core/mbox.c |   1 +
>  drivers/cxl/cxlmem.h    |  13 +++++
>  drivers/cxl/pci.c       | 114 +++++++++++++++++++++++-----------------
>  include/cxl/mailbox.h   |   2 +
>  4 files changed, 82 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 5175138c4fb7..9af3cd16d23d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1429,6 +1429,7 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
>  
>  	cxl_mbox->host = host;
>  	mutex_init(&cxl_mbox->mbox_mutex);
> +	atomic_set(&cxl_mbox->poll_bgop, 0);
>  	rcuwait_init(&cxl_mbox->mbox_wait);
>  
>  	return 0;
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 2a25d1957ddb..b933fb73ef8a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -557,6 +557,19 @@ enum cxl_opcode {
>  	CXL_MBOX_OP_MAX			= 0x10000
>  };
>  
> +static inline bool cxl_is_background_cmd(u16 opcode)
> +{
> +	switch (opcode) {
> +	case CXL_MBOX_OP_TRANSFER_FW:
> +	case CXL_MBOX_OP_ACTIVATE_FW:
> +	case CXL_MBOX_OP_SCAN_MEDIA:
> +	case CXL_MBOX_OP_SANITIZE:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}

Should this information be pulled from the CEL entry? We should add the code to store the 'command effect' field when walking the CEL and store this information for each command. And then we can just check bit6 of the field rather than statically program that in the kernel.

DJ

> +
>  #define DEFINE_CXL_CEL_UUID                                                    \
>  	UUID_INIT(0xda9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79, 0x96, 0xb1, 0x62,     \
>  		  0x3b, 0x3f, 0x17)
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 188412d45e0d..f2378604669b 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -278,29 +278,15 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
>  	mbox_cmd->return_code =
>  		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>  
> -	/*
> -	 * Handle the background command in a synchronous manner.
> -	 *
> -	 * All other mailbox commands will serialize/queue on the mbox_mutex,
> -	 * which we currently hold. Furthermore this also guarantees that
> -	 * cxl_mbox_background_complete() checks are safe amongst each other,
> -	 * in that no new bg operation can occur in between.
> -	 *
> -	 * Background operations are timesliced in accordance with the nature
> -	 * of the command. In the event of timeout, the mailbox state is
> -	 * indeterminate until the next successful command submission and the
> -	 * driver can get back in sync with the hardware state.
> -	 */
>  	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> -		u64 bg_status_reg;
> -		int i, timeout;
> -
>  		/*
>  		 * Sanitization is a special case which monopolizes the device
>  		 * and cannot be timesliced. Handle asynchronously instead,
>  		 * and allow userspace to poll(2) for completion.
>  		 */
>  		if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) {
> +			int timeout;
> +
>  			if (mds->security.sanitize_active)
>  				return -EBUSY;
>  
> @@ -311,44 +297,19 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
>  			schedule_delayed_work(&mds->security.poll_dwork,
>  					      timeout * HZ);
>  			dev_dbg(dev, "Sanitization operation started\n");
> -			goto success;
> -		}
> -
> -		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> -			mbox_cmd->opcode);
> -
> -		timeout = mbox_cmd->poll_interval_ms;
> -		for (i = 0; i < mbox_cmd->poll_count; i++) {
> -			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
> -						       cxl_mbox_background_complete(cxlds),
> -						       TASK_UNINTERRUPTIBLE,
> -						       msecs_to_jiffies(timeout)) > 0)
> -				break;
> -		}
> -
> -		if (!cxl_mbox_background_complete(cxlds)) {
> -			dev_err(dev, "timeout waiting for background (%d ms)\n",
> -				timeout * mbox_cmd->poll_count);
> -			return -ETIMEDOUT;
> +		} else {
> +			/* pairs with release/acquire semantics */
> +			WARN_ON_ONCE(atomic_xchg(&cxl_mbox->poll_bgop,
> +						 mbox_cmd->opcode));
> +			dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> +				mbox_cmd->opcode);
>  		}
> -
> -		bg_status_reg = readq(cxlds->regs.mbox +
> -				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> -		mbox_cmd->return_code =
> -			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> -				  bg_status_reg);
> -		dev_dbg(dev,
> -			"Mailbox background operation (0x%04x) completed\n",
> -			mbox_cmd->opcode);
> -	}
> -
> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> +	} else if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
>  		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>  			cxl_mbox_cmd_rc2str(mbox_cmd));
>  		return 0; /* completed but caller must check return_code */
>  	}
>  
> -success:
>  	/* #7 */
>  	cmd_reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_CMD_OFFSET);
>  	out_len = FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, cmd_reg);
> @@ -378,11 +339,68 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>  			     struct cxl_mbox_cmd *cmd)
>  {
>  	int rc;
> +	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
> +	struct device *dev = cxlds->dev;
>  
>  	mutex_lock_io(&cxl_mbox->mbox_mutex);
> +	/*
> +	 * Ensure cxl_mbox_background_complete() checks are safe amongst
> +	 * each other: no new bg operation can occur in between while polling.
> +	 */
> +	if (cxl_is_background_cmd(cmd->opcode)) {
> +		if (atomic_read_acquire(&cxl_mbox->poll_bgop)) {
> +			mutex_unlock(&cxl_mbox->mbox_mutex);
> +			return -EBUSY;
> +		}
> +	}
> +
>  	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
>  	mutex_unlock(&cxl_mbox->mbox_mutex);
>  
> +	if (cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
> +		return rc;
> +
> +	/*
> +	 * Handle the background command in a synchronous manner. Background
> +	 * operations are timesliced in accordance with the nature of the
> +	 * command.
> +	 */
> +	if (cmd->opcode != CXL_MBOX_OP_SANITIZE) {
> +		int i, timeout;
> +		u64 bg_status_reg;
> +
> +		timeout = cmd->poll_interval_ms;
> +		for (i = 0; i < cmd->poll_count; i++) {
> +			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
> +				       cxl_mbox_background_complete(cxlds),
> +				       TASK_UNINTERRUPTIBLE,
> +				       msecs_to_jiffies(timeout)) > 0)
> +				break;
> +		}
> +
> +		/*
> +		 * In the event of timeout, the mailbox state is indeterminate
> +		 * until the next successful command submission and the driver
> +		 * can get back in sync with the hardware state.
> +		 */
> +		if (!cxl_mbox_background_complete(cxlds)) {
> +			dev_err(dev, "timeout waiting for background (%d ms)\n",
> +				timeout * cmd->poll_count);
> +			rc = -ETIMEDOUT;
> +			goto done;
> +		}
> +
> +		bg_status_reg = readq(cxlds->regs.mbox +
> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +		cmd->return_code = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> +					     bg_status_reg);
> +		dev_dbg(dev,
> +			"Mailbox background operation (0x%04x) completed\n",
> +			cmd->opcode);
> +done:
> +		atomic_set_release(&cxl_mbox->poll_bgop, 0);
> +	}
> +
>  	return rc;
>  }
>  
> diff --git a/include/cxl/mailbox.h b/include/cxl/mailbox.h
> index bacd111e75f1..23282a3c44f1 100644
> --- a/include/cxl/mailbox.h
> +++ b/include/cxl/mailbox.h
> @@ -13,6 +13,7 @@ struct cxl_mbox_cmd;
>   *                (CXL 3.1 8.2.8.4.3 Mailbox Capabilities Register)
>   * @mbox_mutex: mutex protects device mailbox and firmware
>   * @mbox_wait: rcuwait for mailbox
> + * @poll_bgop: current background operation being polled on
>   * @mbox_send: @dev specific transport for transmitting mailbox commands
>   */
>  struct cxl_mailbox {
> @@ -20,6 +21,7 @@ struct cxl_mailbox {
>  	size_t payload_size;
>  	struct mutex mbox_mutex; /* lock to protect mailbox context */
>  	struct rcuwait mbox_wait;
> +	atomic_t poll_bgop;
>  	int (*mbox_send)(struct cxl_mailbox *cxl_mbox, struct cxl_mbox_cmd *cmd);
>  };
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] cxl/mbox: support aborting the current background operation
  2024-10-22  3:18 ` [PATCH 2/3] cxl/mbox: support aborting the current background operation Davidlohr Bueso
@ 2025-01-14 17:00   ` Dave Jiang
  2025-01-14 19:33     ` Davidlohr Bueso
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2025-01-14 17:00 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
	fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur, ajayjoshi,
	Ravis.OpenSrc, sthanneeru, linux-cxl, linux-kernel



On 10/21/24 8:18 PM, Davidlohr Bueso wrote:
> CXL 3.1 introduced the ability to request that the current on-going
> background command be aborted. Add support for this, where the current
> policy is for the request to occur whenever a new incoming bg command
> wants to run. As such everything is left to user discretion and it
> becomes impossible to hog the device/mailbox.

Are you trying to say that the patch is changing the current behavior to where every time a new bg command comes in, it will abort the previous one?

> 
> The context of doing the cancellation request is the same as the new
> incoming command, and will always hold the mbox_mutex, guaranteeing
> that any successful cancel does not race with a third thread coming
> in and stealing the effort.
> 
> - For Sanitize, the thread doing the will cancel the work, and clean

doing the? seems to be missing a word here.

> on behalf of the respective wq callback that will never come.
> 
> - For the other bg commands, the sleeping thread is kicked and we
> busy-wait until the polling flag is cleared.
> 
> In both scenarios, we guarantee that the aborted op's thread is no
> longer around, giving the new bg op full authority to submit the command.
> 
> Semantics for devices that do not support such functionality are left
> unchanged, and hence, with this, the driver benefits in both scenarios.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  drivers/cxl/cxlmem.h |  1 +
>  drivers/cxl/pci.c    | 81 +++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 73 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index b933fb73ef8a..e843ffc3c23a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -516,6 +516,7 @@ to_cxl_memdev_state(struct cxl_dev_state *cxlds)
>  enum cxl_opcode {
>  	CXL_MBOX_OP_INVALID		= 0x0000,
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> +	CXL_MBOX_OP_REQUEST_ABORT_BG_OP = 0x0005,
>  	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>  	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
>  	CXL_MBOX_OP_GET_EVT_INT_POLICY	= 0x0102,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index f2378604669b..5da50e26e4c4 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -115,8 +115,8 @@ static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
>  {
>  	u64 reg;
>  
> -	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> -	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_STATUS_OFFSET);
> +	return FIELD_GET(CXLDEV_MBOX_STATUS_BG_CMD, reg) == 0;
>  }
>  
>  static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> @@ -241,7 +241,8 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
>  	 * hardware semantics and only allow device health status.
>  	 */
>  	if (mds->security.poll_tmo_secs > 0) {
> -		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO)
> +		if (mbox_cmd->opcode != CXL_MBOX_OP_GET_HEALTH_INFO &&
> +		    mbox_cmd->opcode != CXL_MBOX_OP_REQUEST_ABORT_BG_OP)
>  			return -EBUSY;
>  	}
>  
> @@ -335,11 +336,64 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
>  	return 0;
>  }
>  
> +/*
> + * Return true implies that the request was successful and the on-going
> + * background operation was in fact aborted. This also guarantees that
> + * the respective thread is done.
> + */
> +static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
> +{
> +	int rc;
> +	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> +	struct device *dev = cxlds->dev;
> +	struct cxl_mbox_cmd cmd = {
> +		.opcode = CXL_MBOX_OP_REQUEST_ABORT_BG_OP
> +	};
> +
> +	lockdep_assert_held(&cxl_mbox->mbox_mutex);
> +
> +	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, &cmd);
> +	if (rc) {
> +		dev_dbg(dev, "Failed to send abort request : %d\n", rc);
> +		return false;
> +	}
> +
> +	if (!cxl_mbox_background_complete(cxlds))
> +		return false;
> +
> +	if (mds->security.sanitize_active) {
> +		/*
> +		 * Cancel the work and cleanup on its behalf - we hold
> +		 * the mbox_mutex, cannot race with cxl_mbox_sanitize_work().
> +		 */
> +		cancel_delayed_work_sync(&mds->security.poll_dwork);
> +		mds->security.poll_tmo_secs = 0;
> +		if (mds->security.sanitize_node)
> +			sysfs_notify_dirent(mds->security.sanitize_node);
> +		mds->security.sanitize_active = false;

Should this line happen before the sysfs notification?

DJ

> +
> +		dev_dbg(cxlds->dev, "Sanitization operation aborted\n");
> +	} else {
> +		/*
> +		 * Kick the poller and wait for it to be done - no one else
> +		 * can touch mbox regs. rcuwait_wake_up() provides full
> +		 * barriers such that wake up occurs before waiting on the
> +		 * bgpoll atomic to be cleared.
> +		 */
> +		rcuwait_wake_up(&cxl_mbox->mbox_wait);
> +		atomic_cond_read_acquire(&cxl_mbox->poll_bgop, !VAL);
> +	}
> +
> +	return true;
> +}
> +
>  static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>  			     struct cxl_mbox_cmd *cmd)
>  {
>  	int rc;
>  	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>  	struct device *dev = cxlds->dev;
>  
>  	mutex_lock_io(&cxl_mbox->mbox_mutex);
> @@ -348,10 +402,18 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>  	 * each other: no new bg operation can occur in between while polling.
>  	 */
>  	if (cxl_is_background_cmd(cmd->opcode)) {
> -		if (atomic_read_acquire(&cxl_mbox->poll_bgop)) {
> -			mutex_unlock(&cxl_mbox->mbox_mutex);
> -			return -EBUSY;
> +		if (mds->security.sanitize_active ||
> +		    atomic_read_acquire(&cxl_mbox->poll_bgop)) {
> +			if (!cxl_try_to_cancel_background(cxl_mbox)) {
> +				mutex_unlock(&cxl_mbox->mbox_mutex);
> +				return -EBUSY;
> +			}
>  		}
> +		/*
> +		 * ... at this point we know that the canceled
> +		 * bgop context is gone, and we are the _only_
> +		 * background command in town. Proceed to send it.
> +		 */
>  	}
>  
>  	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
> @@ -394,10 +456,11 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>  				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>  		cmd->return_code = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
>  					     bg_status_reg);
> -		dev_dbg(dev,
> -			"Mailbox background operation (0x%04x) completed\n",
> -			cmd->opcode);
> +
> +		dev_dbg(dev, "Mailbox background operation (0x%04x) %s\n",
> +			cmd->opcode, !cmd->return_code ? "completed":"aborted");
>  done:
> +		/* ensure clearing poll_bop is the last operation */
>  		atomic_set_release(&cxl_mbox->poll_bgop, 0);
>  	}
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] cxl/pci: rename cxl_mbox_background_complete()
  2024-10-22  3:18 ` [PATCH 3/3] cxl/pci: rename cxl_mbox_background_complete() Davidlohr Bueso
@ 2025-01-14 17:26   ` Dave Jiang
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2025-01-14 17:26 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
	fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur, ajayjoshi,
	Ravis.OpenSrc, sthanneeru, linux-cxl, linux-kernel



On 10/21/24 8:18 PM, Davidlohr Bueso wrote:
> With the abort functionality, this is a misleading name,
> rename to cxl_mbox_background_done() instead.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/cxl/pci.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 5da50e26e4c4..7b0fad7f6c4d 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -111,7 +111,7 @@ static int cxl_request_irq(struct cxl_dev_state *cxlds, int irq,
>  					 dev_id);
>  }
>  
> -static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
> +static bool cxl_mbox_background_done(struct cxl_dev_state *cxlds)
>  {
>  	u64 reg;
>  
> @@ -128,7 +128,7 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>  	struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>  	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>  
> -	if (!cxl_mbox_background_complete(cxlds))
> +	if (!cxl_mbox_background_done(cxlds))
>  		return IRQ_NONE;
>  
>  	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> @@ -157,7 +157,7 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
>  	struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>  
>  	mutex_lock(&cxl_mbox->mbox_mutex);
> -	if (cxl_mbox_background_complete(cxlds)) {
> +	if (cxl_mbox_background_done(cxlds)) {
>  		mds->security.poll_tmo_secs = 0;
>  		if (mds->security.sanitize_node)
>  			sysfs_notify_dirent(mds->security.sanitize_node);
> @@ -359,7 +359,7 @@ static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
>  		return false;
>  	}
>  
> -	if (!cxl_mbox_background_complete(cxlds))
> +	if (!cxl_mbox_background_done(cxlds))
>  		return false;
>  
>  	if (mds->security.sanitize_active) {
> @@ -398,7 +398,7 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>  
>  	mutex_lock_io(&cxl_mbox->mbox_mutex);
>  	/*
> -	 * Ensure cxl_mbox_background_complete() checks are safe amongst
> +	 * Ensure cxl_mbox_background_done() checks are safe amongst
>  	 * each other: no new bg operation can occur in between while polling.
>  	 */
>  	if (cxl_is_background_cmd(cmd->opcode)) {
> @@ -434,7 +434,7 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>  		timeout = cmd->poll_interval_ms;
>  		for (i = 0; i < cmd->poll_count; i++) {
>  			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
> -				       cxl_mbox_background_complete(cxlds),
> +				       cxl_mbox_background_done(cxlds),
>  				       TASK_UNINTERRUPTIBLE,
>  				       msecs_to_jiffies(timeout)) > 0)
>  				break;
> @@ -445,7 +445,7 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>  		 * until the next successful command submission and the driver
>  		 * can get back in sync with the hardware state.
>  		 */
> -		if (!cxl_mbox_background_complete(cxlds)) {
> +		if (!cxl_mbox_background_done(cxlds)) {
>  			dev_err(dev, "timeout waiting for background (%d ms)\n",
>  				timeout * cmd->poll_count);
>  			rc = -ETIMEDOUT;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] cxl/pci: lockless background synchronous polling
  2025-01-14 16:12   ` Dave Jiang
@ 2025-01-14 18:40     ` Jonathan Cameron
  2025-01-14 19:30     ` Davidlohr Bueso
  1 sibling, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2025-01-14 18:40 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Davidlohr Bueso, dan.j.williams, alison.schofield, vishal.l.verma,
	ira.weiny, fan.ni, a.manzanares, sthanneeru.opensrc, emirakhur,
	ajayjoshi, Ravis.OpenSrc, sthanneeru, linux-cxl, linux-kernel

On Tue, 14 Jan 2025 09:12:09 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 10/21/24 8:18 PM, Davidlohr Bueso wrote:
> > For non-sanitize background commands we rely on holding the mbox_mutex
> > throughout the duration of the operation. This causes other incoming
> > commands to queue up behind, and interleaving executions while the
> > background command is timesliced by the user.
> > 
> > However, in order to support the mbox request cancel background
> > operation command, the lock will need to be available to actually
> > perform the request. Semantically this allows other commands to
> > many times be at the mercy of hardware returning the respective error.
> > Potentially users would be exposed to changes in the form of errors
> > instead of commands taking longer to run - but this is not forseen
> > as a problem.
> > 
> > In order to not loose sync with the hardware, introduce a mailbox
> > atomic that blocks any other incoming bg operations while the driver
> > is still polling (synchronously) on the current one.
> > 
> > Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> > ---
> >  drivers/cxl/core/mbox.c |   1 +
> >  drivers/cxl/cxlmem.h    |  13 +++++
> >  drivers/cxl/pci.c       | 114 +++++++++++++++++++++++-----------------
> >  include/cxl/mailbox.h   |   2 +
> >  4 files changed, 82 insertions(+), 48 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 5175138c4fb7..9af3cd16d23d 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -1429,6 +1429,7 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
> >  
> >  	cxl_mbox->host = host;
> >  	mutex_init(&cxl_mbox->mbox_mutex);
> > +	atomic_set(&cxl_mbox->poll_bgop, 0);
> >  	rcuwait_init(&cxl_mbox->mbox_wait);
> >  
> >  	return 0;
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 2a25d1957ddb..b933fb73ef8a 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -557,6 +557,19 @@ enum cxl_opcode {
> >  	CXL_MBOX_OP_MAX			= 0x10000
> >  };
> >  
> > +static inline bool cxl_is_background_cmd(u16 opcode)
> > +{
> > +	switch (opcode) {
> > +	case CXL_MBOX_OP_TRANSFER_FW:
> > +	case CXL_MBOX_OP_ACTIVATE_FW:
> > +	case CXL_MBOX_OP_SCAN_MEDIA:
> > +	case CXL_MBOX_OP_SANITIZE:
> > +		return true;
> > +	default:
> > +		return false;
> > +	}
> > +}  
> 
> Should this information be pulled from the CEL entry? We should add the code to store the 'command effect' field when walking the CEL and store this information for each command. And then we can just check bit6 of the field rather than statically program that in the kernel.
It might make it easier to maintain.

If nothing else the maintenance command stuff in Shiju's patch set
will need to be added to here possibly, but they are allowed to not
be background commands I think depending on the implementation.

So I'm in favor of using the CEL for this as well.

Jonathan

> 
> DJ
> 
> > +
> >  #define DEFINE_CXL_CEL_UUID                                                    \
> >  	UUID_INIT(0xda9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79, 0x96, 0xb1, 0x62,     \
> >  		  0x3b, 0x3f, 0x17)
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 188412d45e0d..f2378604669b 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -278,29 +278,15 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
> >  	mbox_cmd->return_code =
> >  		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
> >  
> > -	/*
> > -	 * Handle the background command in a synchronous manner.
> > -	 *
> > -	 * All other mailbox commands will serialize/queue on the mbox_mutex,
> > -	 * which we currently hold. Furthermore this also guarantees that
> > -	 * cxl_mbox_background_complete() checks are safe amongst each other,
> > -	 * in that no new bg operation can occur in between.
> > -	 *
> > -	 * Background operations are timesliced in accordance with the nature
> > -	 * of the command. In the event of timeout, the mailbox state is
> > -	 * indeterminate until the next successful command submission and the
> > -	 * driver can get back in sync with the hardware state.
> > -	 */
> >  	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> > -		u64 bg_status_reg;
> > -		int i, timeout;
> > -
> >  		/*
> >  		 * Sanitization is a special case which monopolizes the device
> >  		 * and cannot be timesliced. Handle asynchronously instead,
> >  		 * and allow userspace to poll(2) for completion.
> >  		 */
> >  		if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) {
> > +			int timeout;
> > +
> >  			if (mds->security.sanitize_active)
> >  				return -EBUSY;
> >  
> > @@ -311,44 +297,19 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
> >  			schedule_delayed_work(&mds->security.poll_dwork,
> >  					      timeout * HZ);
> >  			dev_dbg(dev, "Sanitization operation started\n");
> > -			goto success;
> > -		}
> > -
> > -		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> > -			mbox_cmd->opcode);
> > -
> > -		timeout = mbox_cmd->poll_interval_ms;
> > -		for (i = 0; i < mbox_cmd->poll_count; i++) {
> > -			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
> > -						       cxl_mbox_background_complete(cxlds),
> > -						       TASK_UNINTERRUPTIBLE,
> > -						       msecs_to_jiffies(timeout)) > 0)
> > -				break;
> > -		}
> > -
> > -		if (!cxl_mbox_background_complete(cxlds)) {
> > -			dev_err(dev, "timeout waiting for background (%d ms)\n",
> > -				timeout * mbox_cmd->poll_count);
> > -			return -ETIMEDOUT;
> > +		} else {
> > +			/* pairs with release/acquire semantics */
> > +			WARN_ON_ONCE(atomic_xchg(&cxl_mbox->poll_bgop,
> > +						 mbox_cmd->opcode));
> > +			dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> > +				mbox_cmd->opcode);
> >  		}
> > -
> > -		bg_status_reg = readq(cxlds->regs.mbox +
> > -				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> > -		mbox_cmd->return_code =
> > -			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> > -				  bg_status_reg);
> > -		dev_dbg(dev,
> > -			"Mailbox background operation (0x%04x) completed\n",
> > -			mbox_cmd->opcode);
> > -	}
> > -
> > -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> > +	} else if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> >  		dev_dbg(dev, "Mailbox operation had an error: %s\n",
> >  			cxl_mbox_cmd_rc2str(mbox_cmd));
> >  		return 0; /* completed but caller must check return_code */
> >  	}
> >  
> > -success:
> >  	/* #7 */
> >  	cmd_reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_CMD_OFFSET);
> >  	out_len = FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, cmd_reg);
> > @@ -378,11 +339,68 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
> >  			     struct cxl_mbox_cmd *cmd)
> >  {
> >  	int rc;
> > +	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
> > +	struct device *dev = cxlds->dev;
> >  
> >  	mutex_lock_io(&cxl_mbox->mbox_mutex);
> > +	/*
> > +	 * Ensure cxl_mbox_background_complete() checks are safe amongst
> > +	 * each other: no new bg operation can occur in between while polling.
> > +	 */
> > +	if (cxl_is_background_cmd(cmd->opcode)) {
> > +		if (atomic_read_acquire(&cxl_mbox->poll_bgop)) {
> > +			mutex_unlock(&cxl_mbox->mbox_mutex);
> > +			return -EBUSY;
> > +		}
> > +	}
> > +
> >  	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
> >  	mutex_unlock(&cxl_mbox->mbox_mutex);
> >  
> > +	if (cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
> > +		return rc;
> > +
> > +	/*
> > +	 * Handle the background command in a synchronous manner. Background
> > +	 * operations are timesliced in accordance with the nature of the
> > +	 * command.
> > +	 */
> > +	if (cmd->opcode != CXL_MBOX_OP_SANITIZE) {
> > +		int i, timeout;
> > +		u64 bg_status_reg;
> > +
> > +		timeout = cmd->poll_interval_ms;
> > +		for (i = 0; i < cmd->poll_count; i++) {
> > +			if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
> > +				       cxl_mbox_background_complete(cxlds),
> > +				       TASK_UNINTERRUPTIBLE,
> > +				       msecs_to_jiffies(timeout)) > 0)
> > +				break;
> > +		}
> > +
> > +		/*
> > +		 * In the event of timeout, the mailbox state is indeterminate
> > +		 * until the next successful command submission and the driver
> > +		 * can get back in sync with the hardware state.
> > +		 */
> > +		if (!cxl_mbox_background_complete(cxlds)) {
> > +			dev_err(dev, "timeout waiting for background (%d ms)\n",
> > +				timeout * cmd->poll_count);
> > +			rc = -ETIMEDOUT;
> > +			goto done;
> > +		}
> > +
> > +		bg_status_reg = readq(cxlds->regs.mbox +
> > +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> > +		cmd->return_code = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> > +					     bg_status_reg);
> > +		dev_dbg(dev,
> > +			"Mailbox background operation (0x%04x) completed\n",
> > +			cmd->opcode);
> > +done:
> > +		atomic_set_release(&cxl_mbox->poll_bgop, 0);
> > +	}
> > +
> >  	return rc;
> >  }
> >  
> > diff --git a/include/cxl/mailbox.h b/include/cxl/mailbox.h
> > index bacd111e75f1..23282a3c44f1 100644
> > --- a/include/cxl/mailbox.h
> > +++ b/include/cxl/mailbox.h
> > @@ -13,6 +13,7 @@ struct cxl_mbox_cmd;
> >   *                (CXL 3.1 8.2.8.4.3 Mailbox Capabilities Register)
> >   * @mbox_mutex: mutex protects device mailbox and firmware
> >   * @mbox_wait: rcuwait for mailbox
> > + * @poll_bgop: current background operation being polled on
> >   * @mbox_send: @dev specific transport for transmitting mailbox commands
> >   */
> >  struct cxl_mailbox {
> > @@ -20,6 +21,7 @@ struct cxl_mailbox {
> >  	size_t payload_size;
> >  	struct mutex mbox_mutex; /* lock to protect mailbox context */
> >  	struct rcuwait mbox_wait;
> > +	atomic_t poll_bgop;
> >  	int (*mbox_send)(struct cxl_mailbox *cxl_mbox, struct cxl_mbox_cmd *cmd);
> >  };
> >    
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] cxl/pci: lockless background synchronous polling
  2025-01-14 16:12   ` Dave Jiang
  2025-01-14 18:40     ` Jonathan Cameron
@ 2025-01-14 19:30     ` Davidlohr Bueso
  1 sibling, 0 replies; 16+ messages in thread
From: Davidlohr Bueso @ 2025-01-14 19:30 UTC (permalink / raw)
  To: Dave Jiang
  Cc: dan.j.williams, jonathan.cameron, alison.schofield,
	vishal.l.verma, ira.weiny, fan.ni, a.manzanares,
	sthanneeru.opensrc, emirakhur, ajayjoshi, Ravis.OpenSrc,
	sthanneeru, linux-cxl, linux-kernel

On Tue, 14 Jan 2025, Dave Jiang wrote:

>On 10/21/24 8:18 PM, Davidlohr Bueso wrote:
>> +static inline bool cxl_is_background_cmd(u16 opcode)
>> +{
>> +	switch (opcode) {
>> +	case CXL_MBOX_OP_TRANSFER_FW:
>> +	case CXL_MBOX_OP_ACTIVATE_FW:
>> +	case CXL_MBOX_OP_SCAN_MEDIA:
>> +	case CXL_MBOX_OP_SANITIZE:
>> +		return true;
>> +	default:
>> +		return false;
>> +	}
>> +}
>
>Should this information be pulled from the CEL entry? We should add the code to store the 'command effect' field when walking the CEL and store this information for each command. And then we can just check bit6 of the field rather than statically program that in the kernel.

Yeah, Will do.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] cxl/mbox: support aborting the current background operation
  2025-01-14 17:00   ` Dave Jiang
@ 2025-01-14 19:33     ` Davidlohr Bueso
  2025-01-14 20:25       ` Dave Jiang
  0 siblings, 1 reply; 16+ messages in thread
From: Davidlohr Bueso @ 2025-01-14 19:33 UTC (permalink / raw)
  To: Dave Jiang
  Cc: dan.j.williams, jonathan.cameron, alison.schofield,
	vishal.l.verma, ira.weiny, fan.ni, a.manzanares,
	sthanneeru.opensrc, emirakhur, ajayjoshi, Ravis.OpenSrc,
	sthanneeru, linux-cxl, linux-kernel

On Tue, 14 Jan 2025, Dave Jiang wrote:

>On 10/21/24 8:18 PM, Davidlohr Bueso wrote:
>> CXL 3.1 introduced the ability to request that the current on-going
>> background command be aborted. Add support for this, where the current
>> policy is for the request to occur whenever a new incoming bg command
>> wants to run. As such everything is left to user discretion and it
>> becomes impossible to hog the device/mailbox.
>
>Are you trying to say that the patch is changing the current behavior to where every time a new bg command comes in, it will abort the previous one?

Yes.

>
>>
>> The context of doing the cancellation request is the same as the new
>> incoming command, and will always hold the mbox_mutex, guaranteeing
>> that any successful cancel does not race with a third thread coming
>> in and stealing the effort.
>>
>> - For Sanitize, the thread doing the will cancel the work, and clean
>
>doing the? seems to be missing a word here.

'doing the request', will update.

...

>> +/*
>> + * Return true implies that the request was successful and the on-going
>> + * background operation was in fact aborted. This also guarantees that
>> + * the respective thread is done.
>> + */
>> +static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
>> +{
>> +	int rc;
>> +	struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
>> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> +	struct device *dev = cxlds->dev;
>> +	struct cxl_mbox_cmd cmd = {
>> +		.opcode = CXL_MBOX_OP_REQUEST_ABORT_BG_OP
>> +	};
>> +
>> +	lockdep_assert_held(&cxl_mbox->mbox_mutex);
>> +
>> +	rc = __cxl_pci_mbox_send_cmd(cxl_mbox, &cmd);
>> +	if (rc) {
>> +		dev_dbg(dev, "Failed to send abort request : %d\n", rc);
>> +		return false;
>> +	}
>> +
>> +	if (!cxl_mbox_background_complete(cxlds))
>> +		return false;
>> +
>> +	if (mds->security.sanitize_active) {
>> +		/*
>> +		 * Cancel the work and cleanup on its behalf - we hold
>> +		 * the mbox_mutex, cannot race with cxl_mbox_sanitize_work().
>> +		 */
>> +		cancel_delayed_work_sync(&mds->security.poll_dwork);
>> +		mds->security.poll_tmo_secs = 0;
>> +		if (mds->security.sanitize_node)
>> +			sysfs_notify_dirent(mds->security.sanitize_node);
>> +		mds->security.sanitize_active = false;
>
>Should this line happen before the sysfs notification?

It is benign as we hold the lock, but yes. I will also abstract this in a helper,
such that both cxl_mbox_sanitize_work() and cxl_try_to_cancel_background() can
use it.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] cxl/mbox: support aborting the current background operation
  2025-01-14 19:33     ` Davidlohr Bueso
@ 2025-01-14 20:25       ` Dave Jiang
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2025-01-14 20:25 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, jonathan.cameron, alison.schofield,
	vishal.l.verma, ira.weiny, fan.ni, a.manzanares,
	sthanneeru.opensrc, emirakhur, ajayjoshi, Ravis.OpenSrc,
	sthanneeru, linux-cxl, linux-kernel



On 1/14/25 12:33 PM, Davidlohr Bueso wrote:
> On Tue, 14 Jan 2025, Dave Jiang wrote:
> 
>> On 10/21/24 8:18 PM, Davidlohr Bueso wrote:
>>> CXL 3.1 introduced the ability to request that the current on-going
>>> background command be aborted. Add support for this, where the current
>>> policy is for the request to occur whenever a new incoming bg command
>>> wants to run. As such everything is left to user discretion and it
>>> becomes impossible to hog the device/mailbox.
>>
>> Are you trying to say that the patch is changing the current behavior to where every time a new bg command comes in, it will abort the previous one?
> 
> Yes.
Perhaps consider:
Add support for the policy where a new command request will supersede and abort the current running background command. With this new behavior, everything is left to the user's discretion. It will no longer be possible to hog the device/mailbox.

DJ

> 
>>
>>>
>>> The context of doing the cancellation request is the same as the new
>>> incoming command, and will always hold the mbox_mutex, guaranteeing
>>> that any successful cancel does not race with a third thread coming
>>> in and stealing the effort.
>>>
>>> - For Sanitize, the thread doing the will cancel the work, and clean
>>
>> doing the? seems to be missing a word here.
> 
> 'doing the request', will update.
> 
> ...
> 
>>> +/*
>>> + * Return true implies that the request was successful and the on-going
>>> + * background operation was in fact aborted. This also guarantees that
>>> + * the respective thread is done.
>>> + */
>>> +static bool cxl_try_to_cancel_background(struct cxl_mailbox *cxl_mbox)
>>> +{
>>> +    int rc;
>>> +    struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
>>> +    struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>>> +    struct device *dev = cxlds->dev;
>>> +    struct cxl_mbox_cmd cmd = {
>>> +        .opcode = CXL_MBOX_OP_REQUEST_ABORT_BG_OP
>>> +    };
>>> +
>>> +    lockdep_assert_held(&cxl_mbox->mbox_mutex);
>>> +
>>> +    rc = __cxl_pci_mbox_send_cmd(cxl_mbox, &cmd);
>>> +    if (rc) {
>>> +        dev_dbg(dev, "Failed to send abort request : %d\n", rc);
>>> +        return false;
>>> +    }
>>> +
>>> +    if (!cxl_mbox_background_complete(cxlds))
>>> +        return false;
>>> +
>>> +    if (mds->security.sanitize_active) {
>>> +        /*
>>> +         * Cancel the work and cleanup on its behalf - we hold
>>> +         * the mbox_mutex, cannot race with cxl_mbox_sanitize_work().
>>> +         */
>>> +        cancel_delayed_work_sync(&mds->security.poll_dwork);
>>> +        mds->security.poll_tmo_secs = 0;
>>> +        if (mds->security.sanitize_node)
>>> +            sysfs_notify_dirent(mds->security.sanitize_node);
>>> +        mds->security.sanitize_active = false;
>>
>> Should this line happen before the sysfs notification?
> 
> It is benign as we hold the lock, but yes. I will also abstract this in a helper,
> such that both cxl_mbox_sanitize_work() and cxl_try_to_cancel_background() can
> use it.
> 
> Thanks,
> Davidlohr
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-01-14 20:25 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-22  3:18 [PATCH 0/3] cxl/mbox: support background operation abort requests Davidlohr Bueso
2024-10-22  3:18 ` [PATCH 1/3] cxl/pci: lockless background synchronous polling Davidlohr Bueso
2025-01-14 16:12   ` Dave Jiang
2025-01-14 18:40     ` Jonathan Cameron
2025-01-14 19:30     ` Davidlohr Bueso
2024-10-22  3:18 ` [PATCH 2/3] cxl/mbox: support aborting the current background operation Davidlohr Bueso
2025-01-14 17:00   ` Dave Jiang
2025-01-14 19:33     ` Davidlohr Bueso
2025-01-14 20:25       ` Dave Jiang
2024-10-22  3:18 ` [PATCH 3/3] cxl/pci: rename cxl_mbox_background_complete() Davidlohr Bueso
2025-01-14 17:26   ` Dave Jiang
2024-10-23  5:32 ` [PATCH 0/3] cxl/mbox: support background operation abort requests Ravis OpenSrc
2024-10-23 14:29   ` Davidlohr Bueso
2024-10-23 21:17     ` Ravis OpenSrc
2024-10-23 23:35       ` Davidlohr Bueso
2024-10-24  6:30         ` Ravis OpenSrc

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox