[PATCH 0/3] cxl: Handle background commands

Linux CXL
 help / color / mirror / Atom feed

* [PATCH 0/3] cxl: Handle background commands
@ 2023-05-02 17:18 Davidlohr Bueso
  2023-05-02 17:18 ` [PATCH 1/3] rcuwait: Support timeouts Davidlohr Bueso
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-02 17:18 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, dave, linux-cxl

Hi,

This decouples the general synchronous approach series to bg commands
from the sanitation changes[0], and is sent as a stand alone series with
the following changes:

o Introduced patch 1 which gives us the correct semantics to wait/wake without
  using queued wait flavors.

o Replaced global waitqueue with per-device wait. (Dave)

o The current tear down is left as is simply because kicking the wait doesn't
  necessarily mean that there is no further waits left (->poll_count), so it
  really doesn't gain us much -- if this is ever an actual issue it could be
  revisited.

o Removed bogus bg return code check at the end of __cxl_pci_mbox_send_cmd (Ming).

o Small cosmetic updates + picked up Dave's review for patch 2.

Currently there are no users, but it is expected that firmware update and scan
media be the first two to use it.

Applies against 'next' from cxl.git.

[0] https://lore.kernel.org/linux-cxl/20230421092321.12741-1-dave@stgolabs.net/

Thanks!

Davidlohr Bueso (3):
  rcuwait: Support timeouts
  cxl/pci: Allocate irq vectors earlier in pci probe
  cxl/mbox: Add background cmd handling machinery

 drivers/cxl/core/mbox.c |   3 +-
 drivers/cxl/cxl.h       |   7 +++
 drivers/cxl/cxlmem.h    |   7 +++
 drivers/cxl/pci.c       | 110 ++++++++++++++++++++++++++++++++++++++--
 include/linux/rcuwait.h |  23 +++++++--
 5 files changed, 142 insertions(+), 8 deletions(-)

--
2.40.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] rcuwait: Support timeouts
  2023-05-02 17:18 [PATCH 0/3] cxl: Handle background commands Davidlohr Bueso
@ 2023-05-02 17:18 ` Davidlohr Bueso
  2023-05-19 21:38   ` Dan Williams
  2023-05-02 17:18 ` [PATCH 2/3] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
  2023-05-02 17:18 ` [PATCH 3/3] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
  2 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-02 17:18 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, dave, linux-cxl, peterz, mingo

Introduce rcuwait_wait_event_timeout(), with semantics equivalent
to calls for the simple and regular waitqueue counterparts.

Cc: peterz@infradead.org
Cc: mingo@redhat.com
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
Only cc'ing sched folks this patch to avoid spamming. fyi the actual user
comes up in patch 3 (cxl/mbox: Add background cmd handling machinery),
but found a few other potential users in various drivers, so more could
be added.

 include/linux/rcuwait.h | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 8052d34da782..27343424225c 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -49,9 +49,9 @@ static inline void prepare_to_rcuwait(struct rcuwait *w)
 
 extern void finish_rcuwait(struct rcuwait *w);
 
-#define rcuwait_wait_event(w, condition, state)				\
+#define ___rcuwait_wait_event(w, condition, state, ret, cmd)		\
 ({									\
-	int __ret = 0;							\
+	long __ret = ret;						\
 	prepare_to_rcuwait(w);						\
 	for (;;) {							\
 		/*							\
@@ -67,10 +67,27 @@ extern void finish_rcuwait(struct rcuwait *w);
 			break;						\
 		}							\
 									\
-		schedule();						\
+		cmd;							\
 	}								\
 	finish_rcuwait(w);						\
 	__ret;								\
 })
 
+#define rcuwait_wait_event(w, condition, state)				\
+	___rcuwait_wait_event(w, condition, state, 0, schedule())
+
+#define __rcuwait_wait_event_timeout(w, condition, state, timeout)	\
+	___rcuwait_wait_event(w, ___wait_cond_timeout(condition),	\
+			      state, timeout,				\
+			      __ret = schedule_timeout(__ret))
+
+#define rcuwait_wait_event_timeout(w, condition, state, timeout)	\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __rcuwait_wait_event_timeout(w, condition,	\
+						     state, timeout);	\
+	__ret;								\
+})
+
 #endif /* _LINUX_RCUWAIT_H_ */
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH 1/3] rcuwait: Support timeouts
  2023-05-02 17:18 ` [PATCH 1/3] rcuwait: Support timeouts Davidlohr Bueso
@ 2023-05-19 21:38   ` Dan Williams
  2023-05-19 22:55     ` Davidlohr Bueso
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2023-05-19 21:38 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, dave, linux-cxl, peterz, mingo

Davidlohr Bueso wrote:
> Introduce rcuwait_wait_event_timeout(), with semantics equivalent
> to calls for the simple and regular waitqueue counterparts.

So my first reaction to this is "what is rcuwait, never heard of that
before?", and "how did Davidlohr find this facility and why does he
think it suitable here?". Then I go to read the history to get smarter
about this thing and the author is: Davidlohr Bueso <dave@stgolabs.net>,
aha!

The critical insight for me was:

https://lore.kernel.org/all/1484148146-14210-2-git-send-email-dave@stgolabs.net/

...i.e. that rcuwait is suitable for waiting for a condition to fire
while holding another exclusive lock like a mutex() as is the case with
the CXL mbox code. Did I get that right?

My recommendation would be to include some text like "Recall that
rcuwait is suitable in this scenario because..." reminders at least
until more driver people understand when and how to use it.

> Cc: peterz@infradead.org
> Cc: mingo@redhat.com
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> Only cc'ing sched folks this patch to avoid spamming. fyi the actual user
> comes up in patch 3 (cxl/mbox: Add background cmd handling machinery),
> but found a few other potential users in various drivers, so more could
> be added.

FWIW this looks like a straightforward addition of timeout support to
rcuwait_wait_event() to me.

What acks are required to take this through the CXL tree? Or are you
the rcuwait maintainer?

> 
>  include/linux/rcuwait.h | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
> index 8052d34da782..27343424225c 100644
> --- a/include/linux/rcuwait.h
> +++ b/include/linux/rcuwait.h
> @@ -49,9 +49,9 @@ static inline void prepare_to_rcuwait(struct rcuwait *w)
>  
>  extern void finish_rcuwait(struct rcuwait *w);
>  
> -#define rcuwait_wait_event(w, condition, state)				\
> +#define ___rcuwait_wait_event(w, condition, state, ret, cmd)		\
>  ({									\
> -	int __ret = 0;							\
> +	long __ret = ret;						\
>  	prepare_to_rcuwait(w);						\
>  	for (;;) {							\
>  		/*							\
> @@ -67,10 +67,27 @@ extern void finish_rcuwait(struct rcuwait *w);
>  			break;						\
>  		}							\
>  									\
> -		schedule();						\
> +		cmd;							\
>  	}								\
>  	finish_rcuwait(w);						\
>  	__ret;								\
>  })
>  
> +#define rcuwait_wait_event(w, condition, state)				\
> +	___rcuwait_wait_event(w, condition, state, 0, schedule())
> +
> +#define __rcuwait_wait_event_timeout(w, condition, state, timeout)	\
> +	___rcuwait_wait_event(w, ___wait_cond_timeout(condition),	\
> +			      state, timeout,				\
> +			      __ret = schedule_timeout(__ret))
> +
> +#define rcuwait_wait_event_timeout(w, condition, state, timeout)	\
> +({									\
> +	long __ret = timeout;						\
> +	if (!___wait_cond_timeout(condition))				\
> +		__ret = __rcuwait_wait_event_timeout(w, condition,	\
> +						     state, timeout);	\
> +	__ret;								\
> +})
> +
>  #endif /* _LINUX_RCUWAIT_H_ */
> -- 
> 2.40.1
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] rcuwait: Support timeouts
  2023-05-19 21:38   ` Dan Williams
@ 2023-05-19 22:55     ` Davidlohr Bueso
  2023-05-20 11:03       ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-19 22:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl, peterz, mingo

On Fri, 19 May 2023, Dan Williams wrote:

>Davidlohr Bueso wrote:
>> Introduce rcuwait_wait_event_timeout(), with semantics equivalent
>> to calls for the simple and regular waitqueue counterparts.
>
>So my first reaction to this is "what is rcuwait, never heard of that
>before?", and "how did Davidlohr find this facility and why does he
>think it suitable here?". Then I go to read the history to get smarter
>about this thing and the author is: Davidlohr Bueso <dave@stgolabs.net>,
>aha!
>
>The critical insight for me was:
>
>https://lore.kernel.org/all/1484148146-14210-2-git-send-email-dave@stgolabs.net/

Just a general note that per

154abafc68b (tasks, sched/core: With a grace period after finish_task_switch(), remove unnecessary code)

... that the exit_notify() caveat no longer applies.

>...i.e. that rcuwait is suitable for waiting for a condition to fire
>while holding another exclusive lock like a mutex() as is the case with
>the CXL mbox code. Did I get that right?

Yes. For this particular patchset it made sense to go this way in terms
of semantics (this use-case is obviously not a performance sensitive path,
nor do any of the rt benefits of not having to deal with a q->lock).

>My recommendation would be to include some text like "Recall that
>rcuwait is suitable in this scenario because..." reminders at least
>until more driver people understand when and how to use it.

Sure.

>> Cc: peterz@infradead.org
>> Cc: mingo@redhat.com
>> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
>> ---
>> Only cc'ing sched folks this patch to avoid spamming. fyi the actual user
>> comes up in patch 3 (cxl/mbox: Add background cmd handling machinery),
>> but found a few other potential users in various drivers, so more could
>> be added.
>
>FWIW this looks like a straightforward addition of timeout support to
>rcuwait_wait_event() to me.
>
>What acks are required to take this through the CXL tree? Or are you
>the rcuwait maintainer?

No, I am not the maintainer, changes have been routed usually through
tip folks. Conceptually at least I think Peter was ok with it, but
ultimately an explicit ack/nack would be required.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] rcuwait: Support timeouts
  2023-05-19 22:55     ` Davidlohr Bueso
@ 2023-05-20 11:03       ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2023-05-20 11:03 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Dan Williams, dave.jiang, alison.schofield, vishal.l.verma,
	Jonathan.Cameron, fan.ni, a.manzanares, linux-cxl, mingo

On Fri, May 19, 2023 at 03:55:04PM -0700, Davidlohr Bueso wrote:
> On Fri, 19 May 2023, Dan Williams wrote:

> > What acks are required to take this through the CXL tree? Or are you
> > the rcuwait maintainer?
> 
> No, I am not the maintainer, changes have been routed usually through
> tip folks. Conceptually at least I think Peter was ok with it, but
> ultimately an explicit ack/nack would be required.

Yeah, would normally be me routing it through the sched tree -- that
also takes all the other wait (wait and swait) crud.

If this needs to go elsewhere,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/3] cxl/pci: Allocate irq vectors earlier in pci probe
  2023-05-02 17:18 [PATCH 0/3] cxl: Handle background commands Davidlohr Bueso
  2023-05-02 17:18 ` [PATCH 1/3] rcuwait: Support timeouts Davidlohr Bueso
@ 2023-05-02 17:18 ` Davidlohr Bueso
  2023-05-15 10:08   ` Jonathan Cameron
  2023-05-02 17:18 ` [PATCH 3/3] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
  2 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-02 17:18 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, dave, linux-cxl

Move the cxl_alloc_irq_vectors() call further up in the probing
in order to allow for mailbox interrupt usage. No change in
semantics.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index f7a5b8e9c102..8bdf58c0c643 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -708,6 +708,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
 
+	rc = cxl_alloc_irq_vectors(pdev);
+	if (rc)
+		return rc;
+
 	rc = cxl_pci_setup_mailbox(cxlds);
 	if (rc)
 		return rc;
@@ -732,10 +736,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		return rc;
 
-	rc = cxl_alloc_irq_vectors(pdev);
-	if (rc)
-		return rc;
-
 	cxlmd = devm_cxl_add_memdev(cxlds);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] cxl/pci: Allocate irq vectors earlier in pci probe
  2023-05-02 17:18 ` [PATCH 2/3] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
@ 2023-05-15 10:08   ` Jonathan Cameron
  0 siblings, 0 replies; 24+ messages in thread
From: Jonathan Cameron @ 2023-05-15 10:08 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, vishal.l.verma,
	fan.ni, a.manzanares, linux-cxl

On Tue,  2 May 2023 10:18:40 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> Move the cxl_alloc_irq_vectors() call further up in the probing
> in order to allow for mailbox interrupt usage. No change in
> semantics.
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Chasing this patch through various places it appeared :)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/pci.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index f7a5b8e9c102..8bdf58c0c643 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -708,6 +708,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
>  
> +	rc = cxl_alloc_irq_vectors(pdev);
> +	if (rc)
> +		return rc;
> +
>  	rc = cxl_pci_setup_mailbox(cxlds);
>  	if (rc)
>  		return rc;
> @@ -732,10 +736,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		return rc;
>  
> -	rc = cxl_alloc_irq_vectors(pdev);
> -	if (rc)
> -		return rc;
> -
>  	cxlmd = devm_cxl_add_memdev(cxlds);
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/3] cxl/mbox: Add background cmd handling machinery
  2023-05-02 17:18 [PATCH 0/3] cxl: Handle background commands Davidlohr Bueso
  2023-05-02 17:18 ` [PATCH 1/3] rcuwait: Support timeouts Davidlohr Bueso
  2023-05-02 17:18 ` [PATCH 2/3] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
@ 2023-05-02 17:18 ` Davidlohr Bueso
  2023-05-02 17:55   ` Davidlohr Bueso
  2023-05-03 14:57   ` [PATCH v2] " Davidlohr Bueso
  2 siblings, 2 replies; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-02 17:18 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, dave, linux-cxl

This adds support for handling background operations, as defined in
the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
can run in the background asynchronously (to the hardware).

The driver will deal with such commands synchronously, blocking all
other incoming commands for a specified period of time, allowing
time-slicing the command such that the caller can send incremental
requests to avoid monopolizing the driver/device. This approach
makes the code simpler, where any out of sync (timeout) between the
driver and hardware is just disregarded as an invalid state until
the next successful submission.

On devices where mbox interrupts are supported, this will still use
a poller that will wakeup in the specified wait intervals. The irq
handler will simply awake the blocked cmd, which is also safe vs a
task that is either waking (timing out) or already awoken. Similarly
any irq setup error during the probing falls back to polling, thus
avoids unnecessarily erroring out.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/core/mbox.c |   3 +-
 drivers/cxl/cxl.h       |   7 +++
 drivers/cxl/cxlmem.h    |   7 +++
 drivers/cxl/pci.c       | 102 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 23b9ff920d7e..7345ed4118b0 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -220,7 +220,8 @@ int cxl_internal_send_cmd(struct cxl_dev_state *cxlds,
 	if (rc)
 		return rc;
 
-	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS)
+	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
+	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
 		return cxl_mbox_cmd_rc2errno(mbox_cmd);
 
 	if (!out_size)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 044a92d9813e..72731a896f58 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -176,14 +176,21 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 /* CXL 2.0 8.2.8.4 Mailbox Registers */
 #define CXLDEV_MBOX_CAPS_OFFSET 0x00
 #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
+#define   CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
+#define   CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)
 #define CXLDEV_MBOX_CTRL_OFFSET 0x04
 #define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
+#define   CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
 #define CXLDEV_MBOX_CMD_OFFSET 0x08
 #define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
 #define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
 #define CXLDEV_MBOX_STATUS_OFFSET 0x10
+#define   CXLDEV_MBOX_STATUS_BG_CMD BIT(0)
 #define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
 #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)
 #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
 
 /*
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index db12b6313afb..d2f751d6583c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -5,6 +5,7 @@
 #include <uapi/linux/cxl_mem.h>
 #include <linux/cdev.h>
 #include <linux/uuid.h>
+#include <linux/rcuwait.h>
 #include "cxl.h"
 
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -108,6 +109,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
  *            variable sized output commands, it tells the exact number of bytes
  *            written.
  * @min_out: (input) internal command output payload size validation
+ * @poll_count: (input)  Number of timeouts to attempt.
+ * @poll_interval: (input) Number of ms between mailbox background command
+ *                 polling intervals timeouts.
  * @return_code: (output) Error code returned from hardware.
  *
  * This is the primary mechanism used to send commands to the hardware.
@@ -123,6 +127,8 @@ struct cxl_mbox_cmd {
 	size_t size_in;
 	size_t size_out;
 	size_t min_out;
+	int poll_count;
+	int poll_interval;
 	u16 return_code;
 };
 
@@ -329,6 +335,7 @@ struct cxl_dev_state {
 	struct cxl_event_state event;
 	struct cxl_poison_state poison;
 
+	struct rcuwait mbox_wait;
 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
 };
 
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 8bdf58c0c643..5ca1423a4d92 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -51,6 +51,7 @@ static unsigned short mbox_ready_timeout = 60;
 module_param(mbox_ready_timeout, ushort, 0644);
 MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
 
+
 static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
 {
 	const unsigned long start = jiffies;
@@ -84,6 +85,33 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
 			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
 			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
 
+static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
+{
+	u64 reg;
+
+	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
+}
+
+static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
+{
+	struct cxl_dev_state *cxlds = id;
+
+	/* spurious or raced with hw? */
+	if (unlikely(!cxl_mbox_background_complete(cxlds))) {
+		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+
+		dev_warn(&pdev->dev,
+			 "Mailbox background operation IRQ but incomplete\n");
+		goto done;
+	}
+
+	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
+	rcuwait_wake_up(&cxlds->mbox_wait);
+done:
+	return IRQ_HANDLED;
+}
+
 /**
  * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
  * @cxlds: The device state to communicate with.
@@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
 	mbox_cmd->return_code =
 		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
 
+	/*
+	 * Handle the background command in a synchronous manner.
+	 *
+	 * All other mailbox commands will serialize/queue on the mbox_mutex,
+	 * which we currently hold. Furthermore this also guarantees that
+	 * cxl_mbox_background_complete() checks are safe amongst each other,
+	 * in that no new bg operation can occur in between.
+	 *
+	 * Background operations are timesliced in accordance with the nature
+	 * of the command. In the event of timeout, the mailbox state is
+	 * indeterminate until the next successful command submission and the
+	 * driver can get back in sync with the hardware state.
+	 */
+	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
+		int i, ret;
+		u64 bg_status_reg;
+		int timeout = mbox_cmd->poll_interval;
+
+		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
+			mbox_cmd->opcode);
+
+		for (i = 0; i < mbox_cmd->poll_count; i++) {
+			ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
+					TASK_INTERRUPTIBLE,
+					cxl_mbox_background_complete(cxlds),
+					msecs_to_jiffies(timeout));
+			if (ret > 0)
+				break;
+			if (ret < 0) /* interrupted by a signal */
+				return ret;
+		}
+
+		if (!cxl_mbox_background_complete(cxlds)) {
+			u64 md_status =
+				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
+
+			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
+				    "background timeout");
+			return -ETIMEDOUT;
+		}
+
+		bg_status_reg = readq(cxlds->regs.mbox +
+				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+		mbox_cmd->return_code =
+			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
+				  bg_status_reg);
+		dev_dbg(dev,
+			"Mailbox background operation (0x%04x) completed\n",
+			mbox_cmd->opcode);
+	}
+
 	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
 		dev_dbg(dev, "Mailbox operation had an error: %s\n",
 			cxl_mbox_cmd_rc2str(mbox_cmd));
@@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
 	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
 		cxlds->payload_size);
 
+	rcuwait_init(&cxlds->mbox_wait);
+	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
+		int irq, msgnum;
+		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+
+		msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
+		irq = pci_irq_vector(pdev, msgnum);
+		if (irq < 0)
+			goto mbox_poll;
+
+		if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
+				     IRQF_SHARED, NULL, cxlds))
+			goto mbox_poll;
+
+		/* only enable background cmd mbox irq support */
+		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
+		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
+
+		return 0;
+	}
+
+mbox_poll:
+	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
 	return 0;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] cxl/mbox: Add background cmd handling machinery
  2023-05-02 17:18 ` [PATCH 3/3] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
@ 2023-05-02 17:55   ` Davidlohr Bueso
  2023-05-03 14:57   ` [PATCH v2] " Davidlohr Bueso
  1 sibling, 0 replies; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-02 17:55 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

On Tue, 02 May 2023, Davidlohr Bueso wrote:

>+	/*
>+	 * Handle the background command in a synchronous manner.
>+	 *
>+	 * All other mailbox commands will serialize/queue on the mbox_mutex,
>+	 * which we currently hold. Furthermore this also guarantees that
>+	 * cxl_mbox_background_complete() checks are safe amongst each other,
>+	 * in that no new bg operation can occur in between.
>+	 *
>+	 * Background operations are timesliced in accordance with the nature
>+	 * of the command. In the event of timeout, the mailbox state is
>+	 * indeterminate until the next successful command submission and the
>+	 * driver can get back in sync with the hardware state.
>+	 */
>+	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
>+		int i, ret;

bleh this "ret" really wants to be a long.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-02 17:18 ` [PATCH 3/3] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
  2023-05-02 17:55   ` Davidlohr Bueso
@ 2023-05-03 14:57   ` Davidlohr Bueso
  2023-05-15 10:30     ` Jonathan Cameron
                       ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-03 14:57 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

This adds support for handling background operations, as defined in
the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
can run in the background asynchronously (to the hardware).

The driver will deal with such commands synchronously, blocking all
other incoming commands for a specified period of time, allowing
time-slicing the command such that the caller can send incremental
requests to avoid monopolizing the driver/device. This approach
makes the code simpler, where any out of sync (timeout) between the
driver and hardware is just disregarded as an invalid state until
the next successful submission.

On devices where mbox interrupts are supported, this will still use
a poller that will wakeup in the specified wait intervals. The irq
handler will simply awake the blocked cmd, which is also safe vs a
task that is either waking (timing out) or already awoken. Similarly
any irq setup error during the probing falls back to polling, thus
avoids unnecessarily erroring out.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---

Changes from v2:
- make the wait ret a long.
- pass correct param orders to the wait (cond and state were inversed).

  drivers/cxl/core/mbox.c |   3 +-
  drivers/cxl/cxl.h       |   7 +++
  drivers/cxl/cxlmem.h    |   7 +++
  drivers/cxl/pci.c       | 102 ++++++++++++++++++++++++++++++++++++++++
  4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 23b9ff920d7e..7345ed4118b0 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -220,7 +220,8 @@ int cxl_internal_send_cmd(struct cxl_dev_state *cxlds,
	if (rc)
		return rc;

-	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS)
+	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
+	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
		return cxl_mbox_cmd_rc2errno(mbox_cmd);

	if (!out_size)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 044a92d9813e..72731a896f58 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -176,14 +176,21 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
  /* CXL 2.0 8.2.8.4 Mailbox Registers */
  #define CXLDEV_MBOX_CAPS_OFFSET 0x00
  #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
+#define   CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
+#define   CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)
  #define CXLDEV_MBOX_CTRL_OFFSET 0x04
  #define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
+#define   CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
  #define CXLDEV_MBOX_CMD_OFFSET 0x08
  #define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
  #define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
  #define CXLDEV_MBOX_STATUS_OFFSET 0x10
+#define   CXLDEV_MBOX_STATUS_BG_CMD BIT(0)
  #define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
  #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
+#define   CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)
  #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20

  /*
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index db12b6313afb..d2f751d6583c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -5,6 +5,7 @@
  #include <uapi/linux/cxl_mem.h>
  #include <linux/cdev.h>
  #include <linux/uuid.h>
+#include <linux/rcuwait.h>
  #include "cxl.h"

  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -108,6 +109,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
   *            variable sized output commands, it tells the exact number of bytes
   *            written.
   * @min_out: (input) internal command output payload size validation
+ * @poll_count: (input)  Number of timeouts to attempt.
+ * @poll_interval: (input) Number of ms between mailbox background command
+ *                 polling intervals timeouts.
   * @return_code: (output) Error code returned from hardware.
   *
   * This is the primary mechanism used to send commands to the hardware.
@@ -123,6 +127,8 @@ struct cxl_mbox_cmd {
	size_t size_in;
	size_t size_out;
	size_t min_out;
+	int poll_count;
+	int poll_interval;
	u16 return_code;
  };

@@ -329,6 +335,7 @@ struct cxl_dev_state {
	struct cxl_event_state event;
	struct cxl_poison_state poison;

+	struct rcuwait mbox_wait;
	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
  };

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 8bdf58c0c643..10dc4a4fbb4a 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -51,6 +51,7 @@ static unsigned short mbox_ready_timeout = 60;
  module_param(mbox_ready_timeout, ushort, 0644);
  MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");

+
  static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
  {
	const unsigned long start = jiffies;
@@ -84,6 +85,33 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")

+static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
+{
+	u64 reg;
+
+	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
+}
+
+static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
+{
+	struct cxl_dev_state *cxlds = id;
+
+	/* spurious or raced with hw? */
+	if (unlikely(!cxl_mbox_background_complete(cxlds))) {
+		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+
+		dev_warn(&pdev->dev,
+			 "Mailbox background operation IRQ but incomplete\n");
+		goto done;
+	}
+
+	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
+	rcuwait_wake_up(&cxlds->mbox_wait);
+done:
+	return IRQ_HANDLED;
+}
+
  /**
   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
   * @cxlds: The device state to communicate with.
@@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
	mbox_cmd->return_code =
		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);

+	/*
+	 * Handle the background command in a synchronous manner.
+	 *
+	 * All other mailbox commands will serialize/queue on the mbox_mutex,
+	 * which we currently hold. Furthermore this also guarantees that
+	 * cxl_mbox_background_complete() checks are safe amongst each other,
+	 * in that no new bg operation can occur in between.
+	 *
+	 * Background operations are timesliced in accordance with the nature
+	 * of the command. In the event of timeout, the mailbox state is
+	 * indeterminate until the next successful command submission and the
+	 * driver can get back in sync with the hardware state.
+	 */
+	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
+		long ret;
+		u64 bg_status_reg;
+		int i, timeout = mbox_cmd->poll_interval;
+
+		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
+			mbox_cmd->opcode);
+
+		for (i = 0; i < mbox_cmd->poll_count; i++) {
+			ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
+					cxl_mbox_background_complete(cxlds),
+					TASK_INTERRUPTIBLE,
+					msecs_to_jiffies(timeout));
+			if (ret > 0)
+				break;
+			if (ret < 0) /* interrupted by a signal */
+				return ret;
+		}
+
+		if (!cxl_mbox_background_complete(cxlds)) {
+			u64 md_status =
+				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
+
+			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
+				    "background timeout");
+			return -ETIMEDOUT;
+		}
+
+		bg_status_reg = readq(cxlds->regs.mbox +
+				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
+		mbox_cmd->return_code =
+			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
+				  bg_status_reg);
+		dev_dbg(dev,
+			"Mailbox background operation (0x%04x) completed\n",
+			mbox_cmd->opcode);
+	}
+
	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
		dev_dbg(dev, "Mailbox operation had an error: %s\n",
			cxl_mbox_cmd_rc2str(mbox_cmd));
@@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
		cxlds->payload_size);

+	rcuwait_init(&cxlds->mbox_wait);
+	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
+		int irq, msgnum;
+		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+
+		msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
+		irq = pci_irq_vector(pdev, msgnum);
+		if (irq < 0)
+			goto mbox_poll;
+
+		if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
+				     IRQF_SHARED, NULL, cxlds))
+			goto mbox_poll;
+
+		/* only enable background cmd mbox irq support */
+		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
+		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
+
+		return 0;
+	}
+
+mbox_poll:
+	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
	return 0;
  }

--
2.40.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-03 14:57   ` [PATCH v2] " Davidlohr Bueso
@ 2023-05-15 10:30     ` Jonathan Cameron
  2023-05-15 15:40       ` Davidlohr Bueso
  2023-05-16  7:58     ` Li, Ming
  2023-05-19 23:13     ` Dan Williams
  2 siblings, 1 reply; 24+ messages in thread
From: Jonathan Cameron @ 2023-05-15 10:30 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, vishal.l.verma,
	fan.ni, a.manzanares, linux-cxl

On Wed, 3 May 2023 07:57:56 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> This adds support for handling background operations, as defined in
> the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
> can run in the background asynchronously (to the hardware).
> 
> The driver will deal with such commands synchronously, blocking all
> other incoming commands for a specified period of time, allowing
> time-slicing the command such that the caller can send incremental
> requests to avoid monopolizing the driver/device. This approach
> makes the code simpler, where any out of sync (timeout) between the
> driver and hardware is just disregarded as an invalid state until
> the next successful submission.
> 
> On devices where mbox interrupts are supported, this will still use
> a poller that will wakeup in the specified wait intervals. The irq
> handler will simply awake the blocked cmd, which is also safe vs a
> task that is either waking (timing out) or already awoken. Similarly
> any irq setup error during the probing falls back to polling, thus
> avoids unnecessarily erroring out.

This raises the question of why we don't support Doorbell Interrupts.
2 seconds is rather a long time to poll for.  I can't really remember the
reasoning but maybe it's that we don't expect anyone to every produce
hardware that takes that long. 

Ah well, job for another day perhaps.

I'm not following why the rcuwait is needed versus other options.
Perhaps some minimal text here for those of us not familiar with that
particular mechanism vs a completion of similar.

> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Otherwise just a comment inline on dev_warn for unexpected interrupt
on an interrupt that you've marked as IRQF_SHARED which pretty much
guarantees you need to handle unexpected interrupts as 'normal'.

Jonathan

> ---
> 
> Changes from v2:
> - make the wait ret a long.
> - pass correct param orders to the wait (cond and state were inversed).
> 
>   drivers/cxl/core/mbox.c |   3 +-
>   drivers/cxl/cxl.h       |   7 +++
>   drivers/cxl/cxlmem.h    |   7 +++
>   drivers/cxl/pci.c       | 102 ++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 118 insertions(+), 1 deletion(-)


> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index db12b6313afb..d2f751d6583c 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -5,6 +5,7 @@
>   #include <uapi/linux/cxl_mem.h>
>   #include <linux/cdev.h>
>   #include <linux/uuid.h>
> +#include <linux/rcuwait.h>
>   #include "cxl.h"
> 
>   /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -108,6 +109,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>    *            variable sized output commands, it tells the exact number of bytes
>    *            written.
>    * @min_out: (input) internal command output payload size validation
> + * @poll_count: (input)  Number of timeouts to attempt.
> + * @poll_interval: (input) Number of ms between mailbox background command
> + *                 polling intervals timeouts.
>    * @return_code: (output) Error code returned from hardware.
>    *
>    * This is the primary mechanism used to send commands to the hardware.
> @@ -123,6 +127,8 @@ struct cxl_mbox_cmd {
> 	size_t size_in;
> 	size_t size_out;
> 	size_t min_out;
> +	int poll_count;
> +	int poll_interval;
> 	u16 return_code;
>   };
> 
> @@ -329,6 +335,7 @@ struct cxl_dev_state {
> 	struct cxl_event_state event;
> 	struct cxl_poison_state poison;
> 
> +	struct rcuwait mbox_wait;
> 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>   };
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 8bdf58c0c643..10dc4a4fbb4a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -51,6 +51,7 @@ static unsigned short mbox_ready_timeout = 60;
>   module_param(mbox_ready_timeout, ushort, 0644);
>   MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
> 
> +

:)  You know what I'm going to say - so I won't bother saying it.


>   static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
>   {
> 	const unsigned long start = jiffies;
> @@ -84,6 +85,33 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
> 			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
> 			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
> 
> +static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
> +{
> +	u64 reg;
> +
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
> +}
> +
> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> +{
> +	struct cxl_dev_state *cxlds = id;
> +
> +	/* spurious or raced with hw? */
> +	if (unlikely(!cxl_mbox_background_complete(cxlds))) {
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		dev_warn(&pdev->dev,
> +			 "Mailbox background operation IRQ but incomplete\n");

It's a shared IRQ, there are many possible reasons we might get here that
aren't things we should warn about.  Would be nice if we could detect
the race might have happened and only return IRQ_HANDLED if that's the case
as opposed to it's someone else's interrupt.

> +		goto done;
> +	}
> +
> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> +	rcuwait_wake_up(&cxlds->mbox_wait);
> +done:
> +	return IRQ_HANDLED;
> +}



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-15 10:30     ` Jonathan Cameron
@ 2023-05-15 15:40       ` Davidlohr Bueso
  2023-05-15 16:19         ` Jonathan Cameron
  0 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-15 15:40 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: dan.j.williams, dave.jiang, alison.schofield, vishal.l.verma,
	fan.ni, a.manzanares, linux-cxl

On Mon, 15 May 2023, Jonathan Cameron wrote:

>On Wed, 3 May 2023 07:57:56 -0700
>Davidlohr Bueso <dave@stgolabs.net> wrote:
>
>> This adds support for handling background operations, as defined in
>> the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
>> can run in the background asynchronously (to the hardware).
>>
>> The driver will deal with such commands synchronously, blocking all
>> other incoming commands for a specified period of time, allowing
>> time-slicing the command such that the caller can send incremental
>> requests to avoid monopolizing the driver/device. This approach
>> makes the code simpler, where any out of sync (timeout) between the
>> driver and hardware is just disregarded as an invalid state until
>> the next successful submission.
>>
>> On devices where mbox interrupts are supported, this will still use
>> a poller that will wakeup in the specified wait intervals. The irq
>> handler will simply awake the blocked cmd, which is also safe vs a
>> task that is either waking (timing out) or already awoken. Similarly
>> any irq setup error during the probing falls back to polling, thus
>> avoids unnecessarily erroring out.
>
>This raises the question of why we don't support Doorbell Interrupts.
>2 seconds is rather a long time to poll for.  I can't really remember the
>reasoning but maybe it's that we don't expect anyone to every produce
>hardware that takes that long.
>
>Ah well, job for another day perhaps.
>
>I'm not following why the rcuwait is needed versus other options.
>Perhaps some minimal text here for those of us not familiar with that
>particular mechanism vs a completion of similar.

As mentioned before, rcuwait gives us the single wait/wake semantics we
need (we are serialized by the mbox_mutex) - completions use queued wait,
which is an overkill for this case.

>> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
>Otherwise just a comment inline on dev_warn for unexpected interrupt
>on an interrupt that you've marked as IRQF_SHARED which pretty much
>guarantees you need to handle unexpected interrupts as 'normal'.

Per your feedback in the the previous series, I will remove this dev_warn
altogether.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-15 15:40       ` Davidlohr Bueso
@ 2023-05-15 16:19         ` Jonathan Cameron
  0 siblings, 0 replies; 24+ messages in thread
From: Jonathan Cameron @ 2023-05-15 16:19 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dan.j.williams, dave.jiang, alison.schofield, vishal.l.verma,
	fan.ni, a.manzanares, linux-cxl

On Mon, 15 May 2023 08:40:30 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:

> On Mon, 15 May 2023, Jonathan Cameron wrote:
> 
> >On Wed, 3 May 2023 07:57:56 -0700
> >Davidlohr Bueso <dave@stgolabs.net> wrote:
> >  
> >> This adds support for handling background operations, as defined in
> >> the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
> >> can run in the background asynchronously (to the hardware).
> >>
> >> The driver will deal with such commands synchronously, blocking all
> >> other incoming commands for a specified period of time, allowing
> >> time-slicing the command such that the caller can send incremental
> >> requests to avoid monopolizing the driver/device. This approach
> >> makes the code simpler, where any out of sync (timeout) between the
> >> driver and hardware is just disregarded as an invalid state until
> >> the next successful submission.
> >>
> >> On devices where mbox interrupts are supported, this will still use
> >> a poller that will wakeup in the specified wait intervals. The irq
> >> handler will simply awake the blocked cmd, which is also safe vs a
> >> task that is either waking (timing out) or already awoken. Similarly
> >> any irq setup error during the probing falls back to polling, thus
> >> avoids unnecessarily erroring out.  
> >
> >This raises the question of why we don't support Doorbell Interrupts.
> >2 seconds is rather a long time to poll for.  I can't really remember the
> >reasoning but maybe it's that we don't expect anyone to every produce
> >hardware that takes that long.
> >
> >Ah well, job for another day perhaps.
> >
> >I'm not following why the rcuwait is needed versus other options.
> >Perhaps some minimal text here for those of us not familiar with that
> >particular mechanism vs a completion of similar.  
> 
> As mentioned before, rcuwait gives us the single wait/wake semantics we
> need (we are serialized by the mbox_mutex) - completions use queued wait,
> which is an overkill for this case.
Fair enough I guess. New toy that I'd not come across before.

Jonathan

> 
> >> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>  
> >Otherwise just a comment inline on dev_warn for unexpected interrupt
> >on an interrupt that you've marked as IRQF_SHARED which pretty much
> >guarantees you need to handle unexpected interrupts as 'normal'.  
> 
> Per your feedback in the the previous series, I will remove this dev_warn
> altogether.
> 
> Thanks,
> Davidlohr


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-03 14:57   ` [PATCH v2] " Davidlohr Bueso
  2023-05-15 10:30     ` Jonathan Cameron
@ 2023-05-16  7:58     ` Li, Ming
  2023-05-16 17:02       ` Davidlohr Bueso
  2023-05-19 23:13     ` Dan Williams
  2 siblings, 1 reply; 24+ messages in thread
From: Li, Ming @ 2023-05-16  7:58 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl, dan.j.williams

On 5/3/2023 10:57 PM, Davidlohr Bueso wrote:

>  /**
>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>   * @cxlds: The device state to communicate with.
> @@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>     mbox_cmd->return_code =
>         FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
> 
> +    /*
> +     * Handle the background command in a synchronous manner.
> +     *
> +     * All other mailbox commands will serialize/queue on the mbox_mutex,
> +     * which we currently hold. Furthermore this also guarantees that
> +     * cxl_mbox_background_complete() checks are safe amongst each other,
> +     * in that no new bg operation can occur in between.
> +     *
> +     * Background operations are timesliced in accordance with the nature
> +     * of the command. In the event of timeout, the mailbox state is
> +     * indeterminate until the next successful command submission and the
> +     * driver can get back in sync with the hardware state.
> +     */
> +    if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> +        long ret;
> +        u64 bg_status_reg;
> +        int i, timeout = mbox_cmd->poll_interval;
> +
> +        dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> +            mbox_cmd->opcode);
> +
> +        for (i = 0; i < mbox_cmd->poll_count; i++) {
> +            ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
> +                    cxl_mbox_background_complete(cxlds),
> +                    TASK_INTERRUPTIBLE,
> +                    msecs_to_jiffies(timeout));
> +            if (ret > 0)
> +                break;
> +            if (ret < 0) /* interrupted by a signal */
> +                return ret;

What do you think if adding a dev_dbg() here for outputting current percentage complete for debugging? is it helpful?

Thanks
Ming


> +        }
> +
> +        if (!cxl_mbox_background_complete(cxlds)) {
> +            u64 md_status =
> +                readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> +
> +            cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
> +                    "background timeout");
> +            return -ETIMEDOUT;
> +        }
> +
> +        bg_status_reg = readq(cxlds->regs.mbox +
> +                      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +        mbox_cmd->return_code =
> +            FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> +                  bg_status_reg);
> +        dev_dbg(dev,
> +            "Mailbox background operation (0x%04x) completed\n",
> +            mbox_cmd->opcode);
> +    }
> +
>     if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
>         dev_dbg(dev, "Mailbox operation had an error: %s\n",
>             cxl_mbox_cmd_rc2str(mbox_cmd));
> @@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>     dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
>         cxlds->payload_size);
> 
> +    rcuwait_init(&cxlds->mbox_wait);
> +    if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> +        int irq, msgnum;
> +        struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +        msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
> +        irq = pci_irq_vector(pdev, msgnum);
> +        if (irq < 0)
> +            goto mbox_poll;
> +
> +        if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> +                     IRQF_SHARED, NULL, cxlds))
> +            goto mbox_poll;
> +
> +        /* only enable background cmd mbox irq support */
> +        writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
> +               cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
> +
> +        return 0;
> +    }
> +
> +mbox_poll:
> +    dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
>     return 0;
>  }
> 
> -- 
> 2.40.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-16  7:58     ` Li, Ming
@ 2023-05-16 17:02       ` Davidlohr Bueso
  0 siblings, 0 replies; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-16 17:02 UTC (permalink / raw)
  To: Li, Ming
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl, dan.j.williams

On Tue, 16 May 2023, Li, Ming wrote:

>On 5/3/2023 10:57 PM, Davidlohr Bueso wrote:
>
>>  /**
>>   * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>>   * @cxlds: The device state to communicate with.
>> @@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>>     mbox_cmd->return_code =
>>         FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>>
>> +    /*
>> +     * Handle the background command in a synchronous manner.
>> +     *
>> +     * All other mailbox commands will serialize/queue on the mbox_mutex,
>> +     * which we currently hold. Furthermore this also guarantees that
>> +     * cxl_mbox_background_complete() checks are safe amongst each other,
>> +     * in that no new bg operation can occur in between.
>> +     *
>> +     * Background operations are timesliced in accordance with the nature
>> +     * of the command. In the event of timeout, the mailbox state is
>> +     * indeterminate until the next successful command submission and the
>> +     * driver can get back in sync with the hardware state.
>> +     */
>> +    if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
>> +        long ret;
>> +        u64 bg_status_reg;
>> +        int i, timeout = mbox_cmd->poll_interval;
>> +
>> +        dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>> +            mbox_cmd->opcode);
>> +
>> +        for (i = 0; i < mbox_cmd->poll_count; i++) {
>> +            ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
>> +                    cxl_mbox_background_complete(cxlds),
>> +                    TASK_INTERRUPTIBLE,
>> +                    msecs_to_jiffies(timeout));
>> +            if (ret > 0)
>> +                break;
>> +            if (ret < 0) /* interrupted by a signal */
>> +                return ret;
>
>What do you think if adding a dev_dbg() here for outputting current percentage complete for debugging? is it helpful?

I don't see it being very relevant here, honestly. Something like that
might be more usful in a longer async context without timeouts.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-03 14:57   ` [PATCH v2] " Davidlohr Bueso
  2023-05-15 10:30     ` Jonathan Cameron
  2023-05-16  7:58     ` Li, Ming
@ 2023-05-19 23:13     ` Dan Williams
  2023-05-22 16:58       ` Davidlohr Bueso
  2 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2023-05-19 23:13 UTC (permalink / raw)
  To: Davidlohr Bueso, dan.j.williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

Davidlohr Bueso wrote:
> This adds support for handling background operations, as defined in
> the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
> can run in the background asynchronously (to the hardware).
> 
> The driver will deal with such commands synchronously, blocking all
> other incoming commands for a specified period of time, allowing
> time-slicing the command such that the caller can send incremental
> requests to avoid monopolizing the driver/device. This approach
> makes the code simpler, where any out of sync (timeout) between the
> driver and hardware is just disregarded as an invalid state until
> the next successful submission.
> 
> On devices where mbox interrupts are supported, this will still use
> a poller that will wakeup in the specified wait intervals. The irq
> handler will simply awake the blocked cmd, which is also safe vs a
> task that is either waking (timing out) or already awoken. Similarly
> any irq setup error during the probing falls back to polling, thus
> avoids unnecessarily erroring out.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> 
> Changes from v2:
> - make the wait ret a long.
> - pass correct param orders to the wait (cond and state were inversed).
> 
>   drivers/cxl/core/mbox.c |   3 +-
>   drivers/cxl/cxl.h       |   7 +++
>   drivers/cxl/cxlmem.h    |   7 +++
>   drivers/cxl/pci.c       | 102 ++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 23b9ff920d7e..7345ed4118b0 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -220,7 +220,8 @@ int cxl_internal_send_cmd(struct cxl_dev_state *cxlds,
> 	if (rc)
> 		return rc;
> 
> -	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS)
> +	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS &&
> +	    mbox_cmd->return_code != CXL_MBOX_CMD_RC_BACKGROUND)
> 		return cxl_mbox_cmd_rc2errno(mbox_cmd);
> 
> 	if (!out_size)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 044a92d9813e..72731a896f58 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -176,14 +176,21 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>   /* CXL 2.0 8.2.8.4 Mailbox Registers */
>   #define CXLDEV_MBOX_CAPS_OFFSET 0x00
>   #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> +#define   CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
> +#define   CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)
>   #define CXLDEV_MBOX_CTRL_OFFSET 0x04
>   #define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
> +#define   CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
>   #define CXLDEV_MBOX_CMD_OFFSET 0x08
>   #define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
>   #define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
>   #define CXLDEV_MBOX_STATUS_OFFSET 0x10
> +#define   CXLDEV_MBOX_STATUS_BG_CMD BIT(0)
>   #define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
>   #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
> +#define   CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)
>   #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
> 
>   /*
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index db12b6313afb..d2f751d6583c 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -5,6 +5,7 @@
>   #include <uapi/linux/cxl_mem.h>
>   #include <linux/cdev.h>
>   #include <linux/uuid.h>
> +#include <linux/rcuwait.h>
>   #include "cxl.h"
> 
>   /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -108,6 +109,9 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>    *            variable sized output commands, it tells the exact number of bytes
>    *            written.
>    * @min_out: (input) internal command output payload size validation
> + * @poll_count: (input)  Number of timeouts to attempt.
> + * @poll_interval: (input) Number of ms between mailbox background command
> + *                 polling intervals timeouts.
>    * @return_code: (output) Error code returned from hardware.
>    *
>    * This is the primary mechanism used to send commands to the hardware.
> @@ -123,6 +127,8 @@ struct cxl_mbox_cmd {
> 	size_t size_in;
> 	size_t size_out;
> 	size_t min_out;
> +	int poll_count;
> +	int poll_interval;
> 	u16 return_code;
>   };
> 
> @@ -329,6 +335,7 @@ struct cxl_dev_state {
> 	struct cxl_event_state event;
> 	struct cxl_poison_state poison;
> 
> +	struct rcuwait mbox_wait;
> 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>   };
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 8bdf58c0c643..10dc4a4fbb4a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -51,6 +51,7 @@ static unsigned short mbox_ready_timeout = 60;
>   module_param(mbox_ready_timeout, ushort, 0644);
>   MODULE_PARM_DESC(mbox_ready_timeout, "seconds to wait for mailbox ready");
> 
> +
>   static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
>   {
> 	const unsigned long start = jiffies;
> @@ -84,6 +85,33 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
> 			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
> 			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
> 
> +static bool cxl_mbox_background_complete(struct cxl_dev_state *cxlds)
> +{
> +	u64 reg;
> +
> +	reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +	return FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK, reg) == 100;
> +}
> +
> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> +{
> +	struct cxl_dev_state *cxlds = id;
> +
> +	/* spurious or raced with hw? */
> +	if (unlikely(!cxl_mbox_background_complete(cxlds))) {

I expect this will fire all the time. While the specification allows for
a vector per-reason there's nothing stopping an implementation from
using one vector for all reasons. There's little incentive for spending
silicon gates on exclusive vectors since none of the CXL interrupts
service software fast paths.

> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		dev_warn(&pdev->dev,
> +			 "Mailbox background operation IRQ but incomplete\n");
> +		goto done;
> +	}
> +
> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> +	rcuwait_wake_up(&cxlds->mbox_wait);
> +done:
> +	return IRQ_HANDLED;
> +}
> +
>   /**
>    * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>    * @cxlds: The device state to communicate with.
> @@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
> 	mbox_cmd->return_code =
> 		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
> 
> +	/*
> +	 * Handle the background command in a synchronous manner.
> +	 *
> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
> +	 * which we currently hold. Furthermore this also guarantees that
> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
> +	 * in that no new bg operation can occur in between.
> +	 *
> +	 * Background operations are timesliced in accordance with the nature
> +	 * of the command. In the event of timeout, the mailbox state is
> +	 * indeterminate until the next successful command submission and the
> +	 * driver can get back in sync with the hardware state.
> +	 */
> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> +		long ret;
> +		u64 bg_status_reg;
> +		int i, timeout = mbox_cmd->poll_interval;
> +
> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> +			mbox_cmd->opcode);
> +
> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
> +			ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
> +					cxl_mbox_background_complete(cxlds),
> +					TASK_INTERRUPTIBLE,
> +					msecs_to_jiffies(timeout));
> +			if (ret > 0)
> +				break;
> +			if (ret < 0) /* interrupted by a signal */

Have you tested this path? I am curious what happens on the next command
submission? As far as I can see foreground commands would not be
impacted but the next background submission would fail instead of wait,
right?

Perhaps a dev_err_ratelimited() message and an EBUSY status for when
that happens to at least indicate, "hardware is still chewing on a
cancelled request".  Later we can think about whether that's recoverable
via a reset.

> +				return ret;
> +		}
> +
> +		if (!cxl_mbox_background_complete(cxlds)) {
> +			u64 md_status =
> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> +
> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
> +				    "background timeout");
> +			return -ETIMEDOUT;
> +		}
> +
> +		bg_status_reg = readq(cxlds->regs.mbox +
> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> +		mbox_cmd->return_code =
> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> +				  bg_status_reg);
> +		dev_dbg(dev,
> +			"Mailbox background operation (0x%04x) completed\n",
> +			mbox_cmd->opcode);
> +	}
> +
> 	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> 		dev_dbg(dev, "Mailbox operation had an error: %s\n",
> 			cxl_mbox_cmd_rc2str(mbox_cmd));
> @@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> 	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
> 		cxlds->payload_size);
> 
> +	rcuwait_init(&cxlds->mbox_wait);
> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> +		int irq, msgnum;
> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +
> +		msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
> +		irq = pci_irq_vector(pdev, msgnum);
> +		if (irq < 0)
> +			goto mbox_poll;
> +
> +		if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> +				     IRQF_SHARED, NULL, cxlds))

This needs IRQF_ONESHOT and that @dev_id needs to be globally unique in
case @irq is shared. So I think you want to factor out a common helper
from cxl_event_req_irq() that this can reuse. I.e. cxl_event_req_irq()
already gets the flags and @dev_id right, so just create a cxl_req_irq()
that takes the irq as an argument and share it.

> +			goto mbox_poll;
> +
> +		/* only enable background cmd mbox irq support */
> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);

Just in case some other code ever enables doorbell interrupts it would
be nice if this performed a read-modify-write rather than assuming it
can clobber that setting.

> +
> +		return 0;
> +	}
> +
> +mbox_poll:
> +	dev_dbg(cxlds->dev, "Mailbox interrupts are unsupported");
> 	return 0;
>   }
> 
> --
> 2.40.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-19 23:13     ` Dan Williams
@ 2023-05-22 16:58       ` Davidlohr Bueso
  2023-05-22 18:19         ` Dan Williams
  0 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-22 16:58 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

On Fri, 19 May 2023, Dan Williams wrote:

>> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>> +{
>> +	struct cxl_dev_state *cxlds = id;
>> +
>> +	/* spurious or raced with hw? */
>> +	if (unlikely(!cxl_mbox_background_complete(cxlds))) {
>
>I expect this will fire all the time. While the specification allows for
>a vector per-reason there's nothing stopping an implementation from
>using one vector for all reasons. There's little incentive for spending
>silicon gates on exclusive vectors since none of the CXL interrupts
>service software fast paths.

fyi I have removed the warn below entirely. Handler is now simply:

	if (cxl_mbox_background_complete())
		wake_up();
	return IRQ_HANDLED;

>
>> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>> +
>> +		dev_warn(&pdev->dev,
>> +			 "Mailbox background operation IRQ but incomplete\n");
>> +		goto done;
>> +	}
>> +
>> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
>> +	rcuwait_wake_up(&cxlds->mbox_wait);
>> +done:
>> +	return IRQ_HANDLED;
>> +}
>> +
>>   /**
>>    * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
>>    * @cxlds: The device state to communicate with.
>> @@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
>>	mbox_cmd->return_code =
>>		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
>>
>> +	/*
>> +	 * Handle the background command in a synchronous manner.
>> +	 *
>> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
>> +	 * which we currently hold. Furthermore this also guarantees that
>> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
>> +	 * in that no new bg operation can occur in between.
>> +	 *
>> +	 * Background operations are timesliced in accordance with the nature
>> +	 * of the command. In the event of timeout, the mailbox state is
>> +	 * indeterminate until the next successful command submission and the
>> +	 * driver can get back in sync with the hardware state.
>> +	 */
>> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
>> +		long ret;
>> +		u64 bg_status_reg;
>> +		int i, timeout = mbox_cmd->poll_interval;
>> +
>> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>> +			mbox_cmd->opcode);
>> +
>> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
>> +			ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
>> +					cxl_mbox_background_complete(cxlds),
>> +					TASK_INTERRUPTIBLE,
>> +					msecs_to_jiffies(timeout));
>> +			if (ret > 0)
>> +				break;
>> +			if (ret < 0) /* interrupted by a signal */
>
>Have you tested this path? I am curious what happens on the next command
>submission? As far as I can see foreground commands would not be
>impacted but the next background submission would fail instead of wait,
>right?

No, I haven't tested this path. My expectation here is that this would be
similar to the timeout case in that the driver returns but the hw keeps
processing the command. The next command, fg or bg, would be at the mercy
of the hw doing the right thing until it's done with the canceled one.

>Perhaps a dev_err_ratelimited() message and an EBUSY status for when
>that happens to at least indicate, "hardware is still chewing on a
>cancelled request".  Later we can think about whether that's recoverable
>via a reset.

Well but reaching this path already assumes hw didn't error out the current
command because the canceled one was still being run.

Alternatively we could also just make the sleep uninterruptible, but because
this comes from the user triggering the bg op, I thought against it.

>> +				return ret;

rcuwait, unlike regular wait, returns -EINTR instead of -ERESTARTSYS. I've
updated the return value here to the latter.

>> +		}
>> +
>> +		if (!cxl_mbox_background_complete(cxlds)) {
>> +			u64 md_status =
>> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
>> +
>> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
>> +				    "background timeout");
>> +			return -ETIMEDOUT;
>> +		}
>> +
>> +		bg_status_reg = readq(cxlds->regs.mbox +
>> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>> +		mbox_cmd->return_code =
>> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
>> +				  bg_status_reg);
>> +		dev_dbg(dev,
>> +			"Mailbox background operation (0x%04x) completed\n",
>> +			mbox_cmd->opcode);
>> +	}
>> +
>>	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
>>		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>>			cxl_mbox_cmd_rc2str(mbox_cmd));
>> @@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>>	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
>>		cxlds->payload_size);
>>
>> +	rcuwait_init(&cxlds->mbox_wait);
>> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
>> +		int irq, msgnum;
>> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>> +
>> +		msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
>> +		irq = pci_irq_vector(pdev, msgnum);
>> +		if (irq < 0)
>> +			goto mbox_poll;
>> +
>> +		if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
>> +				     IRQF_SHARED, NULL, cxlds))
>
>This needs IRQF_ONESHOT and that @dev_id needs to be globally unique in
>case @irq is shared. So I think you want to factor out a common helper
>from cxl_event_req_irq() that this can reuse. I.e. cxl_event_req_irq()
>already gets the flags and @dev_id right, so just create a cxl_req_irq()
>that takes the irq as an argument and share it.

For the oneshot this was a hardirq, so I don't think this is needed, but
regardless, I've added the following helper in a new patch which both
events and mbox can call:

static int cxl_request_irq(struct cxl_dev_state *cxlds,
			  int irq, irq_handler_t handler)
{
        struct device *dev = cxlds->dev;
        struct cxl_dev_id *dev_id;

        /* dev_id must be globally unique and must contain the cxlds */
        dev_id = devm_kzalloc(dev, sizeof(*dev_id), GFP_KERNEL);
        if (!dev_id)
	       return -ENOMEM;
        dev_id->cxlds = cxlds;

        return devm_request_threaded_irq(dev, irq, NULL, handler,
					IRQF_SHARED | IRQF_ONESHOT, NULL,
					dev_id);
}

>
>> +			goto mbox_poll;
>> +
>> +		/* only enable background cmd mbox irq support */
>> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
>> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
>
>Just in case some other code ever enables doorbell interrupts it would
>be nice if this performed a read-modify-write rather than assuming it
>can clobber that setting.

Hmm do you mean enabling/disabling dynamically? Otherwise I would have
expected this to be a one time thing which we could just OR both bits.
Maybe this could be revisited when needed? Jonathan suggested enabling
doorbell irqs in the future, which I plan on looking into at some point.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-22 16:58       ` Davidlohr Bueso
@ 2023-05-22 18:19         ` Dan Williams
  2023-05-22 18:57           ` Davidlohr Bueso
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2023-05-22 18:19 UTC (permalink / raw)
  To: Davidlohr Bueso, Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

Davidlohr Bueso wrote:
> On Fri, 19 May 2023, Dan Williams wrote:
> 
> >> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> >> +{
> >> +	struct cxl_dev_state *cxlds = id;
> >> +
> >> +	/* spurious or raced with hw? */
> >> +	if (unlikely(!cxl_mbox_background_complete(cxlds))) {
> >
> >I expect this will fire all the time. While the specification allows for
> >a vector per-reason there's nothing stopping an implementation from
> >using one vector for all reasons. There's little incentive for spending
> >silicon gates on exclusive vectors since none of the CXL interrupts
> >service software fast paths.
> 
> fyi I have removed the warn below entirely. Handler is now simply:
> 
> 	if (cxl_mbox_background_complete())
> 		wake_up();
> 	return IRQ_HANDLED;
> 
> >
> >> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> >> +
> >> +		dev_warn(&pdev->dev,
> >> +			 "Mailbox background operation IRQ but incomplete\n");
> >> +		goto done;
> >> +	}
> >> +
> >> +	/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> >> +	rcuwait_wake_up(&cxlds->mbox_wait);
> >> +done:
> >> +	return IRQ_HANDLED;
> >> +}
> >> +
> >>   /**
> >>    * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
> >>    * @cxlds: The device state to communicate with.
> >> @@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
> >>	mbox_cmd->return_code =
> >>		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
> >>
> >> +	/*
> >> +	 * Handle the background command in a synchronous manner.
> >> +	 *
> >> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
> >> +	 * which we currently hold. Furthermore this also guarantees that
> >> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
> >> +	 * in that no new bg operation can occur in between.
> >> +	 *
> >> +	 * Background operations are timesliced in accordance with the nature
> >> +	 * of the command. In the event of timeout, the mailbox state is
> >> +	 * indeterminate until the next successful command submission and the
> >> +	 * driver can get back in sync with the hardware state.
> >> +	 */
> >> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> >> +		long ret;
> >> +		u64 bg_status_reg;
> >> +		int i, timeout = mbox_cmd->poll_interval;
> >> +
> >> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> >> +			mbox_cmd->opcode);
> >> +
> >> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
> >> +			ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
> >> +					cxl_mbox_background_complete(cxlds),
> >> +					TASK_INTERRUPTIBLE,
> >> +					msecs_to_jiffies(timeout));
> >> +			if (ret > 0)
> >> +				break;
> >> +			if (ret < 0) /* interrupted by a signal */
> >
> >Have you tested this path? I am curious what happens on the next command
> >submission? As far as I can see foreground commands would not be
> >impacted but the next background submission would fail instead of wait,
> >right?
> 
> No, I haven't tested this path. My expectation here is that this would be
> similar to the timeout case in that the driver returns but the hw keeps
> processing the command. The next command, fg or bg, would be at the mercy
> of the hw doing the right thing until it's done with the canceled one.

Right, it's that "mercy" part where there needs to be documentation of
what to expect.

If someone sends SIGTERM after triggering a scan media and then
immediately resubmits it maybe the right answer is to go into another
interruptible sleep awaiting the last background state to clear out. It
just seems like returning an error leaves userspace to fend for itself
with little visibility of what it needs to do next. SIGTERM during a
background command means, "I do not care about the previous result".
Given the driver is mitigating long running background cycles that
implies that waiting before submitting the next colliding command would
not be onerous. They can always SIGTERM again if the wait gets too long.


> >Perhaps a dev_err_ratelimited() message and an EBUSY status for when
> >that happens to at least indicate, "hardware is still chewing on a
> >cancelled request".  Later we can think about whether that's recoverable
> >via a reset.
> 
> Well but reaching this path already assumes hw didn't error out the current
> command because the canceled one was still being run.
> 
> Alternatively we could also just make the sleep uninterruptible, but because
> this comes from the user triggering the bg op, I thought against it.

I don't think that would solve the problem of how to detect when the
driver is out of sync with the device and what to do to resolve that
situation. An inopportune SIGTERM is indistinguishable from getting the
polling interval wrong.

> 
> >> +				return ret;
> 
> rcuwait, unlike regular wait, returns -EINTR instead of -ERESTARTSYS. I've
> updated the return value here to the latter.
> 
> >> +		}
> >> +
> >> +		if (!cxl_mbox_background_complete(cxlds)) {
> >> +			u64 md_status =
> >> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> >> +
> >> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
> >> +				    "background timeout");
> >> +			return -ETIMEDOUT;
> >> +		}
> >> +
> >> +		bg_status_reg = readq(cxlds->regs.mbox +
> >> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> >> +		mbox_cmd->return_code =
> >> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> >> +				  bg_status_reg);
> >> +		dev_dbg(dev,
> >> +			"Mailbox background operation (0x%04x) completed\n",
> >> +			mbox_cmd->opcode);
> >> +	}
> >> +
> >>	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> >>		dev_dbg(dev, "Mailbox operation had an error: %s\n",
> >>			cxl_mbox_cmd_rc2str(mbox_cmd));
> >> @@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> >>	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
> >>		cxlds->payload_size);
> >>
> >> +	rcuwait_init(&cxlds->mbox_wait);
> >> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> >> +		int irq, msgnum;
> >> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> >> +
> >> +		msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
> >> +		irq = pci_irq_vector(pdev, msgnum);
> >> +		if (irq < 0)
> >> +			goto mbox_poll;
> >> +
> >> +		if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> >> +				     IRQF_SHARED, NULL, cxlds))
> >
> >This needs IRQF_ONESHOT and that @dev_id needs to be globally unique in
> >case @irq is shared. So I think you want to factor out a common helper
> >from cxl_event_req_irq() that this can reuse. I.e. cxl_event_req_irq()
> >already gets the flags and @dev_id right, so just create a cxl_req_irq()
> >that takes the irq as an argument and share it.
> 
> For the oneshot this was a hardirq, so I don't think this is needed, but
> regardless, I've added the following helper in a new patch which both
> events and mbox can call:

Ah, right, this was hard-irq only, I missed that.

I wouldn't be opposed to cxl_request_irq() following pci_request_irq()'s
lead and taking multiple handler arguments to skip the thread invocation
when it is not needed.

> 
> static int cxl_request_irq(struct cxl_dev_state *cxlds,
> 			  int irq, irq_handler_t handler)
> {
>         struct device *dev = cxlds->dev;
>         struct cxl_dev_id *dev_id;
> 
>         /* dev_id must be globally unique and must contain the cxlds */
>         dev_id = devm_kzalloc(dev, sizeof(*dev_id), GFP_KERNEL);
>         if (!dev_id)
> 	       return -ENOMEM;
>         dev_id->cxlds = cxlds;
> 
>         return devm_request_threaded_irq(dev, irq, NULL, handler,
> 					IRQF_SHARED | IRQF_ONESHOT, NULL,
> 					dev_id);
> }
> 
> >
> >> +			goto mbox_poll;
> >> +
> >> +		/* only enable background cmd mbox irq support */
> >> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
> >> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
> >
> >Just in case some other code ever enables doorbell interrupts it would
> >be nice if this performed a read-modify-write rather than assuming it
> >can clobber that setting.
> 
> Hmm do you mean enabling/disabling dynamically?

No, grep for:

"write.*CTRL"

...and observe that all writes to control registers are doing a
read-modify-write sequence.

> Otherwise I would have
> expected this to be a one time thing which we could just OR both bits.
> Maybe this could be revisited when needed? Jonathan suggested enabling
> doorbell irqs in the future, which I plan on looking into at some point.

Sure, that code might be in another function also performing its own
read-modify-write, so my comment just asking for this to be consistent
with other control register updates in the driver.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-22 18:19         ` Dan Williams
@ 2023-05-22 18:57           ` Davidlohr Bueso
  2023-05-22 20:16             ` Dan Williams
  0 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-22 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

On Mon, 22 May 2023, Dan Williams wrote:

>Davidlohr Bueso wrote:
>> >> +	/*
>> >> +	 * Handle the background command in a synchronous manner.
>> >> +	 *
>> >> +	 * All other mailbox commands will serialize/queue on the mbox_mutex,
>> >> +	 * which we currently hold. Furthermore this also guarantees that
>> >> +	 * cxl_mbox_background_complete() checks are safe amongst each other,
>> >> +	 * in that no new bg operation can occur in between.
>> >> +	 *
>> >> +	 * Background operations are timesliced in accordance with the nature
>> >> +	 * of the command. In the event of timeout, the mailbox state is
>> >> +	 * indeterminate until the next successful command submission and the
>> >> +	 * driver can get back in sync with the hardware state.
>> >> +	 */
>> >> +	if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
>> >> +		long ret;
>> >> +		u64 bg_status_reg;
>> >> +		int i, timeout = mbox_cmd->poll_interval;
>> >> +
>> >> +		dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
>> >> +			mbox_cmd->opcode);
>> >> +
>> >> +		for (i = 0; i < mbox_cmd->poll_count; i++) {
>> >> +			ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
>> >> +					cxl_mbox_background_complete(cxlds),
>> >> +					TASK_INTERRUPTIBLE,
>> >> +					msecs_to_jiffies(timeout));
>> >> +			if (ret > 0)
>> >> +				break;
>> >> +			if (ret < 0) /* interrupted by a signal */
>> >
>> >Have you tested this path? I am curious what happens on the next command
>> >submission? As far as I can see foreground commands would not be
>> >impacted but the next background submission would fail instead of wait,
>> >right?
>>
>> No, I haven't tested this path. My expectation here is that this would be
>> similar to the timeout case in that the driver returns but the hw keeps
>> processing the command. The next command, fg or bg, would be at the mercy
>> of the hw doing the right thing until it's done with the canceled one.
>
>Right, it's that "mercy" part where there needs to be documentation of
>what to expect.
>
>If someone sends SIGTERM after triggering a scan media and then
>immediately resubmits it maybe the right answer is to go into another
>interruptible sleep awaiting the last background state to clear out. It
>just seems like returning an error leaves userspace to fend for itself
>with little visibility of what it needs to do next. SIGTERM during a
>background command means, "I do not care about the previous result".
>Given the driver is mitigating long running background cycles that
>implies that waiting before submitting the next colliding command would
>not be onerous. They can always SIGTERM again if the wait gets too long.
>
>
>> >Perhaps a dev_err_ratelimited() message and an EBUSY status for when
>> >that happens to at least indicate, "hardware is still chewing on a
>> >cancelled request".  Later we can think about whether that's recoverable
>> >via a reset.
>>
>> Well but reaching this path already assumes hw didn't error out the current
>> command because the canceled one was still being run.
>>
>> Alternatively we could also just make the sleep uninterruptible, but because
>> this comes from the user triggering the bg op, I thought against it.
>
>I don't think that would solve the problem of how to detect when the
>driver is out of sync with the device and what to do to resolve that
>situation. An inopportune SIGTERM is indistinguishable from getting the
>polling interval wrong.

It at least removes one point of potential disambiguity. This out of sync
issue with timeouts was one of the reasons I originally proposed dealing
with all bg commands asynchronously (which doesn't prevent users from still
time-slicing). This approach is supposed to make the code simpler, but leaves
this mess. Currently we state:

	 * In the event of timeout, the mailbox state is
	 * indeterminate until the next successful command submission and the
	 * driver can get back in sync with the hardware state.

>
>>
>> >> +				return ret;
>>
>> rcuwait, unlike regular wait, returns -EINTR instead of -ERESTARTSYS. I've
>> updated the return value here to the latter.
>>
>> >> +		}
>> >> +
>> >> +		if (!cxl_mbox_background_complete(cxlds)) {
>> >> +			u64 md_status =
>> >> +				readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
>> >> +
>> >> +			cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
>> >> +				    "background timeout");
>> >> +			return -ETIMEDOUT;
>> >> +		}
>> >> +
>> >> +		bg_status_reg = readq(cxlds->regs.mbox +
>> >> +				      CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>> >> +		mbox_cmd->return_code =
>> >> +			FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
>> >> +				  bg_status_reg);
>> >> +		dev_dbg(dev,
>> >> +			"Mailbox background operation (0x%04x) completed\n",
>> >> +			mbox_cmd->opcode);
>> >> +	}
>> >> +
>> >>	if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
>> >>		dev_dbg(dev, "Mailbox operation had an error: %s\n",
>> >>			cxl_mbox_cmd_rc2str(mbox_cmd));
>> >> @@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
>> >>	dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
>> >>		cxlds->payload_size);
>> >>
>> >> +	rcuwait_init(&cxlds->mbox_wait);
>> >> +	if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
>> >> +		int irq, msgnum;
>> >> +		struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>> >> +
>> >> +		msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
>> >> +		irq = pci_irq_vector(pdev, msgnum);
>> >> +		if (irq < 0)
>> >> +			goto mbox_poll;
>> >> +
>> >> +		if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
>> >> +				     IRQF_SHARED, NULL, cxlds))
>> >
>> >This needs IRQF_ONESHOT and that @dev_id needs to be globally unique in
>> >case @irq is shared. So I think you want to factor out a common helper
>> >from cxl_event_req_irq() that this can reuse. I.e. cxl_event_req_irq()
>> >already gets the flags and @dev_id right, so just create a cxl_req_irq()
>> >that takes the irq as an argument and share it.
>>
>> For the oneshot this was a hardirq, so I don't think this is needed, but
>> regardless, I've added the following helper in a new patch which both
>> events and mbox can call:
>
>Ah, right, this was hard-irq only, I missed that.
>
>I wouldn't be opposed to cxl_request_irq() following pci_request_irq()'s
>lead and taking multiple handler arguments to skip the thread invocation
>when it is not needed.

Sure. I was fine with just doing everything in the BH, but allowing that
flexibility seems best.

>>
>> static int cxl_request_irq(struct cxl_dev_state *cxlds,
>>			  int irq, irq_handler_t handler)
>> {
>>         struct device *dev = cxlds->dev;
>>         struct cxl_dev_id *dev_id;
>>
>>         /* dev_id must be globally unique and must contain the cxlds */
>>         dev_id = devm_kzalloc(dev, sizeof(*dev_id), GFP_KERNEL);
>>         if (!dev_id)
>>	       return -ENOMEM;
>>         dev_id->cxlds = cxlds;
>>
>>         return devm_request_threaded_irq(dev, irq, NULL, handler,
>>					IRQF_SHARED | IRQF_ONESHOT, NULL,
>>					dev_id);
>> }
>>
>> >
>> >> +			goto mbox_poll;
>> >> +
>> >> +		/* only enable background cmd mbox irq support */
>> >> +		writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
>> >> +		       cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
>> >
>> >Just in case some other code ever enables doorbell interrupts it would
>> >be nice if this performed a read-modify-write rather than assuming it
>> >can clobber that setting.
>>
>> Hmm do you mean enabling/disabling dynamically?
>
>No, grep for:
>
>"write.*CTRL"
>
>...and observe that all writes to control registers are doing a
>read-modify-write sequence.

Oh, ok I get it.

>> Otherwise I would have
>> expected this to be a one time thing which we could just OR both bits.
>> Maybe this could be revisited when needed? Jonathan suggested enabling
>> doorbell irqs in the future, which I plan on looking into at some point.
>
>Sure, that code might be in another function also performing its own
>read-modify-write, so my comment just asking for this to be consistent
>with other control register updates in the driver.

Thanks for clarifying, I had not seen it like that - will definitely update
in the next version.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-22 18:57           ` Davidlohr Bueso
@ 2023-05-22 20:16             ` Dan Williams
  2023-05-22 20:28               ` Davidlohr Bueso
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2023-05-22 20:16 UTC (permalink / raw)
  To: Davidlohr Bueso, Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

Davidlohr Bueso wrote:
> It at least removes one point of potential disambiguity. This out of sync
> issue with timeouts was one of the reasons I originally proposed dealing
> with all bg commands asynchronously (which doesn't prevent users from still
> time-slicing). This approach is supposed to make the code simpler, but leaves
> this mess. Currently we state:
> 
> 	 * In the event of timeout, the mailbox state is
> 	 * indeterminate until the next successful command submission and the
> 	 * driver can get back in sync with the hardware state.

Right, but my struggle is how to explain to userspace what to do next.
Does userspace keep submitting until it gets a successful acceptance?
How does userspace know that its failure is induced by background
command collision?

That's why I think it is ok to block and wait on the previous background
command completion when the next command is submitted. That seems to satisfy
least surprise. The submitter asked for the last command to end in
failure and is asking for the next one to queue up and complete.

I think this means that cxl_mbox_background_complete() needs to be called
before every submission and needs to check for "background operation"
(mailbox status bit0) being cleared to manage potential collisions.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-22 20:16             ` Dan Williams
@ 2023-05-22 20:28               ` Davidlohr Bueso
  2023-05-22 21:21                 ` Dan Williams
  0 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-22 20:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

On Mon, 22 May 2023, Dan Williams wrote:

>Davidlohr Bueso wrote:
>> It at least removes one point of potential disambiguity. This out of sync
>> issue with timeouts was one of the reasons I originally proposed dealing
>> with all bg commands asynchronously (which doesn't prevent users from still
>> time-slicing). This approach is supposed to make the code simpler, but leaves
>> this mess. Currently we state:
>>
>>	 * In the event of timeout, the mailbox state is
>>	 * indeterminate until the next successful command submission and the
>>	 * driver can get back in sync with the hardware state.
>
>Right, but my struggle is how to explain to userspace what to do next.
>Does userspace keep submitting until it gets a successful acceptance?

Yes. It would be the users' responsibility to deal with timeouts.

>How does userspace know that its failure is induced by background
>command collision?

Knowing about the apriori timeout, would the user care?

>
>That's why I think it is ok to block and wait on the previous background
>command completion when the next command is submitted. That seems to satisfy
>least surprise. The submitter asked for the last command to end in
>failure and is asking for the next one to queue up and complete.
>
>I think this means that cxl_mbox_background_complete() needs to be called
>before every submission and needs to check for "background operation"
>(mailbox status bit0) being cleared to manage potential collisions.

All this really defeats the "simpler" angle. Without accepting that it's
the user who needs to resubmit until in sync, it seems doing synchronously
is sw shooting itself in the foot.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-22 20:28               ` Davidlohr Bueso
@ 2023-05-22 21:21                 ` Dan Williams
  2023-05-22 21:26                   ` Davidlohr Bueso
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2023-05-22 21:21 UTC (permalink / raw)
  To: Davidlohr Bueso, Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

Davidlohr Bueso wrote:
> On Mon, 22 May 2023, Dan Williams wrote:
> 
> >Davidlohr Bueso wrote:
> >> It at least removes one point of potential disambiguity. This out of sync
> >> issue with timeouts was one of the reasons I originally proposed dealing
> >> with all bg commands asynchronously (which doesn't prevent users from still
> >> time-slicing). This approach is supposed to make the code simpler, but leaves
> >> this mess. Currently we state:
> >>
> >>	 * In the event of timeout, the mailbox state is
> >>	 * indeterminate until the next successful command submission and the
> >>	 * driver can get back in sync with the hardware state.
> >
> >Right, but my struggle is how to explain to userspace what to do next.
> >Does userspace keep submitting until it gets a successful acceptance?
> 
> Yes. It would be the users' responsibility to deal with timeouts.
> 
> >How does userspace know that its failure is induced by background
> >command collision?
> 
> Knowing about the apriori timeout, would the user care?

It's not just timeouts its also SIGTERM. If the user is allowed to say
"I do not care about the previous completion" then the driver should have
some way to get back in sync. Either the next submission needs to wait,
or there needs to be a facility to flush previous commands.

> >That's why I think it is ok to block and wait on the previous background
> >command completion when the next command is submitted. That seems to satisfy
> >least surprise. The submitter asked for the last command to end in
> >failure and is asking for the next one to queue up and complete.
> >
> >I think this means that cxl_mbox_background_complete() needs to be called
> >before every submission and needs to check for "background operation"
> >(mailbox status bit0) being cleared to manage potential collisions.
> 
> All this really defeats the "simpler" angle. Without accepting that it's
> the user who needs to resubmit until in sync, it seems doing synchronously
> is sw shooting itself in the foot.

Make it as simple as possible, but no simpler. The driver saying "you
are on your own after first cancellation or timeout" is too simple.
Especially when the driver designed with an eye towards not allowing
long running background commands be submitted (outside of the special
sanitization consideration).

So the choices are either allow the background command wait to be
interruptible and then do an implied sync on the next submission, or
make it uninterruptible where timeouts are either a real device problem
or a driver issue that needs to reduce the size of the background
operation to fit the timeout.

This middle ground of allowing SIGTERM and then userspace is on its own
to repair the connection violates least surprise.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-22 21:21                 ` Dan Williams
@ 2023-05-22 21:26                   ` Davidlohr Bueso
  2023-05-22 22:48                     ` Dan Williams
  0 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2023-05-22 21:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

On Mon, 22 May 2023, Dan Williams wrote:

>So the choices are either allow the background command wait to be
>interruptible and then do an implied sync on the next submission, or
>make it uninterruptible where timeouts are either a real device problem
>or a driver issue that needs to reduce the size of the background
>operation to fit the timeout.

I obviously vote for the uninterruptible option and treat timeouts as
a rare occurrence. In my answers I was only mentioning timeout case
because of that uninterruptible wait context. The amount of bg capable
commands are limited, and would expect timeout parameters to be fairly
well inspected(?).

So if ok with you a v3 would have these semantics.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
  2023-05-22 21:26                   ` Davidlohr Bueso
@ 2023-05-22 22:48                     ` Dan Williams
  0 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2023-05-22 22:48 UTC (permalink / raw)
  To: Davidlohr Bueso, Dan Williams
  Cc: dave.jiang, alison.schofield, vishal.l.verma, Jonathan.Cameron,
	fan.ni, a.manzanares, linux-cxl

Davidlohr Bueso wrote:
> On Mon, 22 May 2023, Dan Williams wrote:
> 
> >So the choices are either allow the background command wait to be
> >interruptible and then do an implied sync on the next submission, or
> >make it uninterruptible where timeouts are either a real device problem
> >or a driver issue that needs to reduce the size of the background
> >operation to fit the timeout.
> 
> I obviously vote for the uninterruptible option and treat timeouts as
> a rare occurrence. In my answers I was only mentioning timeout case
> because of that uninterruptible wait context. The amount of bg capable
> commands are limited, and would expect timeout parameters to be fairly
> well inspected(?).
> 
> So if ok with you a v3 would have these semantics.

Sounds good, we can always complicate later if need be.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-05-22 22:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-02 17:18 [PATCH 0/3] cxl: Handle background commands Davidlohr Bueso
2023-05-02 17:18 ` [PATCH 1/3] rcuwait: Support timeouts Davidlohr Bueso
2023-05-19 21:38   ` Dan Williams
2023-05-19 22:55     ` Davidlohr Bueso
2023-05-20 11:03       ` Peter Zijlstra
2023-05-02 17:18 ` [PATCH 2/3] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
2023-05-15 10:08   ` Jonathan Cameron
2023-05-02 17:18 ` [PATCH 3/3] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
2023-05-02 17:55   ` Davidlohr Bueso
2023-05-03 14:57   ` [PATCH v2] " Davidlohr Bueso
2023-05-15 10:30     ` Jonathan Cameron
2023-05-15 15:40       ` Davidlohr Bueso
2023-05-15 16:19         ` Jonathan Cameron
2023-05-16  7:58     ` Li, Ming
2023-05-16 17:02       ` Davidlohr Bueso
2023-05-19 23:13     ` Dan Williams
2023-05-22 16:58       ` Davidlohr Bueso
2023-05-22 18:19         ` Dan Williams
2023-05-22 18:57           ` Davidlohr Bueso
2023-05-22 20:16             ` Dan Williams
2023-05-22 20:28               ` Davidlohr Bueso
2023-05-22 21:21                 ` Dan Williams
2023-05-22 21:26                   ` Davidlohr Bueso
2023-05-22 22:48                     ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox