[PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler

linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler
@ 2025-04-07 10:17 Neil Armstrong
  2025-04-07 10:17 ` [PATCH RFT v3 1/3] ufs: core: drop last_intr_status/ts stats Neil Armstrong
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Neil Armstrong @ 2025-04-07 10:17 UTC (permalink / raw)
  To: Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Neil Armstrong

On systems with a large number request slots and unavailable MCQ,
the current design of the interrupt handler can delay handling of
other subsystems interrupts causing display artifacts, GPU stalls
or system firmware requests timeouts.

Example of errors reported on a loaded system:
 [drm:dpu_encoder_frame_done_timeout:2706] [dpu error]enc32 frame done timeout
 msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: hangcheck detected gpu lockup rb 2!
 msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1:     completed fence: 74285
 msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1:     submitted fence: 74286
 Error sending AMC RPMH requests (-110)

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
Changes in v3:
- reword patch 1 commit message, add review tag
- add intermediate patch to track the ESI state
- use the ESI state instead of ufshcd_is_intr_aggr_allowed()
- reword commit message to explain why in mcq + esi mode we bypass threaded irq
- Link to v2: https://lore.kernel.org/r/20250326-topic-ufs-use-threaded-irq-v2-0-7b3e8a5037e6@linaro.org

Changes in v2:
- Removed last_intr_status/last_intr_ts stats
- Handle irq in prinmary handler for MCQ case
- Stop touching REG_INTERRUPT_ENABLE register
- Link to v1: https://lore.kernel.org/r/20250321-topic-ufs-use-threaded-irq-v1-1-7a55816a4b1d@linaro.org

---
Neil Armstrong (3):
      ufs: core: drop last_intr_status/ts stats
      ufs: core: track when MCQ ESI is enabled
      ufs: core: delegate the interrupt service routine to a threaded irq handler

 drivers/ufs/core/ufshcd.c | 42 +++++++++++++++++++++++++++++++-----------
 include/ufs/ufshcd.h      |  7 ++-----
 2 files changed, 33 insertions(+), 16 deletions(-)
---
base-commit: ff7f9b199e3f4cc7d61df5a9a26a7cbb5c1492e6
change-id: 20250321-topic-ufs-use-threaded-irq-53af30f2529f

Best regards,
-- 
Neil Armstrong <neil.armstrong@linaro.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFT v3 1/3] ufs: core: drop last_intr_status/ts stats
  2025-04-07 10:17 [PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler Neil Armstrong
@ 2025-04-07 10:17 ` Neil Armstrong
  2025-04-07 10:17 ` [PATCH RFT v3 2/3] ufs: core: track when MCQ ESI is enabled Neil Armstrong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Neil Armstrong @ 2025-04-07 10:17 UTC (permalink / raw)
  To: Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Neil Armstrong

In order to prepare introduction of a threaded interrupt handler,
drop last_intr_status & last_intr_ts drop the ufs_stats struct,
and the associated debug code.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/ufs/core/ufshcd.c | 11 +++--------
 include/ufs/ufshcd.h      |  5 -----
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 0534390c2a35d0671156d79a4b1981a257d2fbfa..5e73ac1e00788f3d599f0b3eb6e2806df9b6f6c3 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -643,9 +643,6 @@ static void ufshcd_print_host_state(struct ufs_hba *hba)
 		"last_hibern8_exit_tstamp at %lld us, hibern8_exit_cnt=%d\n",
 		div_u64(hba->ufs_stats.last_hibern8_exit_tstamp, 1000),
 		hba->ufs_stats.hibern8_exit_cnt);
-	dev_err(hba->dev, "last intr at %lld us, last intr status=0x%x\n",
-		div_u64(hba->ufs_stats.last_intr_ts, 1000),
-		hba->ufs_stats.last_intr_status);
 	dev_err(hba->dev, "error handling flags=0x%x, req. abort count=%d\n",
 		hba->eh_flags, hba->req_abort_count);
 	dev_err(hba->dev, "hba->ufs_version=0x%x, Host capabilities=0x%x, caps=0x%x\n",
@@ -6984,14 +6981,12 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
  */
 static irqreturn_t ufshcd_intr(int irq, void *__hba)
 {
-	u32 intr_status, enabled_intr_status = 0;
+	u32 last_intr_status, intr_status, enabled_intr_status = 0;
 	irqreturn_t retval = IRQ_NONE;
 	struct ufs_hba *hba = __hba;
 	int retries = hba->nutrs;
 
-	intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
-	hba->ufs_stats.last_intr_status = intr_status;
-	hba->ufs_stats.last_intr_ts = local_clock();
+	last_intr_status = intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
 
 	/*
 	 * There could be max of hba->nutrs reqs in flight and in worst case
@@ -7015,7 +7010,7 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
 					__func__,
 					intr_status,
-					hba->ufs_stats.last_intr_status,
+					last_intr_status,
 					enabled_intr_status);
 		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
 	}
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index e3909cc691b2a854a270279901edacaa5c5120d6..fffa9cc465433604570f91b8e882b58cd985f35b 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -501,8 +501,6 @@ struct ufs_event_hist {
 
 /**
  * struct ufs_stats - keeps usage/err statistics
- * @last_intr_status: record the last interrupt status.
- * @last_intr_ts: record the last interrupt timestamp.
  * @hibern8_exit_cnt: Counter to keep track of number of exits,
  *		reset this after link-startup.
  * @last_hibern8_exit_tstamp: Set time after the hibern8 exit.
@@ -510,9 +508,6 @@ struct ufs_event_hist {
  * @event: array with event history.
  */
 struct ufs_stats {
-	u32 last_intr_status;
-	u64 last_intr_ts;
-
 	u32 hibern8_exit_cnt;
 	u64 last_hibern8_exit_tstamp;
 	struct ufs_event_hist event[UFS_EVT_CNT];

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFT v3 2/3] ufs: core: track when MCQ ESI is enabled
  2025-04-07 10:17 [PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler Neil Armstrong
  2025-04-07 10:17 ` [PATCH RFT v3 1/3] ufs: core: drop last_intr_status/ts stats Neil Armstrong
@ 2025-04-07 10:17 ` Neil Armstrong
  2025-04-07 19:36   ` Bart Van Assche
  2025-04-07 10:17 ` [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler Neil Armstrong
  2025-04-12  1:25 ` [PATCH RFT v3 0/3] ufs: core: cleanup and " Martin K. Petersen
  3 siblings, 1 reply; 13+ messages in thread
From: Neil Armstrong @ 2025-04-07 10:17 UTC (permalink / raw)
  To: Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Neil Armstrong

In preparation of adding a threaded interrupt handler, track when
the MCQ ESI interrupt handlers were installed so we can optimize the
MCQ interrupt handling to avoid walking the threaded handler in the case
ESI handlers are enabled.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/ufs/core/ufshcd.c | 1 +
 include/ufs/ufshcd.h      | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 5e73ac1e00788f3d599f0b3eb6e2806df9b6f6c3..7f256f77b8ba9853569157db7785d177b6cd6dee 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -8717,6 +8717,7 @@ static void ufshcd_config_mcq(struct ufs_hba *hba)
 	u32 intrs;
 
 	ret = ufshcd_mcq_vops_config_esi(hba);
+	hba->mcq_esi_enabled = !ret;
 	dev_info(hba->dev, "ESI %sconfigured\n", ret ? "is not " : "");
 
 	intrs = UFSHCD_ENABLE_MCQ_INTRS;
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index fffa9cc465433604570f91b8e882b58cd985f35b..ec999fa671240cb87bf540d339aa830b6847eb71 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -954,6 +954,7 @@ enum ufshcd_mcq_opr {
  *	ufshcd_resume_complete()
  * @mcq_sup: is mcq supported by UFSHC
  * @mcq_enabled: is mcq ready to accept requests
+ * @mcq_esi_enabled: is mcq ESI configured
  * @res: array of resource info of MCQ registers
  * @mcq_base: Multi circular queue registers base address
  * @uhq: array of supported hardware queues
@@ -1122,6 +1123,7 @@ struct ufs_hba {
 	bool mcq_sup;
 	bool lsdb_sup;
 	bool mcq_enabled;
+	bool mcq_esi_enabled;
 	struct ufshcd_res_info res[RES_MAX];
 	void __iomem *mcq_base;
 	struct ufs_hw_queue *uhq;

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 2/3] ufs: core: track when MCQ ESI is enabled
  2025-04-07 10:17 ` [PATCH RFT v3 2/3] ufs: core: track when MCQ ESI is enabled Neil Armstrong
@ 2025-04-07 19:36   ` Bart Van Assche
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-04-07 19:36 UTC (permalink / raw)
  To: Neil Armstrong, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel

On 4/7/25 3:17 AM, Neil Armstrong wrote:
> In preparation of adding a threaded interrupt handler, track when
> the MCQ ESI interrupt handlers were installed so we can optimize the
> MCQ interrupt handling to avoid walking the threaded handler in the case
> ESI handlers are enabled.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-04-07 10:17 [PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler Neil Armstrong
  2025-04-07 10:17 ` [PATCH RFT v3 1/3] ufs: core: drop last_intr_status/ts stats Neil Armstrong
  2025-04-07 10:17 ` [PATCH RFT v3 2/3] ufs: core: track when MCQ ESI is enabled Neil Armstrong
@ 2025-04-07 10:17 ` Neil Armstrong
  2025-04-07 19:44   ` Bart Van Assche
                     ` (2 more replies)
  2025-04-12  1:25 ` [PATCH RFT v3 0/3] ufs: core: cleanup and " Martin K. Petersen
  3 siblings, 3 replies; 13+ messages in thread
From: Neil Armstrong @ 2025-04-07 10:17 UTC (permalink / raw)
  To: Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Neil Armstrong

On systems with a large number request slots and unavailable MCQ ESI,
the current design of the interrupt handler can delay handling of
other subsystems interrupts causing display artifacts, GPU stalls
or system firmware requests timeouts.

Since the interrupt routine can take quite some time, it's
preferable to move it to a threaded handler and leave the
hard interrupt handler wake up the threaded interrupt routine,
the interrupt line would be masked until the processing is
finished in the thread thanks to the IRQS_ONESHOT flag.

When MCQ & ESI interrupts are enabled the I/O completions are now
directly handled in the "hard" interrupt routine to keep IOPs high
since queues handling is done in separate per-queue interrupt routines.

This fixes all encountered issued when running FIO tests
on the Qualcomm SM8650 platform.

Example of errors reported on a loaded system:
 [drm:dpu_encoder_frame_done_timeout:2706] [dpu error]enc32 frame done timeout
 msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: hangcheck detected gpu lockup rb 2!
 msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1:     completed fence: 74285
 msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1:     submitted fence: 74286
 Error sending AMC RPMH requests (-110)

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/ufs/core/ufshcd.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 7f256f77b8ba9853569157db7785d177b6cd6dee..b40660ca2fa6b3488645bd26121752554a8d6a08 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6971,7 +6971,7 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
 }
 
 /**
- * ufshcd_intr - Main interrupt service routine
+ * ufshcd_threaded_intr - Threaded interrupt service routine
  * @irq: irq number
  * @__hba: pointer to adapter instance
  *
@@ -6979,7 +6979,7 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
  *  IRQ_HANDLED - If interrupt is valid
  *  IRQ_NONE    - If invalid interrupt
  */
-static irqreturn_t ufshcd_intr(int irq, void *__hba)
+static irqreturn_t ufshcd_threaded_intr(int irq, void *__hba)
 {
 	u32 last_intr_status, intr_status, enabled_intr_status = 0;
 	irqreturn_t retval = IRQ_NONE;
@@ -7018,6 +7018,29 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 	return retval;
 }
 
+/**
+ * ufshcd_intr - Main interrupt service routine
+ * @irq: irq number
+ * @__hba: pointer to adapter instance
+ *
+ * Return:
+ *  IRQ_HANDLED     - If interrupt is valid
+ *  IRQ_WAKE_THREAD - If handling is moved to threaded handled
+ *  IRQ_NONE        - If invalid interrupt
+ */
+static irqreturn_t ufshcd_intr(int irq, void *__hba)
+{
+	struct ufs_hba *hba = __hba;
+
+	/* Move interrupt handling to thread when MCQ & ESI are not enabled */
+	if (!hba->mcq_enabled || !hba->mcq_esi_enabled)
+		return IRQ_WAKE_THREAD;
+
+	/* Directly handle interrupts since MCQ ESI handlers does the hard job */
+	return ufshcd_sl_intr(hba, ufshcd_readl(hba, REG_INTERRUPT_STATUS) &
+				   ufshcd_readl(hba, REG_INTERRUPT_ENABLE));
+}
+
 static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag)
 {
 	int err = 0;
@@ -10577,7 +10600,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
 
 	/* IRQ registration */
-	err = devm_request_irq(dev, irq, ufshcd_intr, IRQF_SHARED, UFSHCD, hba);
+	err = devm_request_threaded_irq(dev, irq, ufshcd_intr, ufshcd_threaded_intr,
+					IRQF_ONESHOT | IRQF_SHARED, UFSHCD, hba);
 	if (err) {
 		dev_err(hba->dev, "request irq failed\n");
 		goto out_disable;

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-04-07 10:17 ` [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler Neil Armstrong
@ 2025-04-07 19:44   ` Bart Van Assche
  2025-07-21 12:04   ` André Draszik
  2025-07-28 23:11   ` Nitin Rawat
  2 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-04-07 19:44 UTC (permalink / raw)
  To: Neil Armstrong, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel

On 4/7/25 3:17 AM, Neil Armstrong wrote:
> +static irqreturn_t ufshcd_intr(int irq, void *__hba)
> +{
> +	struct ufs_hba *hba = __hba;
> +
> +	/* Move interrupt handling to thread when MCQ & ESI are not enabled */
> +	if (!hba->mcq_enabled || !hba->mcq_esi_enabled)
> +		return IRQ_WAKE_THREAD;
> +
> +	/* Directly handle interrupts since MCQ ESI handlers does the hard job */
> +	return ufshcd_sl_intr(hba, ufshcd_readl(hba, REG_INTERRUPT_STATUS) &
> +				   ufshcd_readl(hba, REG_INTERRUPT_ENABLE));
> +}

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-04-07 10:17 ` [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler Neil Armstrong
  2025-04-07 19:44   ` Bart Van Assche
@ 2025-07-21 12:04   ` André Draszik
  2025-07-21 12:10     ` André Draszik
  2025-07-21 15:28     ` Bart Van Assche
  2025-07-28 23:11   ` Nitin Rawat
  2 siblings, 2 replies; 13+ messages in thread
From: André Draszik @ 2025-07-21 12:04 UTC (permalink / raw)
  To: Neil Armstrong, Alim Akhtar, Avri Altman, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Peter Griffin, Will McVicker, kernel-team, Tudor Ambarus

Hi,

On Mon, 2025-04-07 at 12:17 +0200, Neil Armstrong wrote:
> On systems with a large number request slots and unavailable MCQ ESI,
> the current design of the interrupt handler can delay handling of
> other subsystems interrupts causing display artifacts, GPU stalls
> or system firmware requests timeouts.
> 
> Since the interrupt routine can take quite some time, it's
> preferable to move it to a threaded handler and leave the
> hard interrupt handler wake up the threaded interrupt routine,
> the interrupt line would be masked until the processing is
> finished in the thread thanks to the IRQS_ONESHOT flag.
> 
> When MCQ & ESI interrupts are enabled the I/O completions are now
> directly handled in the "hard" interrupt routine to keep IOPs high
> since queues handling is done in separate per-queue interrupt routines.

This patch adversely affects Pixel 6 UFS performance. It has a
UFSHCI v3.x controller I believe (and therefore probably all
devices with < v4) - if my limited understanding is correct,
MCQ & ESI are a feature of v4 controllers only.

On Pixel 6, fio reports following performance on linux-next with
this patch:

read [1] / write [2]:
   READ: bw=17.1MiB/s (17.9MB/s), 17.1MiB/s-17.1MiB/s (17.9MB/s-17.9MB/s), io=684MiB (718MB), run=40001-40001msec
  WRITE: bw=20.6MiB/s (21.5MB/s), 20.6MiB/s-20.6MiB/s (21.5MB/s-21.5MB/s), io=822MiB (862MB), run=40003-40003msec

With this patch reverted, performance changes back to:

read [1] / write [2]:

   READ: bw=19.9MiB/s (20.8MB/s), 19.9MiB/s-19.9MiB/s (20.8MB/s-20.8MB/s), io=795MiB (833MB), run=40001-40001msec
  WRITE: bw=28.0MiB/s (29.4MB/s), 28.0MiB/s-28.0MiB/s (29.4MB/s-29.4MB/s), io=1122MiB (1176MB), run=40003-40003msec

all over multiple runs.

which is a ~26% reduction for write and ~14% reduction for read.

PCBenchmark even reports performance drops of ~41%.


I don't know much about UFS at this stage, but could the code simply
check for the controller version and revert to original behaviour
if < v4? Any thoughts on such a change?


[1]: fio --name=randread --rw=randread --ioengine=libaio --direct=1 \
         --bs=4k --numjobs=1 --size=1g --ramp_time=10 --runtime=40 --time_based \
         --end_fsync=1 --group_reporting --filename=/foo

[2]: fio --name=randwrite --rw=randwrite --ioengine=libaio --direct=1 \
         --bs=4k --numjobs=1 --size=1g --ramp_time=10 --runtime=40 --time_based \
         --end_fsync=1 --group_reporting --filename=/foo

Cheers,
Andre'

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-07-21 12:04   ` André Draszik
@ 2025-07-21 12:10     ` André Draszik
  2025-07-21 15:28     ` Bart Van Assche
  1 sibling, 0 replies; 13+ messages in thread
From: André Draszik @ 2025-07-21 12:10 UTC (permalink / raw)
  To: Neil Armstrong, Alim Akhtar, Avri Altman, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Peter Griffin, Will McVicker, kernel-team, Tudor Ambarus

On Mon, 2025-07-21 at 13:04 +0100, André Draszik wrote:
> Hi,
> 
> On Mon, 2025-04-07 at 12:17 +0200, Neil Armstrong wrote:
> > On systems with a large number request slots and unavailable MCQ ESI,
> > the current design of the interrupt handler can delay handling of
> > other subsystems interrupts causing display artifacts, GPU stalls
> > or system firmware requests timeouts.
> > 
> > Since the interrupt routine can take quite some time, it's
> > preferable to move it to a threaded handler and leave the
> > hard interrupt handler wake up the threaded interrupt routine,
> > the interrupt line would be masked until the processing is
> > finished in the thread thanks to the IRQS_ONESHOT flag.
> > 
> > When MCQ & ESI interrupts are enabled the I/O completions are now
> > directly handled in the "hard" interrupt routine to keep IOPs high
> > since queues handling is done in separate per-queue interrupt routines.
> 
> This patch adversely affects Pixel 6 UFS performance. It has a
> UFSHCI v3.x controller I believe (and therefore probably all
> devices with < v4) - if my limited understanding is correct,
> MCQ & ESI are a feature of v4 controllers only.
> 
> On Pixel 6, fio reports following performance on linux-next with
> this patch:
> 
> read [1] / write [2]:
>    READ: bw=17.1MiB/s (17.9MB/s), 17.1MiB/s-17.1MiB/s (17.9MB/s-17.9MB/s), io=684MiB (718MB), run=40001-40001msec
>   WRITE: bw=20.6MiB/s (21.5MB/s), 20.6MiB/s-20.6MiB/s (21.5MB/s-21.5MB/s), io=822MiB (862MB), run=40003-40003msec
> 
> With this patch reverted, performance changes back to:
> 
> read [1] / write [2]:
> 
>    READ: bw=19.9MiB/s (20.8MB/s), 19.9MiB/s-19.9MiB/s (20.8MB/s-20.8MB/s), io=795MiB (833MB), run=40001-40001msec
>   WRITE: bw=28.0MiB/s (29.4MB/s), 28.0MiB/s-28.0MiB/s (29.4MB/s-29.4MB/s), io=1122MiB (1176MB), run=40003-40003msec
> 
> all over multiple runs.
> 
> which is a ~26% reduction for write and ~14% reduction for read.
> 
> PCBenchmark even reports performance drops of ~41%.

Additional fio results (numjobs=8 instead of 1):

current linux-next:

fio --name=randread --rw=randread --ioengine=libaio --direct=1 --bs=4k --numjobs=8 --size=1g --runtime=30 --time_based --end_fsync=1 --
group_reporting --filename=/foo
   READ: bw=52.1MiB/s (54.6MB/s), 52.1MiB/s-52.1MiB/s (54.6MB/s-54.6MB/s), io=1562MiB (1638MB), run=30001-30001msec
  WRITE: bw=74.7MiB/s (78.3MB/s), 74.7MiB/s-74.7MiB/s (78.3MB/s-78.3MB/s), io=2242MiB (2351MB), run=30004-30004msec


with patch reverted:

fio --name=randread --rw=randread --ioengine=libaio --direct=1 --bs=4k --numjobs=8 --size=1g --runtime=30 --time_based --end_fsync=1 --
group_reporting --filename=/foo
   READ: bw=83.5MiB/s (87.6MB/s), 83.5MiB/s-83.5MiB/s (87.6MB/s-87.6MB/s), io=2506MiB (2628MB), run=30001-30001msec
  WRITE: bw=83.3MiB/s (87.4MB/s), 83.3MiB/s-83.3MiB/s (87.4MB/s-87.4MB/s), io=2501MiB (2622MB), run=30003-30003msec



which is an even higher 37% reduction for read.

A.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-07-21 12:04   ` André Draszik
  2025-07-21 12:10     ` André Draszik
@ 2025-07-21 15:28     ` Bart Van Assche
  2025-07-22  9:22       ` André Draszik
  1 sibling, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2025-07-21 15:28 UTC (permalink / raw)
  To: André Draszik, Neil Armstrong, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Peter Griffin, Will McVicker, kernel-team, Tudor Ambarus

On 7/21/25 5:04 AM, André Draszik wrote:
> I don't know much about UFS at this stage, but could the code simply
> check for the controller version and revert to original behaviour
> if < v4? Any thoughts on such a change?

I'm not sure that's the best possible solution. A more elegant solution
could be to remove the "if (!hba->mcq_enabled || !hba->mcq_esi_enabled)"
check, to restrict the number of commands completed by 
ufshcd_transfer_req_compl() and only to return IRQ_WAKE_THREAD if more
commands are pending than a certain threshold.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-07-21 15:28     ` Bart Van Assche
@ 2025-07-22  9:22       ` André Draszik
  2025-07-24  9:55         ` André Draszik
  0 siblings, 1 reply; 13+ messages in thread
From: André Draszik @ 2025-07-22  9:22 UTC (permalink / raw)
  To: Bart Van Assche, Neil Armstrong, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Peter Griffin, Will McVicker, kernel-team, Tudor Ambarus

On Mon, 2025-07-21 at 08:28 -0700, Bart Van Assche wrote:
> On 7/21/25 5:04 AM, André Draszik wrote:
> > I don't know much about UFS at this stage, but could the code simply
> > check for the controller version and revert to original behaviour
> > if < v4? Any thoughts on such a change?
> 
> I'm not sure that's the best possible solution. A more elegant solution
> could be to remove the "if (!hba->mcq_enabled || !hba->mcq_esi_enabled)"
> check, to restrict the number of commands completed by 
> ufshcd_transfer_req_compl() and only to return IRQ_WAKE_THREAD if more
> commands are pending than a certain threshold.

Thanks Bart. I'll try that instead,

Cheers,
Andre'

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-07-22  9:22       ` André Draszik
@ 2025-07-24  9:55         ` André Draszik
  0 siblings, 0 replies; 13+ messages in thread
From: André Draszik @ 2025-07-24  9:55 UTC (permalink / raw)
  To: Bart Van Assche, Neil Armstrong, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel,
	Peter Griffin, Will McVicker, kernel-team, Tudor Ambarus

On Tue, 2025-07-22 at 10:22 +0100, André Draszik wrote:
> On Mon, 2025-07-21 at 08:28 -0700, Bart Van Assche wrote:
> > On 7/21/25 5:04 AM, André Draszik wrote:
> > > I don't know much about UFS at this stage, but could the code simply
> > > check for the controller version and revert to original behaviour
> > > if < v4? Any thoughts on such a change?
> > 
> > I'm not sure that's the best possible solution. A more elegant solution
> > could be to remove the "if (!hba->mcq_enabled || !hba->mcq_esi_enabled)"
> > check, to restrict the number of commands completed by 
> > ufshcd_transfer_req_compl() and only to return IRQ_WAKE_THREAD if more
> > commands are pending than a certain threshold.
> 
> Thanks Bart. I'll try that instead,

I went with a time limit instead of counting the requests in the end,
because that should be more deterministic:

https://lore.kernel.org/r/20250724-ufshcd-hardirq-v1-1-6398a52f8f02@linaro.org


Cheers,
Andre'

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler
  2025-04-07 10:17 ` [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler Neil Armstrong
  2025-04-07 19:44   ` Bart Van Assche
  2025-07-21 12:04   ` André Draszik
@ 2025-07-28 23:11   ` Nitin Rawat
  2 siblings, 0 replies; 13+ messages in thread
From: Nitin Rawat @ 2025-07-28 23:11 UTC (permalink / raw)
  To: Neil Armstrong, Alim Akhtar, Avri Altman, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen
  Cc: Manivannan Sadhasivam, linux-arm-msm, linux-scsi, linux-kernel



On 4/7/2025 3:47 PM, Neil Armstrong wrote:
> On systems with a large number request slots and unavailable MCQ ESI,
> the current design of the interrupt handler can delay handling of
> other subsystems interrupts causing display artifacts, GPU stalls
> or system firmware requests timeouts.
> 
> Since the interrupt routine can take quite some time, it's
> preferable to move it to a threaded handler and leave the
> hard interrupt handler wake up the threaded interrupt routine,
> the interrupt line would be masked until the processing is
> finished in the thread thanks to the IRQS_ONESHOT flag.
> 
> When MCQ & ESI interrupts are enabled the I/O completions are now
> directly handled in the "hard" interrupt routine to keep IOPs high
> since queues handling is done in separate per-queue interrupt routines.
> 
> This fixes all encountered issued when running FIO tests
> on the Qualcomm SM8650 platform.

Hi Neil,

I tested this change on both SM8750 and SM8650 using the upstream kernel 
with MCQ enabled locally. I also validated it on the SM8750 downstream 
codebase. In all cases, enabling MCQ mode led to boot-up issues on these 
targets.

The root cause was that in MCQ mode, the Interrupt Status (IS) register 
was not being cleared in the ufshcd_intr function. This resulted in 
abnormal behavior during subsequent UIC commands, ultimately causing 
boot failures.

To address this issue, I’ve submitted the following patch: [PATCH V1] 
ufs: core: Fix interrupt handling for MCQ Mode in ufshcd_intr.

I also have plan to get the performance number with SDB and MCQ mode 
with these change. I'll update the thread once i get the number.

Thanks,
Nitin

> 
> Example of errors reported on a loaded system:
>   [drm:dpu_encoder_frame_done_timeout:2706] [dpu error]enc32 frame done timeout
>   msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: hangcheck detected gpu lockup rb 2!
>   msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1:     completed fence: 74285
>   msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1:     submitted fence: 74286
>   Error sending AMC RPMH requests (-110)
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>   drivers/ufs/core/ufshcd.c | 30 +++++++++++++++++++++++++++---
>   1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 7f256f77b8ba9853569157db7785d177b6cd6dee..b40660ca2fa6b3488645bd26121752554a8d6a08 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -6971,7 +6971,7 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
>   }
>   
>   /**
> - * ufshcd_intr - Main interrupt service routine
> + * ufshcd_threaded_intr - Threaded interrupt service routine
>    * @irq: irq number
>    * @__hba: pointer to adapter instance
>    *
> @@ -6979,7 +6979,7 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
>    *  IRQ_HANDLED - If interrupt is valid
>    *  IRQ_NONE    - If invalid interrupt
>    */
> -static irqreturn_t ufshcd_intr(int irq, void *__hba)
> +static irqreturn_t ufshcd_threaded_intr(int irq, void *__hba)
>   {
>   	u32 last_intr_status, intr_status, enabled_intr_status = 0;
>   	irqreturn_t retval = IRQ_NONE;
> @@ -7018,6 +7018,29 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
>   	return retval;
>   }
>   
> +/**
> + * ufshcd_intr - Main interrupt service routine
> + * @irq: irq number
> + * @__hba: pointer to adapter instance
> + *
> + * Return:
> + *  IRQ_HANDLED     - If interrupt is valid
> + *  IRQ_WAKE_THREAD - If handling is moved to threaded handled
> + *  IRQ_NONE        - If invalid interrupt
> + */
> +static irqreturn_t ufshcd_intr(int irq, void *__hba)
> +{
> +	struct ufs_hba *hba = __hba;
> +
> +	/* Move interrupt handling to thread when MCQ & ESI are not enabled */
> +	if (!hba->mcq_enabled || !hba->mcq_esi_enabled)
> +		return IRQ_WAKE_THREAD;
> +
> +	/* Directly handle interrupts since MCQ ESI handlers does the hard job */
> +	return ufshcd_sl_intr(hba, ufshcd_readl(hba, REG_INTERRUPT_STATUS) &
> +				   ufshcd_readl(hba, REG_INTERRUPT_ENABLE));
> +}
> +
>   static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag)
>   {
>   	int err = 0;
> @@ -10577,7 +10600,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
>   	ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
>   
>   	/* IRQ registration */
> -	err = devm_request_irq(dev, irq, ufshcd_intr, IRQF_SHARED, UFSHCD, hba);
> +	err = devm_request_threaded_irq(dev, irq, ufshcd_intr, ufshcd_threaded_intr,
> +					IRQF_ONESHOT | IRQF_SHARED, UFSHCD, hba);
>   	if (err) {
>   		dev_err(hba->dev, "request irq failed\n");
>   		goto out_disable;
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler
  2025-04-07 10:17 [PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler Neil Armstrong
                   ` (2 preceding siblings ...)
  2025-04-07 10:17 ` [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler Neil Armstrong
@ 2025-04-12  1:25 ` Martin K. Petersen
  3 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2025-04-12  1:25 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen, Manivannan Sadhasivam, linux-arm-msm,
	linux-scsi, linux-kernel


Neil,

> On systems with a large number request slots and unavailable MCQ, the
> current design of the interrupt handler can delay handling of other
> subsystems interrupts causing display artifacts, GPU stalls or system
> firmware requests timeouts.

Applied to 6.16/scsi-staging, thanks!

-- 
Martin K. Petersen

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-07-28 23:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07 10:17 [PATCH RFT v3 0/3] ufs: core: cleanup and threaded irq handler Neil Armstrong
2025-04-07 10:17 ` [PATCH RFT v3 1/3] ufs: core: drop last_intr_status/ts stats Neil Armstrong
2025-04-07 10:17 ` [PATCH RFT v3 2/3] ufs: core: track when MCQ ESI is enabled Neil Armstrong
2025-04-07 19:36   ` Bart Van Assche
2025-04-07 10:17 ` [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service routine to a threaded irq handler Neil Armstrong
2025-04-07 19:44   ` Bart Van Assche
2025-07-21 12:04   ` André Draszik
2025-07-21 12:10     ` André Draszik
2025-07-21 15:28     ` Bart Van Assche
2025-07-22  9:22       ` André Draszik
2025-07-24  9:55         ` André Draszik
2025-07-28 23:11   ` Nitin Rawat
2025-04-12  1:25 ` [PATCH RFT v3 0/3] ufs: core: cleanup and " Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).