* [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery
@ 2023-11-03 8:47 Adrian Hunter
2023-11-03 8:47 ` [PATCH V2 1/6] mmc: block: Do not lose cache flush during CQE error recovery Adrian Hunter
` (7 more replies)
0 siblings, 8 replies; 21+ messages in thread
From: Adrian Hunter @ 2023-11-03 8:47 UTC (permalink / raw)
To: Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
Hi
Some issues have been found with CQE error recovery. Here are some fixes.
As of V2, the alternative implementation for the patch from Kornel Dulęba:
https://lore.kernel.org/linux-mmc/e7c12e07-7540-47ea-8891-2cec73d58df1@intel.com/T/#u
is now included (see patch 6 "mmc: cqhci: Fix task clearing in CQE error
recovery").
Please also note ->post_disable() seems to be missing from
cqhci_recovery_start(). It would be good if ->post_disable()
users could check if this needs attention.
Changes in V2:
mmc: cqhci: Fix task clearing in CQE error recovery
New patch
mmc: cqhci: Warn of halt or task clear failure
Add fixes and stable tags
Adrian Hunter (6):
mmc: block: Do not lose cache flush during CQE error recovery
mmc: cqhci: Increase recovery halt timeout
mmc: block: Be sure to wait while busy in CQE error recovery
mmc: block: Retry commands in CQE error recovery
mmc: cqhci: Warn of halt or task clear failure
mmc: cqhci: Fix task clearing in CQE error recovery
drivers/mmc/core/block.c | 2 ++
drivers/mmc/core/core.c | 9 +++++++--
drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++----------------------
3 files changed, 31 insertions(+), 24 deletions(-)
Regards
Adrian
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH V2 1/6] mmc: block: Do not lose cache flush during CQE error recovery
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
@ 2023-11-03 8:47 ` Adrian Hunter
2023-11-03 10:18 ` Avri Altman
2023-11-03 8:47 ` [PATCH V2 2/6] mmc: cqhci: Increase recovery halt timeout Adrian Hunter
` (6 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Adrian Hunter @ 2023-11-03 8:47 UTC (permalink / raw)
To: Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
During CQE error recovery, error-free data commands get requeued if there
is any data left to transfer, but non-data commands are completed even
though they have not been processed. Requeue them instead.
Note the only non-data command is cache flush, so previously a cache flush
could be lost if it was queued at the time of CQE recovery.
Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
drivers/mmc/core/block.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 3a8f27c3e310..4a32b756b7d8 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1482,6 +1482,8 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
blk_mq_requeue_request(req, true);
else
__blk_mq_end_request(req, BLK_STS_OK);
+ } else if (mq->in_recovery) {
+ blk_mq_requeue_request(req, true);
} else {
blk_mq_end_request(req, BLK_STS_OK);
}
--
2.34.1
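
For readers following the thread, the completion decision in the hunk above can be modelled outside the kernel. This is only a sketch under stated assumptions: the enum, struct and function names are hypothetical stand-ins (not kernel symbols), and only the error-free branches visible in the diff are covered.

```c
#include <stdbool.h>

/* Hypothetical user-space model; these are not the kernel's types. */
enum cqe_action {
	CQE_REQUEUE,	/* stands in for blk_mq_requeue_request() */
	CQE_END_OK,	/* stands in for ending the request with BLK_STS_OK */
};

struct cqe_req_model {
	bool has_data;		/* request carries a data transfer */
	bool data_remaining;	/* some data is still left to transfer */
};

/* Decision for an error-free request, mirroring the branches in the hunk. */
static enum cqe_action cqe_complete_action(const struct cqe_req_model *req,
					   bool in_recovery)
{
	if (req->has_data)
		return req->data_remaining ? CQE_REQUEUE : CQE_END_OK;
	if (in_recovery)
		return CQE_REQUEUE;	/* new branch: the flush is not dropped */
	return CQE_END_OK;
}
```

In this model a non-data request (the cache flush) seen while recovery is in progress is requeued rather than reported as done, which is the behaviour the patch adds.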
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH V2 2/6] mmc: cqhci: Increase recovery halt timeout
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
2023-11-03 8:47 ` [PATCH V2 1/6] mmc: block: Do not lose cache flush during CQE error recovery Adrian Hunter
@ 2023-11-03 8:47 ` Adrian Hunter
2023-11-03 10:37 ` Avri Altman
2023-11-03 8:47 ` [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery Adrian Hunter
` (5 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Adrian Hunter @ 2023-11-03 8:47 UTC (permalink / raw)
To: Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
Failing to halt complicates the recovery. Additionally, unless the card or
controller is stuck, which is expected to be very rare, the halt should
succeed, so it is better to wait. Set a large timeout.
Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
drivers/mmc/host/cqhci-core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index b3d7d6d8d654..15f5a069af1f 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -984,10 +984,10 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
/*
* After halting we expect to be able to use the command line. We interpret the
* failure to halt to mean the data lines might still be in use (and the upper
- * layers will need to send a STOP command), so we set the timeout based on a
- * generous command timeout.
+ * layers will need to send a STOP command), however failing to halt complicates
+ * the recovery, so set a timeout that would reasonably allow I/O to complete.
*/
-#define CQHCI_START_HALT_TIMEOUT 5
+#define CQHCI_START_HALT_TIMEOUT 500
static void cqhci_recovery_start(struct mmc_host *mmc)
{
--
2.34.1
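
To illustrate why the larger value matters, here is a toy model of a halt that only succeeds once in-flight I/O has drained. It is a sketch, not the driver's code: the struct and function names are hypothetical, the clock is simulated, and it assumes the timeout value is expressed in milliseconds.

```c
#include <stdbool.h>

/* Hypothetical model: the controller halts once in-flight I/O drains. */
struct halt_model {
	unsigned int io_drain_ms;	/* time the in-flight I/O needs */
	unsigned int now_ms;		/* simulated clock */
};

static bool model_halted(const struct halt_model *m)
{
	return m->now_ms >= m->io_drain_ms;
}

/* Advance the simulated clock 1 ms at a time until halted or timed out. */
static bool model_halt(struct halt_model *m, unsigned int timeout_ms)
{
	unsigned int deadline = m->now_ms + timeout_ms;

	while (!model_halted(m) && m->now_ms < deadline)
		m->now_ms++;

	return model_halted(m);
}
```

With I/O that needs 50 simulated ms to drain, a timeout of 5 gives up while a timeout of 500 lets the halt succeed. When the halt completes early, the wait ends early too, so the larger timeout only costs time in the stuck case.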
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
2023-11-03 8:47 ` [PATCH V2 1/6] mmc: block: Do not lose cache flush during CQE error recovery Adrian Hunter
2023-11-03 8:47 ` [PATCH V2 2/6] mmc: cqhci: Increase recovery halt timeout Adrian Hunter
@ 2023-11-03 8:47 ` Adrian Hunter
2023-11-03 10:48 ` Avri Altman
` (2 more replies)
2023-11-03 8:47 ` [PATCH V2 4/6] mmc: block: Retry commands " Adrian Hunter
` (4 subsequent siblings)
7 siblings, 3 replies; 21+ messages in thread
From: Adrian Hunter @ 2023-11-03 8:47 UTC (permalink / raw)
To: Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
The STOP command is not guaranteed to wait while the card is busy, but the
subsequent MMC_CMDQ_TASK_MGMT command to discard the queue will fail if the
card is busy, so be sure to wait by employing mmc_poll_for_busy().
Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
drivers/mmc/core/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 3d3e0ca52614..befde2bd26d3 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
mmc_wait_for_cmd(host, &cmd, 0);
+ mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
+
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = MMC_CMDQ_TASK_MGMT;
cmd.arg = 1; /* Discard entire queue */
--
2.34.1
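
The ordering problem described above can be sketched with a toy card model. Everything here is hypothetical (the names, the "busy for N polls" behaviour, and -1 as a stand-in error code); it only shows why a wait between STOP and the discard command changes the outcome.

```c
#include <stdbool.h>

/* Hypothetical card model: still busy for a number of polls after STOP. */
struct card_model {
	unsigned int busy_polls_left;
};

/* Returns 0 once the card is idle, -1 if max_polls is exhausted first. */
static int model_poll_for_busy(struct card_model *c, unsigned int max_polls)
{
	while (c->busy_polls_left) {
		if (max_polls-- == 0)
			return -1;
		c->busy_polls_left--;	/* one poll interval elapses */
	}
	return 0;
}

/* The discard-queue command fails while the card is still busy. */
static int model_discard_queue(const struct card_model *c)
{
	return c->busy_polls_left ? -1 : 0;
}

static int model_recovery(struct card_model *c, bool wait_while_busy)
{
	/* STOP has been sent at this point; the card may still be busy. */
	if (wait_while_busy)
		model_poll_for_busy(c, 1000);	/* result ignored, as in the hunk */
	return model_discard_queue(c);
}
```

In the model, skipping the wait makes the discard fail whenever the card is still busy, while polling first lets it go through.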
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH V2 4/6] mmc: block: Retry commands in CQE error recovery
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
` (2 preceding siblings ...)
2023-11-03 8:47 ` [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery Adrian Hunter
@ 2023-11-03 8:47 ` Adrian Hunter
2023-11-06 7:37 ` Avri Altman
2023-11-03 8:47 ` [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure Adrian Hunter
` (3 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Adrian Hunter @ 2023-11-03 8:47 UTC (permalink / raw)
To: Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
It is important that MMC_CMDQ_TASK_MGMT command to discard the queue is
successful because otherwise a subsequent reset might fail to flush the
cache first. Retry it and the previous STOP command.
Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
drivers/mmc/core/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index befde2bd26d3..a8c17b4cd737 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -551,7 +551,7 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
- mmc_wait_for_cmd(host, &cmd, 0);
+ mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
@@ -561,10 +561,13 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
- err = mmc_wait_for_cmd(host, &cmd, 0);
+ err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
host->cqe_ops->cqe_recovery_finish(host);
+ if (err)
+ err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
+
mmc_retune_release(host);
return err;
--
2.34.1
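
A small sketch of the retry behaviour this patch relies on, for illustration only. It assumes that a retry count of N means one initial attempt plus up to N further attempts; the helper names and the SKETCH_CMD_RETRIES value are hypothetical and are not taken from the kernel headers.

```c
#define SKETCH_CMD_RETRIES 3	/* assumed value, for illustration only */

typedef int (*send_fn)(void *ctx);

/* Send once, then retry up to 'retries' more times while it keeps failing. */
static int send_with_retries(send_fn send, void *ctx, unsigned int retries)
{
	int err = send(ctx);

	while (err && retries--)
		err = send(ctx);

	return err;
}

/* Hypothetical flaky command: fails a fixed number of times, then works. */
struct flaky_cmd {
	unsigned int failures_left;
	unsigned int attempts;
};

static int flaky_send(void *ctx)
{
	struct flaky_cmd *cmd = ctx;

	cmd->attempts++;
	if (cmd->failures_left) {
		cmd->failures_left--;
		return -1;
	}
	return 0;
}
```

Under that assumption, a command that fails twice before succeeding is lost with a retry count of 0 but succeeds on its third attempt with a count of 3.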
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
` (3 preceding siblings ...)
2023-11-03 8:47 ` [PATCH V2 4/6] mmc: block: Retry commands " Adrian Hunter
@ 2023-11-03 8:47 ` Adrian Hunter
2023-11-03 11:58 ` Avri Altman
2023-11-06 7:38 ` Avri Altman
2023-11-03 8:47 ` [PATCH V2 6/6] mmc: cqhci: Fix task clearing in CQE error recovery Adrian Hunter
` (2 subsequent siblings)
7 siblings, 2 replies; 21+ messages in thread
From: Adrian Hunter @ 2023-11-03 8:47 UTC (permalink / raw)
To: Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
A correctly operating controller should successfully halt and clear tasks.
Failure may result in errors elsewhere, so promote messages from debug to
warnings.
Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
drivers/mmc/host/cqhci-core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index 15f5a069af1f..948799a0980c 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -942,8 +942,8 @@ static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
ret = cqhci_tasks_cleared(cq_host);
if (!ret)
- pr_debug("%s: cqhci: Failed to clear tasks\n",
- mmc_hostname(mmc));
+ pr_warn("%s: cqhci: Failed to clear tasks\n",
+ mmc_hostname(mmc));
return ret;
}
@@ -976,7 +976,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
ret = cqhci_halted(cq_host);
if (!ret)
- pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
+ pr_warn("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
return ret;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH V2 6/6] mmc: cqhci: Fix task clearing in CQE error recovery
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
` (4 preceding siblings ...)
2023-11-03 8:47 ` [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure Adrian Hunter
@ 2023-11-03 8:47 ` Adrian Hunter
2023-11-09 9:23 ` Avri Altman
2023-11-03 10:10 ` [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Avri Altman
2023-11-15 15:51 ` Ulf Hansson
7 siblings, 1 reply; 21+ messages in thread
From: Adrian Hunter @ 2023-11-03 8:47 UTC (permalink / raw)
To: Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
If a task completion notification (TCN) is received when there is no
outstanding task, the cqhci driver issues a "spurious TCN" warning. This
was observed to happen right after CQE error recovery.
When an error interrupt is received the driver runs recovery logic.
It halts the controller, clears all pending tasks, and then re-enables
it. On some platforms, like Intel Jasper Lake, a stale task completion
event was observed, regardless of the CQHCI_CLEAR_ALL_TASKS bit being set.
This results in either:
a) Spurious TC completion event for an empty slot.
b) Corrupted data being passed up the stack, as a result of premature
completion for a newly added task.
Rather than add a quirk for affected controllers, ensure tasks are cleared
by toggling CQHCI_ENABLE, which would happen anyway if
cqhci_clear_all_tasks() timed out. This is simpler and should be safe and
effective for all controllers.
Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: stable@vger.kernel.org
Reported-by: Kornel Dulęba <korneld@chromium.org>
Tested-by: Kornel Dulęba <korneld@chromium.org>
Co-developed-by: Kornel Dulęba <korneld@chromium.org>
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
drivers/mmc/host/cqhci-core.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index 948799a0980c..41e94cd14109 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -1075,28 +1075,28 @@ static void cqhci_recovery_finish(struct mmc_host *mmc)
ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
- if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
- ok = false;
-
/*
* The specification contradicts itself, by saying that tasks cannot be
* cleared if CQHCI does not halt, but if CQHCI does not halt, it should
* be disabled/re-enabled, but not to disable before clearing tasks.
* Have a go anyway.
*/
- if (!ok) {
- pr_debug("%s: cqhci: disable / re-enable\n", mmc_hostname(mmc));
- cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
- cqcfg &= ~CQHCI_ENABLE;
- cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
- cqcfg |= CQHCI_ENABLE;
- cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
- /* Be sure that there are no tasks */
- ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
- if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
- ok = false;
- WARN_ON(!ok);
- }
+ if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
+ ok = false;
+
+ /* Disable to make sure tasks really are cleared */
+ cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+ cqcfg &= ~CQHCI_ENABLE;
+ cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+ cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+ cqcfg |= CQHCI_ENABLE;
+ cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+ cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
+
+ if (!ok)
+ cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT);
cqhci_recover_mrqs(cq_host);
--
2.34.1
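
The disable / re-enable step can be pictured with a toy register model. This is a sketch under loud assumptions: the struct, the helper names and the SKETCH_CFG_ENABLE bit position are hypothetical, the "clear all tasks" helper is modelled as possibly leaving a stale task behind, and the halt steps are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CFG_ENABLE 0x1u	/* assumed bit position, illustration only */

/* Hypothetical controller: disabling it drops every pending task. */
struct cq_model {
	uint32_t cfg;		/* stands in for CQHCI_CFG */
	uint32_t pending;	/* one bit per task slot */
	bool clear_leaves_stale;	/* models a clear that misses a task */
};

static void model_write_cfg(struct cq_model *cq, uint32_t val)
{
	if ((cq->cfg & SKETCH_CFG_ENABLE) && !(val & SKETCH_CFG_ENABLE))
		cq->pending = 0;	/* enable 1 -> 0 clears all tasks */
	cq->cfg = val;
}

static void model_clear_all_tasks(struct cq_model *cq)
{
	if (!cq->clear_leaves_stale)
		cq->pending = 0;
}

static void model_recovery_finish(struct cq_model *cq)
{
	uint32_t cfg;

	model_clear_all_tasks(cq);

	/* Disable to make sure tasks really are cleared */
	cfg = cq->cfg;
	cfg &= ~SKETCH_CFG_ENABLE;
	model_write_cfg(cq, cfg);

	/* Re-enable, preserving every other configuration bit */
	cfg = cq->cfg;
	cfg |= SKETCH_CFG_ENABLE;
	model_write_cfg(cq, cfg);
}
```

In this model the unconditional toggle removes the stale task even when the clear step misses it, and the read-modify-write leaves the other configuration bits untouched.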
^ permalink raw reply related [flat|nested] 21+ messages in thread
* RE: [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
` (5 preceding siblings ...)
2023-11-03 8:47 ` [PATCH V2 6/6] mmc: cqhci: Fix task clearing in CQE error recovery Adrian Hunter
@ 2023-11-03 10:10 ` Avri Altman
2023-11-06 6:38 ` Adrian Hunter
2023-11-15 15:51 ` Ulf Hansson
7 siblings, 1 reply; 21+ messages in thread
From: Avri Altman @ 2023-11-03 10:10 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
Is the double "recovery" in the subject intentional?
Thanks,
Avri
> Hi
>
> Some issues have been found with CQE error recovery. Here are some fixes.
>
> As of V2, the alternative implementation for the patch from Kornel Dulęba:
>
> https://lore.kernel.org/linux-mmc/e7c12e07-7540-47ea-8891-
> 2cec73d58df1@intel.com/T/#u
>
> is now included, see patch 6 "mmc: cqhci: Fix task clearing in CQE error
> recovery")
>
> Please also note ->post_disable() seems to be missing from
> cqhci_recovery_start(). It would be good if ->post_disable() users could
> check if this needs attention.
>
>
> Changes in V2:
>
> mmc: cqhci: Fix task clearing in CQE error recovery
> New patch
>
> mmc: cqhci: Warn of halt or task clear failure
> Add fixes and stable tags
>
>
> Adrian Hunter (6):
> mmc: block: Do not lose cache flush during CQE error recovery
> mmc: cqhci: Increase recovery halt timeout
> mmc: block: Be sure to wait while busy in CQE error recovery
> mmc: block: Retry commands in CQE error recovery
> mmc: cqhci: Warn of halt or task clear failure
> mmc: cqhci: Fix task clearing in CQE error recovery
>
> drivers/mmc/core/block.c | 2 ++
> drivers/mmc/core/core.c | 9 +++++++--
> drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++-----------------
> -----
> 3 files changed, 31 insertions(+), 24 deletions(-)
>
>
> Regards
> Adrian
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 1/6] mmc: block: Do not lose cache flush during CQE error recovery
2023-11-03 8:47 ` [PATCH V2 1/6] mmc: block: Do not lose cache flush during CQE error recovery Adrian Hunter
@ 2023-11-03 10:18 ` Avri Altman
0 siblings, 0 replies; 21+ messages in thread
From: Avri Altman @ 2023-11-03 10:18 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
>
> During CQE error recovery, error-free data commands get requeued if there
> is any data left to transfer, but non-data commands are completed even
> though they have not been processed. Requeue them instead.
>
> Note the only non-data command is cache flush, which would have resulted
> in a cache flush being lost if it was queued at the time of CQE recovery.
>
> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
> ---
> drivers/mmc/core/block.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index
> 3a8f27c3e310..4a32b756b7d8 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1482,6 +1482,8 @@ static void mmc_blk_cqe_complete_rq(struct
> mmc_queue *mq, struct request *req)
> blk_mq_requeue_request(req, true);
> else
> __blk_mq_end_request(req, BLK_STS_OK);
> + } else if (mq->in_recovery) {
> + blk_mq_requeue_request(req, true);
> } else {
> blk_mq_end_request(req, BLK_STS_OK);
> }
> --
> 2.34.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 2/6] mmc: cqhci: Increase recovery halt timeout
2023-11-03 8:47 ` [PATCH V2 2/6] mmc: cqhci: Increase recovery halt timeout Adrian Hunter
@ 2023-11-03 10:37 ` Avri Altman
2023-11-06 6:57 ` Adrian Hunter
0 siblings, 1 reply; 21+ messages in thread
From: Avri Altman @ 2023-11-03 10:37 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
> Failing to halt complicates the recovery. Additionally, unless the card or
> controller are stuck, which is expected to be very rare, then the halt should
> succeed, so it is better to wait. Set a large timeout.
Maybe also explain that if task queuing is in progress, CQE needs to complete the operation, sending both commands and processing the responses.
>
> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled
> host")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
> ---
> drivers/mmc/host/cqhci-core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
> index b3d7d6d8d654..15f5a069af1f 100644
> --- a/drivers/mmc/host/cqhci-core.c
> +++ b/drivers/mmc/host/cqhci-core.c
> @@ -984,10 +984,10 @@ static bool cqhci_halt(struct mmc_host *mmc,
> unsigned int timeout)
> /*
> * After halting we expect to be able to use the command line. We interpret
> the
> * failure to halt to mean the data lines might still be in use (and the upper
> - * layers will need to send a STOP command), so we set the timeout based
> on a
> - * generous command timeout.
> + * layers will need to send a STOP command), however failing to halt
> + complicates
> + * the recovery, so set a timeout that would reasonably allow I/O to
> complete.
> */
> -#define CQHCI_START_HALT_TIMEOUT 5
> +#define CQHCI_START_HALT_TIMEOUT 500
>
> static void cqhci_recovery_start(struct mmc_host *mmc) {
> --
> 2.34.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery
2023-11-03 8:47 ` [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery Adrian Hunter
@ 2023-11-03 10:48 ` Avri Altman
2023-11-06 6:35 ` Adrian Hunter
2023-11-06 7:35 ` Avri Altman
2023-11-09 9:47 ` Christian Loehle
2 siblings, 1 reply; 21+ messages in thread
From: Avri Altman @ 2023-11-03 10:48 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
> STOP command does not guarantee to wait while busy, but subsequent
> command MMC_CMDQ_TASK_MGMT to discard the queue will fail if the
> card is busy, so be sure to wait by employing mmc_poll_for_busy().
Doesn't the Task Discard Sequence expect you to check CQDPT[i]==1
before sending MMC_CMDQ_TASK_MGMT to discard task id i?
Thanks,
Avri
>
> Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
> drivers/mmc/core/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index
> 3d3e0ca52614..befde2bd26d3 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
> mmc_wait_for_cmd(host, &cmd, 0);
>
> + mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT,
> true,
> + MMC_BUSY_IO);
> +
> memset(&cmd, 0, sizeof(cmd));
> cmd.opcode = MMC_CMDQ_TASK_MGMT;
> cmd.arg = 1; /* Discard entire queue */
> --
> 2.34.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure
2023-11-03 8:47 ` [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure Adrian Hunter
@ 2023-11-03 11:58 ` Avri Altman
2023-11-06 7:38 ` Avri Altman
1 sibling, 0 replies; 21+ messages in thread
From: Avri Altman @ 2023-11-03 11:58 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
> A correctly operating controller should successfully halt and clear tasks.
> Failure may result in errors elsewhere, so promote messages from debug to
> warnings.
>
> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled
> host")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
> ---
> drivers/mmc/host/cqhci-core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
> index 15f5a069af1f..948799a0980c 100644
> --- a/drivers/mmc/host/cqhci-core.c
> +++ b/drivers/mmc/host/cqhci-core.c
> @@ -942,8 +942,8 @@ static bool cqhci_clear_all_tasks(struct mmc_host
> *mmc, unsigned int timeout)
> ret = cqhci_tasks_cleared(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to clear tasks\n",
> - mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to clear tasks\n",
> + mmc_hostname(mmc));
>
> return ret;
> }
> @@ -976,7 +976,7 @@ static bool cqhci_halt(struct mmc_host *mmc,
> unsigned int timeout)
> ret = cqhci_halted(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to halt\n",
> + mmc_hostname(mmc));
>
> return ret;
> }
> --
> 2.34.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery
2023-11-03 10:48 ` Avri Altman
@ 2023-11-06 6:35 ` Adrian Hunter
0 siblings, 0 replies; 21+ messages in thread
From: Adrian Hunter @ 2023-11-06 6:35 UTC (permalink / raw)
To: Avri Altman, Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
On 3/11/23 12:48, Avri Altman wrote:
>> STOP command does not guarantee to wait while busy, but subsequent
>> command MMC_CMDQ_TASK_MGMT to discard the queue will fail if the
>> card is busy, so be sure to wait by employing mmc_poll_for_busy().
> Doesn't the Task Discard Sequence expect you to check CQDPT[i]==1
> before sending MMC_CMDQ_TASK_MGMT to discard task id i?
We do not clear individual tasks. Instead the MMC_CMDQ_TASK_MGMT is
sent with the op-code to "discard entire queue", which will also
work even if the queue is empty. Refer to JESD84-B51A,
6.6.39.6 CMDQ_TASK_MGMT and Table 43 — Task Management op-codes.
>
> Thanks,
> Avri
>
>>
>> Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>> drivers/mmc/core/core.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index
>> 3d3e0ca52614..befde2bd26d3 100644
>> --- a/drivers/mmc/core/core.c
>> +++ b/drivers/mmc/core/core.c
>> @@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
>> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
>> mmc_wait_for_cmd(host, &cmd, 0);
>>
>> + mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT,
>> true,
>> + MMC_BUSY_IO);
>> +
>> memset(&cmd, 0, sizeof(cmd));
>> cmd.opcode = MMC_CMDQ_TASK_MGMT;
>> cmd.arg = 1; /* Discard entire queue */
>> --
>> 2.34.1
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery
2023-11-03 10:10 ` [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Avri Altman
@ 2023-11-06 6:38 ` Adrian Hunter
0 siblings, 0 replies; 21+ messages in thread
From: Adrian Hunter @ 2023-11-06 6:38 UTC (permalink / raw)
To: Avri Altman, Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
On 3/11/23 12:10, Avri Altman wrote:
> Is the double "recovery" in the subject intentional?
No, must be an echo in here
>
> Thanks,
> Avri
>
>> Hi
>>
>> Some issues have been found with CQE error recovery. Here are some fixes.
>>
>> As of V2, the alternative implementation for the patch from Kornel Dulęba:
>>
>> https://lore.kernel.org/linux-mmc/e7c12e07-7540-47ea-8891-
>> 2cec73d58df1@intel.com/T/#u
>>
>> is now included, see patch 6 "mmc: cqhci: Fix task clearing in CQE error
>> recovery")
>>
>> Please also note ->post_disable() seems to be missing from
>> cqhci_recovery_start(). It would be good if ->post_disable() users could
>> check if this needs attention.
>>
>>
>> Changes in V2:
>>
>> mmc: cqhci: Fix task clearing in CQE error recovery
>> New patch
>>
>> mmc: cqhci: Warn of halt or task clear failure
>> Add fixes and stable tags
>>
>>
>> Adrian Hunter (6):
>> mmc: block: Do not lose cache flush during CQE error recovery
>> mmc: cqhci: Increase recovery halt timeout
>> mmc: block: Be sure to wait while busy in CQE error recovery
>> mmc: block: Retry commands in CQE error recovery
>> mmc: cqhci: Warn of halt or task clear failure
>> mmc: cqhci: Fix task clearing in CQE error recovery
>>
>> drivers/mmc/core/block.c | 2 ++
>> drivers/mmc/core/core.c | 9 +++++++--
>> drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++-----------------
>> -----
>> 3 files changed, 31 insertions(+), 24 deletions(-)
>>
>>
>> Regards
>> Adrian
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V2 2/6] mmc: cqhci: Increase recovery halt timeout
2023-11-03 10:37 ` Avri Altman
@ 2023-11-06 6:57 ` Adrian Hunter
0 siblings, 0 replies; 21+ messages in thread
From: Adrian Hunter @ 2023-11-06 6:57 UTC (permalink / raw)
To: Avri Altman, Ulf Hansson, Kornel Dulęba, Radoslaw Biernacki,
Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
On 3/11/23 12:37, Avri Altman wrote:
>> Failing to halt complicates the recovery. Additionally, unless the card or
>> controller are stuck, which is expected to be very rare, then the halt should
>> succeed, so it is better to wait. Set a large timeout.
> Maybe also explain that if task queuing is in progress, CQE needs to complete the operation, sending both commands and processing the responses.
True, although those commands should be quite quick.
>
>>
>> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled
>> host")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Reviewed-by: Avri Altman <avri.altman@wdc.com>
>
>> ---
>> drivers/mmc/host/cqhci-core.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
>> index b3d7d6d8d654..15f5a069af1f 100644
>> --- a/drivers/mmc/host/cqhci-core.c
>> +++ b/drivers/mmc/host/cqhci-core.c
>> @@ -984,10 +984,10 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
>> /*
>> * After halting we expect to be able to use the command line. We interpret the
>> * failure to halt to mean the data lines might still be in use (and the upper
>> - * layers will need to send a STOP command), so we set the timeout based on a
>> - * generous command timeout.
>> + * layers will need to send a STOP command), however failing to halt complicates
>> + * the recovery, so set a timeout that would reasonably allow I/O to complete.
>> */
>> -#define CQHCI_START_HALT_TIMEOUT 5
>> +#define CQHCI_START_HALT_TIMEOUT 500
>>
>> static void cqhci_recovery_start(struct mmc_host *mmc) {
>> --
>> 2.34.1
>
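To illustrate why the larger timeout matters, here is a minimal user-space sketch, not the driver code. It assumes the timeout is counted in milliseconds and models a controller that still needs some time to finish in-flight work before it can halt. The names fake_cqe and fake_halt are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a controller; not the cqhci driver's types. */
struct fake_cqe {
	unsigned int ms_until_halted;	/* time the in-flight work still needs */
};

/*
 * Poll once per simulated millisecond until the controller reports halted
 * or timeout_ms elapses. Returns true if the halt completed in time.
 */
bool fake_halt(struct fake_cqe *cqe, unsigned int timeout_ms)
{
	unsigned int elapsed;

	for (elapsed = 0; elapsed <= timeout_ms; elapsed++) {
		if (cqe->ms_until_halted == 0)
			return true;
		cqe->ms_until_halted--;	/* one simulated millisecond passes */
	}
	return false;
}
```

In this model, a controller that needs 20 ms to finish fails to halt with a budget of 5 but halts well within a budget of 500, which is the situation the patch is aiming for.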
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery
2023-11-03 8:47 ` [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery Adrian Hunter
2023-11-03 10:48 ` Avri Altman
@ 2023-11-06 7:35 ` Avri Altman
2023-11-09 9:47 ` Christian Loehle
2 siblings, 0 replies; 21+ messages in thread
From: Avri Altman @ 2023-11-06 7:35 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
>
> STOP command does not guarantee to wait while busy, but subsequent
> command MMC_CMDQ_TASK_MGMT to discard the queue will fail if the
> card is busy, so be sure to wait by employing mmc_poll_for_busy().
>
> Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
> ---
> drivers/mmc/core/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 3d3e0ca52614..befde2bd26d3 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
> mmc_wait_for_cmd(host, &cmd, 0);
>
> + mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
> +
> memset(&cmd, 0, sizeof(cmd));
> cmd.opcode = MMC_CMDQ_TASK_MGMT;
> cmd.arg = 1; /* Discard entire queue */
> --
> 2.34.1
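The ordering described in the commit message (STOP, then wait while busy, then discard the queue) can be sketched with a small user-space model. This is only an illustration under assumptions: the card is modelled as busy for a fixed number of polls, and fake_card, fake_poll_for_busy and fake_discard_queue are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical card model; names are illustrative, not kernel APIs. */
struct fake_card {
	unsigned int busy_polls_left;	/* polls until the card stops signalling busy */
};

/* Report whether the card is busy; each poll lets a little time pass. */
bool fake_card_busy(struct fake_card *card)
{
	if (card->busy_polls_left == 0)
		return false;
	card->busy_polls_left--;
	return true;
}

/* Poll until not busy or max_polls is exhausted; 0 on success, -1 on timeout. */
int fake_poll_for_busy(struct fake_card *card, unsigned int max_polls)
{
	while (max_polls--) {
		if (!fake_card_busy(card))
			return 0;
	}
	return -1;
}

/* In this model the discard command is rejected while the card is busy. */
int fake_discard_queue(struct fake_card *card)
{
	return card->busy_polls_left ? -1 : 0;
}
```

Calling fake_discard_queue() straight away on a busy card fails in this model, while calling it after fake_poll_for_busy() has returned 0 succeeds.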
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 4/6] mmc: block: Retry commands in CQE error recovery
2023-11-03 8:47 ` [PATCH V2 4/6] mmc: block: Retry commands " Adrian Hunter
@ 2023-11-06 7:37 ` Avri Altman
0 siblings, 0 replies; 21+ messages in thread
From: Avri Altman @ 2023-11-06 7:37 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
> It is important that MMC_CMDQ_TASK_MGMT command to discard the
> queue is successful because otherwise a subsequent reset might fail to flush
> the cache first. Retry it and the previous STOP command.
>
> Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
> ---
> drivers/mmc/core/core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index befde2bd26d3..a8c17b4cd737 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -551,7 +551,7 @@ int mmc_cqe_recovery(struct mmc_host *host)
> cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
> cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
> - mmc_wait_for_cmd(host, &cmd, 0);
> + mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
>
> mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true,
> MMC_BUSY_IO);
>
> @@ -561,10 +561,13 @@ int mmc_cqe_recovery(struct mmc_host *host)
> cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
> cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
> - err = mmc_wait_for_cmd(host, &cmd, 0);
> + err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
>
> host->cqe_ops->cqe_recovery_finish(host);
>
> + if (err)
> + err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
> +
> mmc_retune_release(host);
>
> return err;
> --
> 2.34.1
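As a rough illustration of what passing a retry count buys, here is a user-space sketch. It assumes that a retry count of N means one initial attempt plus up to N further attempts while the command keeps failing. The names fake_cmd, fake_send and fake_wait_for_cmd are hypothetical and only mimic the shape of the call in the patch.

```c
/* Hypothetical command that fails a fixed number of times before succeeding. */
struct fake_cmd {
	unsigned int failures_left;
	unsigned int attempts;	/* how many times the command was issued */
};

/* Issue the command once; returns 0 on success, -1 on failure. */
int fake_send(struct fake_cmd *cmd)
{
	cmd->attempts++;
	if (cmd->failures_left) {
		cmd->failures_left--;
		return -1;
	}
	return 0;
}

/* Issue the command once, then up to 'retries' more times while it fails. */
int fake_wait_for_cmd(struct fake_cmd *cmd, unsigned int retries)
{
	int err = fake_send(cmd);

	while (err && retries--)
		err = fake_send(cmd);
	return err;
}
```

With zero retries, a command that fails transiently twice is reported as failed after a single attempt. With a retry count of 3 (a value chosen here only for illustration) the same command succeeds on its third attempt.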
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure
2023-11-03 8:47 ` [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure Adrian Hunter
2023-11-03 11:58 ` Avri Altman
@ 2023-11-06 7:38 ` Avri Altman
1 sibling, 0 replies; 21+ messages in thread
From: Avri Altman @ 2023-11-06 7:38 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
> A correctly operating controller should successfully halt and clear tasks.
> Failure may result in errors elsewhere, so promote messages from debug to
> warnings.
>
> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
> ---
> drivers/mmc/host/cqhci-core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
> index 15f5a069af1f..948799a0980c 100644
> --- a/drivers/mmc/host/cqhci-core.c
> +++ b/drivers/mmc/host/cqhci-core.c
> @@ -942,8 +942,8 @@ static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
> ret = cqhci_tasks_cleared(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to clear tasks\n",
> - mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to clear tasks\n",
> + mmc_hostname(mmc));
>
> return ret;
> }
> @@ -976,7 +976,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
> ret = cqhci_halted(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
>
> return ret;
> }
> --
> 2.34.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH V2 6/6] mmc: cqhci: Fix task clearing in CQE error recovery
2023-11-03 8:47 ` [PATCH V2 6/6] mmc: cqhci: Fix task clearing in CQE error recovery Adrian Hunter
@ 2023-11-09 9:23 ` Avri Altman
0 siblings, 0 replies; 21+ messages in thread
From: Avri Altman @ 2023-11-09 9:23 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
> If a task completion notification (TCN) is received when there is no
> outstanding task, the cqhci driver issues a "spurious TCN" warning. This was
> observed to happen right after CQE error recovery.
>
> When an error interrupt is received the driver runs recovery logic.
> It halts the controller, clears all pending tasks, and then re-enables it. On
> some platforms, like Intel Jasper Lake, a stale task completion event was
> observed, regardless of the CQHCI_CLEAR_ALL_TASKS bit being set.
>
> This results in either:
> a) Spurious TC completion event for an empty slot.
> b) Corrupted data being passed up the stack, as a result of premature
> completion for a newly added task.
>
> Rather than add a quirk for affected controllers, ensure tasks are cleared by
> toggling CQHCI_ENABLE, which would happen anyway if
> cqhci_clear_all_tasks() timed out. This is simpler and should be safe and
> effective for all controllers.
>
> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
> Cc: stable@vger.kernel.org
> Reported-by: Kornel Dulęba <korneld@chromium.org>
> Tested-by: Kornel Dulęba <korneld@chromium.org>
> Co-developed-by: Kornel Dulęba <korneld@chromium.org>
> Signed-off-by: Kornel Dulęba <korneld@chromium.org>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
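The idea in the commit message can be pictured with a toy model. This is a sketch under stated assumptions, not the cqhci implementation: the struct fields are not real CQHCI registers, and the model simply assumes that disabling the engine drops all task state, whereas a "clear all tasks" request may be ignored by misbehaving hardware. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller model; fields are not real CQHCI registers. */
struct fake_cqe {
	bool enabled;
	bool honours_clear_all;	/* false models the misbehaving hardware */
	uint32_t pending_tasks;	/* one bit per queued task slot */
};

/* Request "clear all tasks"; a misbehaving controller leaves state behind. */
void fake_clear_all_tasks(struct fake_cqe *cqe)
{
	if (cqe->honours_clear_all)
		cqe->pending_tasks = 0;
}

/* Disabling and re-enabling resets the task state in this model. */
void fake_toggle_enable(struct fake_cqe *cqe)
{
	cqe->enabled = false;
	cqe->pending_tasks = 0;
	cqe->enabled = true;
}
```

In this model a stale task survives fake_clear_all_tasks() on the misbehaving controller but not fake_toggle_enable(), which behaves the same on every controller. That uniformity is the reason the commit message gives for preferring the toggle over a per-controller quirk.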
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery
2023-11-03 8:47 ` [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery Adrian Hunter
2023-11-03 10:48 ` Avri Altman
2023-11-06 7:35 ` Avri Altman
@ 2023-11-09 9:47 ` Christian Loehle
2 siblings, 0 replies; 21+ messages in thread
From: Christian Loehle @ 2023-11-09 9:47 UTC (permalink / raw)
To: Adrian Hunter, Ulf Hansson, Kornel Dulęba,
Radoslaw Biernacki, Gwendal Grignou, Asutosh Das
Cc: Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper, Haibo Chen,
Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih, Ben Chuang,
Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu, linux-mmc,
linux-kernel
On 03/11/2023 08:47, Adrian Hunter wrote:
> STOP command does not guarantee to wait while busy, but subsequent command
> MMC_CMDQ_TASK_MGMT to discard the queue will fail if the card is busy, so
> be sure to wait by employing mmc_poll_for_busy().
>
> Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
> ---
> drivers/mmc/core/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 3d3e0ca52614..befde2bd26d3 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
> mmc_wait_for_cmd(host, &cmd, 0);
>
> + mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
> +
> memset(&cmd, 0, sizeof(cmd));
> cmd.opcode = MMC_CMDQ_TASK_MGMT;
> cmd.arg = 1; /* Discard entire queue */
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
` (6 preceding siblings ...)
2023-11-03 10:10 ` [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Avri Altman
@ 2023-11-15 15:51 ` Ulf Hansson
7 siblings, 0 replies; 21+ messages in thread
From: Ulf Hansson @ 2023-11-15 15:51 UTC (permalink / raw)
To: Adrian Hunter
Cc: Kornel Dulęba, Radoslaw Biernacki, Gwendal Grignou,
Asutosh Das, Chaotian Jing, Bhavya Kapoor, Kamal Dasu, Al Cooper,
Haibo Chen, Shaik Sajida Bhanu, Sai Krishna Potthuri, Victor Shih,
Ben Chuang, Thierry Reding, Aniruddha Tvs Rao, Chun-Hung Wu,
linux-mmc, linux-kernel
On Fri, 3 Nov 2023 at 09:48, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Hi
>
> Some issues have been found with CQE error recovery. Here are some fixes.
>
> As of V2, the alternative implementation for the patch from Kornel Dulęba:
>
> https://lore.kernel.org/linux-mmc/e7c12e07-7540-47ea-8891-2cec73d58df1@intel.com/T/#u
>
> is now included, see patch 6 "mmc: cqhci: Fix task clearing in CQE error
> recovery")
>
> Please also note ->post_disable() seems to be missing from
> cqhci_recovery_start(). It would be good if ->post_disable()
> users could check if this needs attention.
>
>
> Changes in V2:
>
> mmc: cqhci: Fix task clearing in CQE error recovery
> New patch
>
> mmc: cqhci: Warn of halt or task clear failure
> Add fixes and stable tags
>
>
> Adrian Hunter (6):
> mmc: block: Do not lose cache flush during CQE error recovery
> mmc: cqhci: Increase recovery halt timeout
> mmc: block: Be sure to wait while busy in CQE error recovery
> mmc: block: Retry commands in CQE error recovery
> mmc: cqhci: Warn of halt or task clear failure
> mmc: cqhci: Fix task clearing in CQE error recovery
>
> drivers/mmc/core/block.c | 2 ++
> drivers/mmc/core/core.c | 9 +++++++--
> drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++----------------------
> 3 files changed, 31 insertions(+), 24 deletions(-)
>
>
Applied for fixes, thanks!
Kind regards
Uffe
^ permalink raw reply [flat|nested] 21+ messages in thread
Thread overview: 21+ messages
2023-11-03 8:47 [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Adrian Hunter
2023-11-03 8:47 ` [PATCH V2 1/6] mmc: block: Do not lose cache flush during CQE error recovery Adrian Hunter
2023-11-03 10:18 ` Avri Altman
2023-11-03 8:47 ` [PATCH V2 2/6] mmc: cqhci: Increase recovery halt timeout Adrian Hunter
2023-11-03 10:37 ` Avri Altman
2023-11-06 6:57 ` Adrian Hunter
2023-11-03 8:47 ` [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery Adrian Hunter
2023-11-03 10:48 ` Avri Altman
2023-11-06 6:35 ` Adrian Hunter
2023-11-06 7:35 ` Avri Altman
2023-11-09 9:47 ` Christian Loehle
2023-11-03 8:47 ` [PATCH V2 4/6] mmc: block: Retry commands " Adrian Hunter
2023-11-06 7:37 ` Avri Altman
2023-11-03 8:47 ` [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure Adrian Hunter
2023-11-03 11:58 ` Avri Altman
2023-11-06 7:38 ` Avri Altman
2023-11-03 8:47 ` [PATCH V2 6/6] mmc: cqhci: Fix task clearing in CQE error recovery Adrian Hunter
2023-11-09 9:23 ` Avri Altman
2023-11-03 10:10 ` [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery Avri Altman
2023-11-06 6:38 ` Adrian Hunter
2023-11-15 15:51 ` Ulf Hansson