From: Russell King - ARM Linux <linux@arm.linux.org.uk>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>,
Marcin Wojtas <mw@semihalf.com>,
Gregory CLEMENT <gregory.clement@free-electrons.com>,
Shawn Guo <shawnguo@kernel.org>,
Sascha Hauer <kernel@pengutronix.de>,
linux-mmc@vger.kernel.org
Subject: Re: [PATCH v2 07/24] mmc: sdhci: command response CRC error handling
Date: Sat, 2 Jan 2016 12:25:09 +0000
Message-ID: <20160102122508.GW8644@n2100.arm.linux.org.uk>
In-Reply-To: <568285C4.9070909@intel.com>

On Tue, Dec 29, 2015 at 03:08:20PM +0200, Adrian Hunter wrote:
> On 21/12/15 13:40, Russell King wrote:
> > When we get a response CRC error on a command, it means that the
> > response we received back from the card was not correct. It does not
> > mean that the card did not receive the command correctly. If the
>
> Pedantically, if the timeout bit is set as well (CMD line conflict),
> it does mean the card did not receive the command, so it should be coded
> that way.
Good catch. The SDHCI spec contains a table which describes the CRC and
timeout bit states, though it's not quite as you describe above: CRC and
timeout set together indicate a command line conflict at some point.
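
To make the bit states concrete, here is a minimal sketch of how the two
command error bits could be decoded, as I read the spec's table. The two
mask values are the ones in drivers/mmc/host/sdhci.h; the enum and the
function name are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Bit values as defined in drivers/mmc/host/sdhci.h. */
#define SDHCI_INT_TIMEOUT 0x00010000u
#define SDHCI_INT_CRC     0x00020000u

/* Hypothetical names, for illustration only. */
enum cmd_err_kind {
	CMD_ERR_NONE,          /* neither bit set */
	CMD_ERR_RESP_TIMEOUT,  /* timeout only: no response was seen */
	CMD_ERR_RESP_CRC,      /* CRC only: a response arrived but failed its CRC */
	CMD_ERR_LINE_CONFLICT, /* both bits: CMD line conflict */
};

/* Decode the command CRC/timeout bits from an interrupt status word. */
static enum cmd_err_kind classify_cmd_error(uint32_t intmask)
{
	switch (intmask & (SDHCI_INT_CRC | SDHCI_INT_TIMEOUT)) {
	case 0:
		return CMD_ERR_NONE;
	case SDHCI_INT_TIMEOUT:
		return CMD_ERR_RESP_TIMEOUT;
	case SDHCI_INT_CRC:
		return CMD_ERR_RESP_CRC;
	default:
		return CMD_ERR_LINE_CONFLICT;
	}
}
```

Only the CRC-only case tells us that a response actually came back from
the card, which is the case the patch is concerned with.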
> > Fix this by handling a response CRC error slightly differently: record
> > the failure of the data initiating command, but allow the remainder of
> > the request to be processed normally. This is safe as core MMC checks
>
> "processed normally" confused me at first because it sounded like you are
> ignoring the error. Not sure why you have a much better explanation in the
> cover email than here.
They're written at different times? I don't accept your comment though -
"record the failure" _clearly_ does not mean that we're ignoring the error.
> > the status of all commands and data transfer phases of the request.
>
> MMC core is not the only initiator of requests, but it is safe because the
> command error takes precedence by design.
>
> Also you don't explain why it is better to continue rather than attempt to
> send a stop command and clean up the request properly. It looks simpler and
> less racy, but if that is the reason then it seems worth saying so.
This patch results from the analysis of failures seen on iMX6 hardware,
where the card has entered data mode, and started to send its data.
Right now, this screws up the next command.
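
On the "command error takes precedence" point quoted above, a rough
sketch of what that means for whoever consumes the request: the command
status is checked before the data and stop statuses, so a recorded
command error is still what gets reported even if the data phase went
on to complete. The structures and the function below are hypothetical
simplifications, not the kernel's struct mmc_request, and the checking
order is an assumption about how a consumer would behave.

```c
#include <stddef.h>

/* Hypothetical, simplified layout; not the kernel's struct mmc_request. */
struct sketch_phase {
	int error;
};

struct sketch_request {
	struct sketch_phase *cmd;   /* always present */
	struct sketch_phase *data;  /* NULL if there is no data phase */
	struct sketch_phase *stop;  /* NULL if no stop command is sent */
};

/* Return the first error found, checking the command before data and stop. */
static int sketch_request_status(const struct sketch_request *req)
{
	if (req->cmd->error)
		return req->cmd->error;
	if (req->data && req->data->error)
		return req->data->error;
	if (req->stop && req->stop->error)
		return req->stop->error;
	return 0;
}
```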
> > If the card does not initiate a data transfer, then we should time out
> > according to the data transfer parameters.
> >
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> > ---
> > drivers/mmc/host/sdhci.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> > index 86310b162304..3e718e465a1b 100644
> > --- a/drivers/mmc/host/sdhci.c
> > +++ b/drivers/mmc/host/sdhci.c
> > @@ -2340,6 +2340,23 @@ static void sdhci_cmd_irq(struct sdhci_host *host, u32 intmask, u32 *mask)
> > else
> > host->cmd->error = -EILSEQ;
> >
> > + /*
> > + * If this command initiates a data phase and a response
> > + * CRC error is signalled, the card can start transferring
> > + * data - the card may have received the command without
> > + * error. We must not terminate the request early.
>
> This is misleading. We could terminate the request early if we cleaned it
> up. You should say here why it is better to continue.
That is _not_ misleading, it is entirely accurate. What the code
currently does when it encounters a CRC error is it terminates the
_request_ early. The _request_ being "struct mmc_request" - and
it terminates it _without_ sending a STOP command.

Resetting the host controller does not influence what state the card
is in.

So what happens at the moment is that we send a command which initiates
a data phase from the card. The card responds with a valid response,
and starts sending data to the host. The host incorrectly receives
the card response with a CRC error.

At this point, the code decides that it had a failure, queues the
finish tasklet, which resets the SDHCI controller, leaving the card
transmitting data to the host, potentially endlessly. The driver
reports to the MMC layer that the mmc_request is complete, and we
get the next request to process.

We try sending the next request to the card, but the card is still
sending data to the host... That's the problem here.

Yes, sending a STOP command is one solution, but that's a far bigger
change, one which is likely to be far more buggy based on the fact
that the driver can send the STOP automatically.
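
To illustrate the decision being described, here is a minimal,
self-contained sketch. It is not the patch itself: the structures and
the function name are hypothetical stand-ins, and treating "CRC and
timeout both set" as a case that still finishes the request early is an
assumption taken from the line-conflict point discussed earlier in this
mail. The two mask values and the error codes match what the driver
already uses.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in drivers/mmc/host/sdhci.h. */
#define SDHCI_INT_TIMEOUT 0x00010000u
#define SDHCI_INT_CRC     0x00020000u

/* Simplified, hypothetical stand-ins for the driver's structures. */
struct sketch_data {
	int error;
};

struct sketch_cmd {
	struct sketch_data *data; /* non-NULL if the command starts a data phase */
	int error;
};

/*
 * Record a command error, and report whether the request should be
 * finished now (true) or left running so that the data phase either
 * completes or times out (false).
 */
static bool sketch_finish_early(struct sketch_cmd *cmd, uint32_t intmask)
{
	uint32_t err = intmask & (SDHCI_INT_CRC | SDHCI_INT_TIMEOUT);

	if (!err)
		return false;	/* no command error, nothing to record */

	/* Timeout (alone or with CRC) maps to -ETIMEDOUT, CRC only to -EILSEQ. */
	cmd->error = (err & SDHCI_INT_TIMEOUT) ? -ETIMEDOUT : -EILSEQ;

	/*
	 * CRC only on a data command: the card may have received the
	 * command correctly and may already be sending data, so keep
	 * the request alive.
	 */
	if (cmd->data && err == SDHCI_INT_CRC)
		return false;

	return true;
}
```

The point of returning false in the CRC-only case is that the error is
still recorded in cmd->error, so it is not ignored, while the card is
not left transmitting into a request that has already been completed.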
>
> > + *
> > + * If the card did not receive the command, the data phase
> > + * will time out.
> > + *
> > + * FIXME: we also need to clean up the data phase if any
> > + * command fails, not just the data initiating command.
>
> This FIXME is too vague. Please give at least one example of what
> needs fixing.
I don't remember anymore, sorry. I'll delete the fixme. :)
--
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.