From: Adrian Hunter <adrian.hunter@intel.com>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Ulf Hansson <ulf.hansson@linaro.org>,
	Marcin Wojtas <mw@semihalf.com>,
	Gregory CLEMENT <gregory.clement@free-electrons.com>,
	Shawn Guo <shawnguo@kernel.org>,
	Sascha Hauer <kernel@pengutronix.de>,
	linux-mmc@vger.kernel.org
Subject: Re: [PATCH v2 07/24] mmc: sdhci: command response CRC error handling
Date: Mon, 4 Jan 2016 13:24:34 +0200
Message-ID: <568A5672.4080402@intel.com>
In-Reply-To: <20160102122508.GW8644@n2100.arm.linux.org.uk>

On 02/01/16 14:25, Russell King - ARM Linux wrote:
> On Tue, Dec 29, 2015 at 03:08:20PM +0200, Adrian Hunter wrote:
>> On 21/12/15 13:40, Russell King wrote:
>>> When we get a response CRC error on a command, it means that the
>>> response we received back from the card was not correct.  It does not
>>> mean that the card did not receive the command correctly.  If the
>>
>> Pedantically, if the timeout bit is set as well (CMD line conflict),
>> it does mean the card did not receive the command, so it should be coded
>> that way.
> 
> Good catch, the SDHCI spec contains a table which describes the CRC and
> timeout bit states, though it's not quite as you describe above...
> CRC and timeout indicates a command line conflict at some point.

In the case of CMD line conflict, the host controller aborts the command, so
presumably there will not be any data timeout.  Will you change it?
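
To make the distinction concrete, here is a rough standalone sketch (not
the driver code).  The two bit values match SDHCI_INT_TIMEOUT and
SDHCI_INT_CRC in sdhci.h; the enum and both helper names are made up
purely for illustration, and the "continue" rule is only my reading of
what the patch is aiming for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in drivers/mmc/host/sdhci.h. */
#define SDHCI_INT_TIMEOUT	0x00010000u
#define SDHCI_INT_CRC		0x00020000u

/* Hypothetical names, for illustration only (not part of the driver). */
enum cmd_err_kind {
	CMD_ERR_NONE,		/* neither bit set */
	CMD_ERR_TIMEOUT,	/* timeout only: no response seen */
	CMD_ERR_RESPONSE_CRC,	/* CRC only: response arrived but was corrupted */
	CMD_ERR_LINE_CONFLICT,	/* CRC and timeout: CMD line conflict */
};

/* Classify the command error from the CRC and timeout interrupt bits. */
static enum cmd_err_kind classify_cmd_error(uint32_t intmask)
{
	switch (intmask & (SDHCI_INT_CRC | SDHCI_INT_TIMEOUT)) {
	case SDHCI_INT_CRC | SDHCI_INT_TIMEOUT:
		return CMD_ERR_LINE_CONFLICT;
	case SDHCI_INT_CRC:
		return CMD_ERR_RESPONSE_CRC;
	case SDHCI_INT_TIMEOUT:
		return CMD_ERR_TIMEOUT;
	default:
		return CMD_ERR_NONE;
	}
}

/*
 * Assumed rule: only a pure response CRC error on a data-initiating
 * command lets the request carry on into the data phase.  A CMD line
 * conflict is aborted by the host controller, so it is finished early
 * like any other command error.
 */
static bool may_continue_to_data_phase(uint32_t intmask, bool has_data)
{
	return has_data &&
	       classify_cmd_error(intmask) == CMD_ERR_RESPONSE_CRC;
}
```

With that split, CRC plus timeout never reaches the "wait for the data
phase" path, which is the case I was asking about.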

> 
>>> Fix this by handling a response CRC error slightly differently: record
>>> the failure of the data initiating command, but allow the remainder of
>>> the request to be processed normally.  This is safe as core MMC checks
>>
>> "processed normally" confused me at first because it sounded like you are
>> ignoring the error.  Not sure why you have a much better explanation in the
>> cover email than here.
> 
> They're written at different times?  I don't accept your comment though -
> "record the failure" _clearly_ does not mean that we're ignoring the error.
> 
>>> the status of all commands and data transfer phases of the request.
>>
>> MMC core is not the only initiator of requests, but it is safe because the
>> command error takes precedence by design.
>>
>> Also you don't explain why it is better to continue rather than attempt to
>> send a stop command and clean up the request properly.  It looks simpler and
>> less racy, but if that is the reason then it seems worth saying so.
> 
> This patch results from the analysis of failures seen on iMX6 hardware,
> where the card has entered data mode, and started to send its data.
> Right now, this screws up the next command.
> 
>>> If the card does not initiate a data transfer, then we should time out
>>> according to the data transfer parameters.
>>>
>>> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>>> ---
>>>  drivers/mmc/host/sdhci.c | 17 +++++++++++++++++
>>>  1 file changed, 17 insertions(+)
>>>
>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>> index 86310b162304..3e718e465a1b 100644
>>> --- a/drivers/mmc/host/sdhci.c
>>> +++ b/drivers/mmc/host/sdhci.c
>>> @@ -2340,6 +2340,23 @@ static void sdhci_cmd_irq(struct sdhci_host *host, u32 intmask, u32 *mask)
>>>  		else
>>>  			host->cmd->error = -EILSEQ;
>>>  
>>> +		/*
>>> +		 * If this command initiates a data phase and a response
>>> +		 * CRC error is signalled, the card can start transferring
>>> +		 * data - the card may have received the command without
>>> +		 * error.  We must not terminate the request early.
>>
>> This is misleading.  We could terminate the request early if we cleaned it
>> up.  You should say here why it is better to continue.
> 
> That is _not_ misleading, it is entirely accurate.  What the code
> currently does when it encounters a CRC error is it terminates the
> _request_ early.  The _request_ being "struct mmc_request" - and
> it terminates it _without_ sending a STOP command.

Sure, but the person reading the comment should not have to know the history
of the code to interpret it.  But it is not a big thing - the comment could
just be:

	We must not terminate early because we don't bother to clean up.

> 
> Resetting the host controller does not influence what state the card
> is in.
> 
> So what happens at the moment is that we send a command which initiates
> a data phase from the card.  The card responds with a valid response,
> and starts sending data to the host.  The host incorrectly receives
> the card response with a CRC error.
> 
> At this point, the code decides that it had a failure, queues the
> finish tasklet, which resets the SDHCI controller, leaving the card
> transmitting data to the host, potentially endlessly.  The driver
> reports to the MMC layer that the mmc_request is complete, and we
> get the next request to process.
> 
> We try sending the next request to the card, but the card is still
> sending data to the host...  That's the problem here.
> 
> Yes, sending a STOP command is one solution, but that's a far bigger
> change, one which is likely to be far more buggy based on the fact
> that the driver can send the STOP automatically.
> 
>>
>>> +		 *
>>> +		 * If the card did not receive the command, the data phase
>>> +		 * will time out.
>>> +		 *
>>> +		 * FIXME: we also need to clean up the data phase if any
>>> +		 * command fails, not just the data initiating command.
>>
>> This FIXME is too vague.  Please give at least one example of what
>> needs fixing.
> 
> I don't remember anymore, sorry.  I'll delete the fixme. :)
> 


