public inbox for linux-mmc@vger.kernel.org
 help / color / mirror / Atom feed
From: Adrian Hunter <adrian.hunter@intel.com>
To: Peter Geis <pgwipeout@gmail.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>,
	Asutosh Das <asutoshd@codeaurora.org>,
	"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>,
	"open list:ARM/Rockchip SoC..."
	<linux-rockchip@lists.infradead.org>,
	nuumiofi@gmail.com, Michal Simek <michal.simek@xilinx.com>,
	Manish Narani <manish.narani@xilinx.com>
Subject: Re: [BUG] cqe unable to handle buggy cards
Date: Wed, 2 Dec 2020 10:05:32 +0200	[thread overview]
Message-ID: <2fd27057-1991-fd04-2004-0981e5cef4ce@intel.com> (raw)
In-Reply-To: <CAMdYzYrMJBs3=cwK404jQauN1OhntsVq3Ppcu=9-TwW_dsmqmw@mail.gmail.com>

On 16/11/20 8:30 pm, Peter Geis wrote:
> On Mon, Nov 16, 2020 at 3:22 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 6/11/20 10:54 pm, Peter Geis wrote:
>>> On Wed, Nov 4, 2020 at 7:39 PM Peter Geis <pgwipeout@gmail.com> wrote:
>>>>
>>>> On Mon, Nov 2, 2020 at 11:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>>>
>>>>> On 2/11/20 4:54 pm, Ulf Hansson wrote:
>>>>>> + cqhci maintainers
>>>>>>
>>>>>> On Sat, 31 Oct 2020 at 13:54, Peter Geis <pgwipeout@gmail.com> wrote:
>>>>>>>
>>>>>>> Good Morning,
>>>>>>>
>>>>>>> We are seeing an issue on the rk3399 with certain Foresee emmc modules
>>>>>>> where the module reports it supports command queuing but fails in actual
>>>>>>> implementation.
>>>>>>>
>>>>>>> Unfortunately there doesn't seem to be any method for the mmc core code
>>>>>>> to detect this situation and disable command queue automatically.
>>>>>>> There also appears to be no way to disable it at runtime.
>>>>>
>>>>> Since v5.5, if you know how to use SDHCI debug quirks there is
>>>>> SDHCI_QUIRK_BROKEN_CQE
>>>>> e.g. kernel command line option sdhci.debug_quirks=0x0x2020000
>>>>
>>>> Thank you, we will test this.
>>>
>>> This does resolve the issue of disabling cqe entirely for debugging, thanks!
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>> Certain modified kernels have added a patch to enable runtime disable of
>>>>>>> command queue entirely, but this will affect mmc core as a whole and not
>>>>>>> just the buggy card.
>>>>>>>
>>>>>>> Does anyone have any insight into this issue?
>>>>>>> Thank you for your time.
>>>>>>
>>>>>> Unfortunate, not me personally. I assume the issue is either be card
>>>>>> specific or host specific. Before looking at a disable option, we need
>>>>>> to know more about what goes wrong, I think.
>>>>>>
>>>>>> Kind regards
>>>>>> Uffe
>>>>>>
>>>>>>>
>>>>>>> Very Respectfully,
>>>>>>> Peter Geis
>>>>>>>
>>>>>>> [   64.472882] mmc2: cqhci: timeout for tag 2
>>>>>>> [   64.473349] mmc2: cqhci: ============ CQHCI REGISTER DUMP ===========
>>>>>>> [   64.474057] mmc2: cqhci: Caps:      0x00000000 | Version:  0x00000510
>>>>>>> [   64.474763] mmc2: cqhci: Config:    0x00000000 | Control:  0x00000000
>>>>>>> [   64.475468] mmc2: cqhci: Int stat:  0x00000000 | Int enab: 0x00000000
>>>>>>> [   64.476172] mmc2: cqhci: Int sig:   0x00000000 | Int Coal: 0x00000000
>>>>>>> [   64.476875] mmc2: cqhci: TDL base:  0x00000000 | TDL up32: 0x00000000
>>>>>
>>>>> TDL base cannot be zero, so the register values have been lost.
>>>>> Could be a reset issue like this one but for sdhci-of-arasan.c :
>>>>>
>>>>> https://lore.kernel.org/linux-mmc/20200819121848.16967-1-adrian.hunter@intel.com/
>>>>
>>>> Excellent, we will see if a similar implementation makes a difference
>>>> here for us as well.
>>>
>>> I wrote a patch to implement this in the arasan driver.
>>> https://paste.ee/p/cl5SX
>>> Nuumiofi tested it with the buggy card.
>>>
>>> The good news, it solves the register clearing issue.
>>> The bad news, the card still is broken, here is the kernel log:
>>> https://paste.ubuntu.com/p/sF2yMwxpcV/
>>
>> You will need to enable dynamic debug to get more messages e.g.
>>         kernel config: CONFIG_DYNAMIC_DEBUG=y
>>         kernel command line:
>> dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
>>
>> If it can't be made to work, you can set SDHCI_QUIRK_BROKEN_CQE in the
>> driver as needed.
> 
> nuumiofi has collected logs with dynamic debug enabled.
> He currently has four logs in a google drive folder:
> cqe disabled
> cqe enabled
> cqe disabled with reset patch applied
> cqe enabled with reset patch applied
> 
> https://drive.google.com/drive/folders/1m1DqzisxHH-6BiMkqR_ExLg9kgH1BDTy

Looks like some kind of issue with timeout settings:

[    1.728289] mmc2: sdhci: Too large timeout 0xf requested for CMD18!
[    1.736197] mmc2: sdhci: Too large timeout 0xf requested for CMD25!
[    1.802390] mmc2: sdhci: Too large timeout 0xf requested for CMD25!
[    1.810496] mmc2: sdhci: Too large timeout 0xf requested for CMD25!
[    1.818562] mmc2: sdhci: Too large timeout 0xf requested for CMD25!
[    1.826652] mmc2: sdhci: Too large timeout 0xf requested for CMD25!
[    1.834470] mmc2: sdhci: Too large timeout 0xf requested for CMD25!
[    1.842256] mmc2: sdhci: Too large timeout 0xf requested for CMD25!

[    4.878368] mmc2: cqhci: error IRQ status: 0x00000000 cmd error -110 data error 0 TERRI: 0x0000822c
[    4.880436] mmc2: failed to start CQE transfer for tag 3, error -16
[    4.884075] mmc2: sdhci: Too large timeout 0xf requested for CMD12!
[    4.886398] mmc2: sdhci: Too large timeout 0xf requested for CMD48!
[    4.976803] mmc2: sdhci: Too large timeout 0xf requested for CMD8!
[    5.032979] mmc2: cqhci: error IRQ status: 0x00000000 cmd error -110 data error 0 TERRI: 0x0000862c
[    5.035910] mmc2: failed to start CQE transfer for tag 0, error -16
[    5.039436] mmc2: sdhci: Too large timeout 0xf requested for CMD12!
[    5.041640] mmc2: sdhci: Too large timeout 0xf requested for CMD48!
[    5.048557] mmc2: cqhci: error IRQ status: 0x00000000 cmd error -110 data error 0 TERRI: 0x0000822c
[    5.051405] mmc2: failed to start CQE transfer for tag 3, error -16
[    5.054861] mmc2: sdhci: Too large timeout 0xf requested for CMD12!
[    5.056984] mmc2: sdhci: Too large timeout 0xf requested for CMD48!

Note error -110 is a timeout error

> 
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>> [   64.477578] mmc2: cqhci: Doorbell:  0x00000000 | TCN:      0x00000000
>>>>>>> [   64.478281] mmc2: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00000000
>>>>>>> [   64.478984] mmc2: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
>>>>>>> [   64.479687] mmc2: cqhci: SSC2:      0x00000000 | DCMD rsp: 0x00000000
>>>>>>> [   64.489785] mmc2: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
>>>>>>> [   64.499774] mmc2: cqhci: Resp idx:  0x00000000 | Resp arg: 0x00000000
>>>>>>> [   64.509687] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>>>>> [   64.519597] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>>>>>>> [   64.529521] mmc2: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
>>>>>>> [   64.539440] mmc2: sdhci: Argument:  0x00010000 | Trn mode: 0x00000010
>>>>>>> [   64.549352] mmc2: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000034
>>>>>>> [   64.559277] mmc2: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
>>>>>>> [   64.569214] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
>>>>>>> [   64.579061] mmc2: sdhci: Timeout:   0x0000000e | Int stat: 0x00000000
>>>>>>> [   64.588842] mmc2: sdhci: Int enab:  0x02ff4000 | Sig enab: 0x02ff4000
>>>>>>> [   64.598671] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
>>>>>>> [   64.608446] mmc2: sdhci: Caps:      0x44edc880 | Caps_1:   0x800020f7
>>>>>>> [   64.618161] mmc2: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
>>>>>>> [   64.627801] mmc2: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x642017d7
>>>>>>> [   64.637376] mmc2: sdhci: Resp[2]:   0x4e436172 | Resp[3]:  0x00880103
>>>>>>> [   64.646855] mmc2: sdhci: Host ctl2: 0x00000083
>>>>>>> [   64.656080] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0xf0628208
>>>>>>> [   64.665445] mmc2: sdhci: ============================================
>>>>>>> [   64.674998] mmc2: running CQE recovery
>>>>>>>
>>>>>>> [  125.912941] mmc2: cqhci: timeout for tag 3
>>>>>>> [  125.921978] mmc2: cqhci: ============ CQHCI REGISTER DUMP ===========
>>>>>>> [  125.931200] mmc2: cqhci: Caps:      0x00000000 | Version:  0x00000510
>>>>>>> [  125.940389] mmc2: cqhci: Config:    0x00000001 | Control:  0x00000000
>>>>>>> [  125.949499] mmc2: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
>>>>>>> [  125.958527] mmc2: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
>>>>>>> [  125.967486] mmc2: cqhci: TDL base:  0x00000000 | TDL up32: 0x00000000
>>>>>>> [  125.976260] mmc2: cqhci: Doorbell:  0x00000000 | TCN:      0x00000000
>>>>>>> [  125.985065] mmc2: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00000000
>>>>>>> [  125.993698] mmc2: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
>>>>>>> [  126.002244] mmc2: cqhci: SSC2:      0x00000000 | DCMD rsp: 0x00000000
>>>>>>> [  126.010716] mmc2: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
>>>>>>> [  126.019159] mmc2: cqhci: Resp idx:  0x00000000 | Resp arg: 0x00000000
>>>>>>> [  126.027525] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>>>>> [  126.035955] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>>>>>>> [  126.044258] mmc2: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
>>>>>>> [  126.052396] mmc2: sdhci: Argument:  0x00000001 | Trn mode: 0x00000010
>>>>>>> [  126.060370] mmc2: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000034
>>>>>>> [  126.068241] mmc2: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
>>>>>>> [  126.075978] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
>>>>>>> [  126.083552] mmc2: sdhci: Timeout:   0x0000000e | Int stat: 0x00000000
>>>>>>> [  126.090937] mmc2: sdhci: Int enab:  0x02ff4000 | Sig enab: 0x02ff4000
>>>>>>> [  126.098219] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
>>>>>>> [  126.105403] mmc2: sdhci: Caps:      0x44edc880 | Caps_1:   0x800020f7
>>>>>>> [  126.112649] mmc2: sdhci: Cmd:       0x00003013 | Max curr: 0x00000000
>>>>>>> [  126.119700] mmc2: sdhci: Resp[0]:   0x00400800 | Resp[1]:  0x642017d7
>>>>>>> [  126.126594] mmc2: sdhci: Resp[2]:   0x4e436172 | Resp[3]:  0x00880103
>>>>>>> [  126.133334] mmc2: sdhci: Host ctl2: 0x00000083
>>>>>>> [  126.139652] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0xf0628208
>>>>>>> [  126.146008] mmc2: sdhci: ============================================
>>>>>>> [  126.152361] mmc2: running CQE recovery
>>>>>>>
>>>>>>>
>>>>>
>>


      reply	other threads:[~2020-12-02  8:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-31 12:54 [BUG] cqe unable to handle buggy cards Peter Geis
2020-11-02 14:54 ` Ulf Hansson
2020-11-02 16:32   ` Adrian Hunter
2020-11-05  0:39     ` Peter Geis
2020-11-06 20:54       ` Peter Geis
2020-11-16  8:21         ` Adrian Hunter
2020-11-16 18:30           ` Peter Geis
2020-12-02  8:05             ` Adrian Hunter [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2fd27057-1991-fd04-2004-0981e5cef4ce@intel.com \
    --to=adrian.hunter@intel.com \
    --cc=asutoshd@codeaurora.org \
    --cc=linux-mmc@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=manish.narani@xilinx.com \
    --cc=michal.simek@xilinx.com \
    --cc=nuumiofi@gmail.com \
    --cc=pgwipeout@gmail.com \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox