From: "Jorge Ramirez-Ortiz, Foundries" <jorge@foundries.io>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Jorge Ramirez-Ortiz, Foundries" <jorge@foundries.io>,
CLoehle@hyperstone.com, jinpu.wang@ionos.com, hare@suse.de,
Ulf Hansson <ulf.hansson@linaro.org>,
beanhuo@micron.com, yangyingliang@huawei.com, asuk4.q@gmail.com,
yibin.ding@unisoc.com, victor.shih@genesyslogic.com.tw,
marex@denx.de, rafael.beims@toradex.com, robimarko@gmail.com,
ricardo@foundries.io, linux-mmc@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2] mmc: rpmb: add quirk MMC_QUIRK_BROKEN_RPMB_RETUNE
Date: Fri, 1 Dec 2023 16:54:18 +0100
Message-ID: <ZWoBqs/5m6tCuBGo@trax>
In-Reply-To: <fecd033b-b2ea-4906-a320-22a5c2ede46c@intel.com>
On 01/12/23 13:46:25, Adrian Hunter wrote:
> On 1/12/23 09:40, Jorge Ramirez-Ortiz, Foundries wrote:
> > On 30/11/23 23:19:45, Jorge Ramirez-Ortiz, Foundries wrote:
> >> On 30/11/23 23:02:15, Jorge Ramirez-Ortiz, Foundries wrote:
> >>> On 30/11/23 21:12:28, Adrian Hunter wrote:
> >>>> On 30/11/23 15:24, Jorge Ramirez-Ortiz, Foundries wrote:
> >>>>> On 30/11/23 11:34:18, Ulf Hansson wrote:
> >>>>>> On Wed, 29 Nov 2023 at 17:05, Jorge Ramirez-Ortiz <jorge@foundries.io> wrote:
> >>>>>>>
> >>>>>>> On the eMMC SanDisk iNAND 7250 configured with HS200, requesting a
> >>>>>>> re-tune before switching to the RPMB partition would randomly cause
> >>>>>>> subsequent RPMB requests to fail with EILSEQ:
> >>>>>>> * data error -84, triggered in __mmc_blk_ioctl_cmd()
> >>>>>>>
> >>>>>>> This commit skips the retune when switching to RPMB.
> >>>>>>> Tested over several days with per minute RPMB reads.
> >>>>>>
> >>>>>> This sounds weird to me and needs more testing/debugging in my
> >>>>>> opinion, especially at the host driver level. Perhaps add some new
> >>>>>> tests in mmc_test, that does a partition switch to/from any partition
> >>>>>> and then run regular I/O again to see if the problem is easier to
> >>>>>> reproduce?
> >>>>>
> >>>>> hi Uffe
> >>>>>
> >>>>> ok I'll have a look - I have never used this driver before, so if you
> >>>>> have anything in the works I'll be glad to integrate and adapt it.
> >>>>>
> >>>>>>
> >>>>>> The point is, I wonder what is so special with RPMB here? Note that,
> >>>>>> it has been quite common that host drivers/controllers have had issues
> >>>>>> with their tuning support, so I would not be surprised if that is the
> >>>>>> case here too.
> >>>>>
> >>>>> Right, it is just that the tuning function for of-arasan is the generic
> >>>>> __sdhci_execute_tuning() - only wrapped around arasan DLL reset
> >>>>> calls. Hence why I aimed for the card: __sdhci_execute_tuning and ZynqMP
> >>>>> are not recent functions or architectures.
> >>>>>
> >>>>>
> >>>>>> Certainly I would be surprised if the problem is at
> >>>>>> the eMMC card side, but I may be wrong.
> >>>>>
> >>>>> How do maintainers test the tuning methods? is there anything else for
> >>>>> me to do other than forcing a retune with different partitions?
> >>>>>
> >>>>>>
> >>>>>> Kind regards
> >>>>>> Uffe
> >>>>>
> >>>>> For completeness this is the error message - notice that we have a
> >>>>> trusted application (fiovb) going through OP-TEE and back to the TEE
> >>>>> supplicant issuing an rpmb read of a variable (pretty normal these days,
> >>>>> we use it on many different platforms - ST, NXP, AMD/Xilinx, TI..).
> >>>>>
> >>>>> The issue on this Zynqmp platform is scarily simple to reproduce; you
> >>>>> can ignore the OP-TEE trace, it is just the TEE way of reporting that
> >>>>> the RPMB read failed.
> >>>>>
> >>>>> root@uz3cg-dwg-sec:/var/rootdirs/home/fio# fiovb_printenv m4hash
> >>>>> [ 461.775084] sdhci-arasan ff160000.mmc: __mmc_blk_ioctl_cmd: data error -84
> >>>>> E/TC:? 0
> >>>>> E/TC:? 0 TA panicked with code 0xffff0000
> >>>>> E/LD: Status of TA 22250a54-0bf1-48fe-8002-7b20f1c9c9b1
> >>>>> E/LD: arch: aarch64
> >>>>> E/LD: region 0: va 0xc0004000 pa 0x7e200000 size 0x002000 flags rw-s (ldelf)
> >>>>> E/LD: region 1: va 0xc0006000 pa 0x7e202000 size 0x008000 flags r-xs (ldelf)
> >>>>> E/LD: region 2: va 0xc000e000 pa 0x7e20a000 size 0x001000 flags rw-s (ldelf)
> >>>>> E/LD: region 3: va 0xc000f000 pa 0x7e20b000 size 0x004000 flags rw-s (ldelf)
> >>>>> E/LD: region 4: va 0xc0013000 pa 0x7e20f000 size 0x001000 flags r--s
> >>>>> E/LD: region 5: va 0xc0014000 pa 0x7e22c000 size 0x005000 flags rw-s (stack)
> >>>>> E/LD: region 6: va 0xc0019000 pa 0x816b31fc8 size 0x001000 flags rw-- (param)
> >>>>> E/LD: region 7: va 0xc001a000 pa 0x816aa1fc8 size 0x002000 flags rw-- (param)
> >>>>> E/LD: region 8: va 0xc006b000 pa 0x00001000 size 0x014000 flags r-xs [0]
> >>>>> E/LD: region 9: va 0xc007f000 pa 0x00015000 size 0x008000 flags rw-s [0]
> >>>>> E/LD: [0] 22250a54-0bf1-48fe-8002-7b20f1c9c9b1 @ 0xc006b000
> >>>>> E/LD: Call stack:
> >>>>> E/LD: 0xc006de58
> >>>>> E/LD: 0xc006b388
> >>>>> E/LD: 0xc006ed40
> >>>>> E/LD: 0xc006b624
> >>>>> Read persistent value for m4hash failed: Exec format error
> >>>>
> >>>> Have you tried dynamic debug for mmc
> >>>>
> >>>> Kernel must be configured:
> >>>>
> >>>> CONFIG_DYNAMIC_DEBUG=y
> >>>>
> >>>> To enable mmc debug via sysfs:
> >>>>
> >>>> echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control
> >>>> echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control
> >>>>
> >>>>
> >>>
> >>> hi Adrian
> >>>
> >>> Sure, this is the output of the trace:
> >>>
> >>> [ 422.018756] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.018789] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.018817] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.018848] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.018875] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.018902] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.018932] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.020013] mmc0: sdhci: IRQ status 0x00000001
> >>> [ 422.020027] mmc0: sdhci: IRQ status 0x00000002
> >>> [ 422.020034] mmc0: req done (CMD6): 0: 00000800 00000000 00000000 00000000
> >>> [ 422.020054] mmc0: starting CMD13 arg 00010000 flags 00000195
> >>> [ 422.020068] mmc0: sdhci: IRQ status 0x00000001
> >>> [ 422.020076] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> >>> [ 422.020092] <mmc0: starting CMD23 arg 00000001 flags 00000015>
> >>> [ 422.020101] mmc0: starting CMD25 arg 00000000 flags 00000035
> >>> [ 422.020108] mmc0: blksz 512 blocks 1 flags 00000100 tsac 400 ms nsac 0
> >>> [ 422.020124] mmc0: sdhci: IRQ status 0x00000001
> >>> [ 422.021671] mmc0: sdhci: IRQ status 0x00000002
> >>> [ 422.021691] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> >>> [ 422.021700] mmc0: req done (CMD25): 0: 00000900 00000000 00000000 00000000
> >>> [ 422.021708] mmc0: 512 bytes transferred: 0
> >>> [ 422.021728] mmc0: starting CMD13 arg 00010000 flags 00000195
> >>> [ 422.021743] mmc0: sdhci: IRQ status 0x00000001
> >>> [ 422.021752] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> >>> [ 422.021771] <mmc0: starting CMD23 arg 00000001 flags 00000015>
> >>> [ 422.021779] mmc0: starting CMD18 arg 00000000 flags 00000035
> >>> [ 422.021785] mmc0: blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
> >>> [ 422.021804] mmc0: sdhci: IRQ status 0x00000001
> >>> [ 422.022566] mmc0: sdhci: IRQ status 0x00208000 <---------------------------------- this doesn't seem right
> >>> [ 422.022629] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> >>> [ 422.022639] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> >>> [ 422.022647] mmc0: 0 bytes transferred: -84 < --------------------------------- it should have transferred 4096 bytes
> >>> [ 422.022669] sdhci-arasan ff160000.mmc: __mmc_blk_ioctl_cmd: data error -84
> >>> [ 422.029619] mmc0: starting CMD6 arg 03b30001 flags 0000049d
> >>> [ 422.029636] mmc0: sdhci: IRQ status 0x00000001
> >>> [ 422.029652] mmc0: sdhci: IRQ status 0x00000002
> >>> [ 422.029660] mmc0: req done (CMD6): 0: 00000800 00000000 00000000 00000000
> >>> [ 422.029680] mmc0: starting CMD13 arg 00010000 flags 00000195
> >>> [ 422.029693] mmc0: sdhci: IRQ status 0x00000001
> >>> [ 422.029702] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> >>> [ 422.196996] <mmc0: starting CMD23 arg 00000400 flags 00000015>
> >>> [ 422.197051] mmc0: starting CMD25 arg 058160e0 flags 000000b5
> >>> [ 422.197079] mmc0: blksz 512 blocks 1024 flags 00000100 tsac 400 ms nsac 0
> >>> [ 422.197110] mmc0: CMD12 arg 00000000 flags 0000049d
> >>> [ 422.199455] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199526] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199585] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199641] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199695] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199753] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199811] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199865] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199919] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.199972] mmc0: sdhci: IRQ status 0x00000020
> >>> [ 422.200026] mmc0: sdhci: IRQ status 0x00000020
> >>>
> >>>
> >>> does this help?
> >
> > Just asking because it doesn't mean much to me other than the obvious CRC
> > problem.
> >
> > That this issue is so easy to trigger - and to fix - suggests a problem
> > on the card rather than in the algorithm (otherwise faults would be all
> > over the place). But I am not an expert in this area.
> >
> > any additional suggestions welcome.
>
> My guess is that sometimes tuning produces a "bad" result. Perhaps
> the margins are very tight and the difference is only 1 tap. When
> a "bad" result happens in non-RPMB, a CRC error results in re-tuning
> and retry, so no errors are seen. When it happens in RPMB, that is
> not possible, so the error is obvious. Not re-tuning before RPMB
> switch helps because the CRC-error->re-tuning to a "good" result has
> probably already happened.
>
> However, based on that theory, it is not necessarily the eMMC that is
> at fault.
>
> It may be worth considering a stronger eMMC driver strength setting.
Sure, I can tune the value (just building now). However, I am not sure
about the implications: is there any negative consequence of increasing
this value that I could monitor (if the tests pass)?
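
For reference, a driver strength change ends up as a CMD6 (SWITCH) write
to the HS_TIMING byte of EXT_CSD, the same kind of command as the
"CMD6 arg 03b30001" partition switch in the trace above. The sketch below
is only an illustration in Python, not kernel code: the constant names
mirror the ones in the Linux MMC headers as far as I know, while
switch_arg(), decode_switch_arg() and hs200_timing_value() are
hypothetical helpers, and driver type 1 in the example is an arbitrary
choice, not a recommendation.

```python
# Values as used by the eMMC SWITCH command (assumed to match the
# definitions in the Linux MMC headers).
MMC_SWITCH_MODE_WRITE_BYTE = 0x03
EXT_CSD_CMD_SET_NORMAL = 1
EXT_CSD_PART_CONFIG = 179   # 0xb3
EXT_CSD_HS_TIMING = 185     # 0xb9
EXT_CSD_TIMING_HS200 = 2
EXT_CSD_DRV_STR_SHIFT = 4   # driver strength lives in HS_TIMING[7:4]


def switch_arg(index, value, cmd_set=EXT_CSD_CMD_SET_NORMAL):
    """Build the 32-bit CMD6 argument for a write-byte access."""
    return ((MMC_SWITCH_MODE_WRITE_BYTE << 24) | (index << 16)
            | (value << 8) | cmd_set)


def decode_switch_arg(arg):
    """Split a CMD6 argument back into its fields."""
    return {
        "access": (arg >> 24) & 0x03,
        "index": (arg >> 16) & 0xFF,
        "value": (arg >> 8) & 0xFF,
        "cmd_set": arg & 0x07,
    }


def hs200_timing_value(driver_type):
    """HS_TIMING byte for HS200 with the given driver type (0..4)."""
    if not 0 <= driver_type <= 4:
        raise ValueError("driver type must be in 0..4")
    return EXT_CSD_TIMING_HS200 | (driver_type << EXT_CSD_DRV_STR_SHIFT)


if __name__ == "__main__":
    # The partition switch seen in the trace after the failed read.
    print(decode_switch_arg(0x03B30001))
    # Hypothetical HS200 switch requesting driver type 1.
    print(hex(switch_arg(EXT_CSD_HS_TIMING, hs200_timing_value(1))))
```

Decoded this way, 03b30001 is a write of value 0 to index 179
(PART_CONFIG), which I read as the switch back to the user area after the
failed RPMB read.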
>
> sdhci supports err_stats in debugfs - that may show how many CRC
> errors there are when not accessing RPMB.
ok
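
As a cross-check on the CRC errors, the IRQ status words in the trace can
be decoded by hand. The sketch below is a small Python helper for that,
assuming the standard SDHCI interrupt status bit layout; the table is a
subset, the short names are my own abbreviations, and decode_int_status()
is a hypothetical helper, not an existing tool.

```python
# Subset of the SDHCI "normal/error interrupt status" bits.
SDHCI_INT_BITS = {
    0x00000001: "RESPONSE",      # command complete
    0x00000002: "DATA_END",      # transfer complete
    0x00000008: "DMA_END",
    0x00000010: "SPACE_AVAIL",   # buffer write ready
    0x00000020: "DATA_AVAIL",    # buffer read ready
    0x00008000: "ERROR",         # summary bit: some error bit is set
    0x00010000: "TIMEOUT",       # command timeout
    0x00020000: "CRC",           # command CRC error
    0x00040000: "END_BIT",
    0x00080000: "INDEX",
    0x00100000: "DATA_TIMEOUT",
    0x00200000: "DATA_CRC",      # data CRC error
    0x00400000: "DATA_END_BIT",
}


def decode_int_status(status):
    """Return the names of the known bits set in an IRQ status word.

    Bits that are not in the table are reported as one hex remainder.
    """
    names = []
    remainder = status
    for bit in sorted(SDHCI_INT_BITS):
        if status & bit:
            names.append(SDHCI_INT_BITS[bit])
            remainder &= ~bit
    if remainder:
        names.append("UNKNOWN(0x%08x)" % remainder)
    return names


if __name__ == "__main__":
    for word in (0x00000001, 0x00000002, 0x00000020, 0x00208000):
        print("0x%08x -> %s" % (word, ", ".join(decode_int_status(word))))
```

With this table 0x00208000 is the error summary bit plus a data CRC
error, which matches the -84 (EILSEQ) reported for the CMD18 read, while
the long runs of 0x00000020 are ordinary buffer-read-ready interrupts.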
>
> I don't object to skipping re-tuning before RPMB switch, but I am
> not sure about tying it to a specific eMMC.
Thanks, I will follow up after further testing.
>
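
To make sure I follow the theory above, here is a toy model of it in
Python. It is only a sketch under made-up assumptions: each tuning run
independently lands on a "bad" tap with probability p_bad (the 1% below
is invented, not measured), a bad result in the user area is recovered
silently by re-tune and retry, and a bad result in RPMB surfaces as an
error. tune() and simulate() are hypothetical names and nothing here
models the real hardware.

```python
import random


def tune(rng, p_bad):
    """Return True when a tuning run lands on a good tap (toy model)."""
    return rng.random() >= p_bad


def simulate(cycles, p_bad, retune_before_rpmb, seed=0):
    """Run user-area I/O followed by one RPMB access, `cycles` times.

    Returns (hidden_recoveries, rpmb_errors).
    """
    rng = random.Random(seed)
    hidden_recoveries = 0
    rpmb_errors = 0
    for _ in range(cycles):
        # A periodic re-tune happens at some point during normal I/O.
        good = tune(rng, p_bad)
        # User-area I/O: a CRC error triggers re-tune and retry, so the
        # caller never sees it.
        while not good:
            hidden_recoveries += 1
            good = tune(rng, p_bad)
        # Optional re-tune right before the switch to RPMB.
        if retune_before_rpmb:
            good = tune(rng, p_bad)
        # RPMB access: no re-tune and retry, so a bad tap is a visible error.
        if not good:
            rpmb_errors += 1
    return hidden_recoveries, rpmb_errors


if __name__ == "__main__":
    for retune in (True, False):
        hidden, errors = simulate(10000, 0.01, retune)
        print("retune_before_rpmb=%s hidden=%d rpmb_errors=%d"
              % (retune, hidden, errors))
```

In this model the RPMB errors disappear when the re-tune before the
switch is skipped, while the hidden recoveries in the user area remain,
which is why the err_stats counters look like the right thing to check
next.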